last | first | birth | death | title | pub_year | publisher | pub_loc |
---|---|---|---|---|---|---|---|
Twain | Mark | 1835 | 1910 | Huckleberry Finn | 1986 | Penguin USA | New York |
Twain | Mark | 1835 | 1910 | Tom Sawyer | 1987 | Viking | New York |
Rothfuss | Patrick | 1973 | The Name of the Wind | 2007 | DAW Books | New York | |
Lorde | Audre | 1934 | 1992 | Sister Outsider | 1984 | Crossing Press | New York |
Mann | Thomas | 1875 | 1955 | Death in Venice |
1989 | Penguin | New York |
Cather | Willa | 1873 | 1947 | O Pioneers! | 1992 | Vintage | New York |
Cather | Willa | 1873 | 1947 | My Ántonia | 1999 | Penguin | New York |
last,first,birth,death,title,pub_year,publisher,pub_loc\nTwain,Mark,1835,1910,Huckleberry Finn,1986,Penguin,New York\nTwain,Mark,1835,1910,Tom Sawyer,1987,Viking,New York\nRothfuss,Patrick,1973,,The Name of the Wind,2007,DAW Books,New York\nLorde,Audre,1934,1992,Sister Outsider,1984,Crossing Press,New York\nMann,Thomas,1875,1955,Death in Venice,1989,Penguin,New York\nCather,Willa,1873,1947,O Pioneers!,1992,Vintage,New York\nCather,Willa,1873,1947,My Ántonia,1999,Penguin,New York
written by
published by
published
written by
published by
published
many
many
one
and
author_id | last | first | birth | death |
---|---|---|---|---|
1 | Twain | Mark | 1835 | 1910 |
2 | Rothfuss | Patrick | 1973 | NULL |
3 | Lorde | Audre | 1934 | 1992 |
4 | Mann | Thomas | 1975 | 1995 |
5 | Cather | Willa | 1873 | 1947 |
work_id | title | pub_year | author_id |
---|---|---|---|
1 | Huckleberry Finn | 1986 | 1 |
2 | Tom Sawyer | 1987 | 1 |
3 | The Name of the Wind | 2007 | 2 |
4 | Sister Outsider | 1984 | 3 |
5 | Death in Venice | 1989 | 4 |
5 | O Pioneers! | 1992 | 5 |
7 | My Ántonia | 1999 | 5 |
and
pub_id | name | loc |
---|---|---|
1 | Penguin | New York |
2 | Viking | New York |
3 | DAW Books | New York |
4 | Crossing Press | New York |
5 | Vintage | New York |
pub_id | name | loc_id |
---|---|---|
1 | Penguin | 1 |
2 | Viking | 1 |
3 | DAW Books | 1 |
4 | Crossing Press | 1 |
5 | Vintage | 1 |
loc_id | location |
---|---|
1 | New York |
pub_id | name | loc |
---|---|---|
1 | Penguin | New York |
2 | Viking | New York |
3 | DAW Books | New York |
4 | Crossing Press | New York |
5 | Vintage | New York |
pub_id | name | loc_id |
---|---|---|
1 | Penguin | 1 |
2 | Viking | 1 |
3 | DAW Books | 1 |
4 | Crossing Press | 1 |
5 | Vintage | 1 |
loc_id | location |
---|---|
1 | New York |
Now we could use our "locations" table again, for instance if we wanted to add information on birth city to the authors table!
Each variable forms a column.
Each observation forms a row.
Each type of observational unit forms a table.
Hadley Wickham, "Tidy Data":
https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
Julia Silge and David Robinson, Text Mining with R: A Tidy Approach:
https://www.tidytextmining.com/
## # A tibble: 725,055 x 4
## book linenumber chapter word
## <fct> <int> <int> <chr>
## 1 Sense & Sensibility 1 0 sense
## 2 Sense & Sensibility 1 0 and
## 3 Sense & Sensibility 1 0 sensibility
## 4 Sense & Sensibility 3 0 by
## 5 Sense & Sensibility 3 0 jane
## 6 Sense & Sensibility 3 0 austen
## 7 Sense & Sensibility 5 0 1811
## 8 Sense & Sensibility 10 1 chapter
## 9 Sense & Sensibility 10 1 1
## 10 Sense & Sensibility 13 1 the
## # ... with 725,045 more rows
what are your relationships?
how will you filter the data?
how will you sort the data?
how often will the data be updated?
who will do this updating?
how will you share this data?
through a website?
raw data?
both?
relational database example adapted from:
A Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, John Unsworth. Oxford: Blackwell, 2004.
http://www.digitalhumanities.org/companion/
wordpress structure image from
http://web-profile.net/wp-content/uploads/wordpress_db_schema.pngxml image fromhttps://marketing.adobe.com/resources/help/en_US/insight/dataset/c_xml_dec_grps.html