OBJECTIVES
- Use the full text of the journal Éire-Ireland for 1994-2017, recent releases of public-use Library of Congress MARC XML files, and the publicly available HathiTrust Digital Library Hathi Files to supplement our understanding of trends in historiography/scholarship in Irish Studies
- Who is being cited? Who is shaping the field (broadly defined)?
- Measures of heterogeneity in citation in particular (are we citing the same people again and again? Is that bad?)
- Are libraries prone to over-collecting certain scholars?
Non-OBJECTIVES
- Focus on scholarly investigation of Ireland-as-subject
- Not a look at which Irish authors were cited or collected (cf. Brian Lavoie and Lorcan Dempsey, "An Exploration of the Irish Presence in the Published Record," OCLC, 2018), at least not for library holdings
- However, we can get a sense of heterogeneity of literary subjects and authors via citations in Éire-Ireland
Corpus Overview
- Published by Irish American Cultural Institute (founded 1962), first issue 1966
- Moved from St. Paul, MN, to Morristown, NJ, in 1995
- Special themed issues (c. early 2000s on), with guest editors starting with vol. 36, nos. 1&2 (~28 guest editors, 2000-17)
Éire-Ireland
Corpus Overview
- 23 years of volumes, 1994-2017 (vols. 29-52)
- Yields 11,011 pages of main essay texts:
- 696 Essays/Poems/Translations
- 7,405 pages with footnotes or works cited
- 1.9 million words
- 454 unique authors
- Online publishers:
- Project Muse, JHU (1994 - present)
- EBSCOhost and Gale Cengage (via print subscription publisher; 1998 - present)
Back Archive
- Total number of unique authors: 454
- Average number of appearances: 1.19
- Gender ratio: 282 males (62%) to 172 females (38%)
Corpus Overview
Corpus Prep Workflow
Text Treatment
Corpus Prep Workflow
- Scrape page text from body of plain-text files (due to inferior quality of HTML main text); pull footnotes from HTML
- Marry footnotes to page text, join essay and issue metadata to pages
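The join described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Page` record, the `marry_footnotes` function, and the tuple layouts for pages, footnotes, and essay metadata are all hypothetical names and shapes chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """One page of essay text with its footnotes and essay-level metadata."""
    essay_id: str
    page_no: int
    text: str
    footnotes: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def marry_footnotes(pages, footnotes, essays):
    """Attach footnotes and essay metadata to each page.

    pages:     iterable of (essay_id, page_no, text)           # assumed layout
    footnotes: iterable of (essay_id, page_no, note_no, text)  # assumed layout
    essays:    dict mapping essay_id -> metadata dict          # assumed layout
    """
    # Group footnotes by the page they belong to.
    notes_by_page = {}
    for essay_id, page_no, note_no, note_text in footnotes:
        notes_by_page.setdefault((essay_id, page_no), []).append((note_no, note_text))

    married = []
    for essay_id, page_no, text in pages:
        # Sort so notes appear in note-number order on each page.
        notes = sorted(notes_by_page.get((essay_id, page_no), []))
        married.append(Page(essay_id, page_no, text, notes, essays.get(essay_id, {})))
    return married
```

Pages without footnotes keep an empty list, so page counts with and without notes can be compared afterwards.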
Text Treatment
Corpus Prep Workflow
- The initial preparation workflow struggled to detect where components begin within footnotes: the start of a note was easy to identify, but not subsequent citations in the same note.
- Citation components could not be labeled beyond the first two to three tokens. In the initial 2017 report, only these tokens, where identified as personal names, could be used.
Updates, 2017-2019
Corpus Prep Workflow
- Implementation of conditional random fields (CRF) in 2018: create training data in which components of footnotes were labeled (~20 footnotes, or ~700 components)
- "Score" unlabeled footnote components based on token features, e.g.:
- Is it a number? Does it have four digits and start with viable century?
- Is it a delimiter (, ; : .)
- Is it capitalized?
- Does it have quotation marks around it?
- What type of tokens surround it?
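The feature questions above can be sketched as a per-token feature function. This is an illustrative sketch only: the function names, the feature keys, and the choice of 15xx through 20xx as "viable" centuries are assumptions, not the project's actual settings. Feature dictionaries of this general shape are the kind of per-token input that CRF toolkits typically accept.

```python
import re

DELIMITERS = {",", ";", ":", "."}
QUOTE_CHARS = "\"'\u201c\u201d\u2018\u2019"
# Assumption: a "viable century" means a four-digit year from 1500 to 2099.
YEAR_RE = re.compile(r"^(1[5-9]|20)\d{2}$")


def token_type(tok):
    """Coarse token class, used to describe a token's neighbors."""
    core = tok.strip(QUOTE_CHARS)
    if tok in DELIMITERS:
        return "delimiter"
    if core.isdigit():
        return "number"
    if core[:1].isupper():
        return "capitalized"
    return "other"


def token_features(tokens, i):
    """Build a feature dict for tokens[i] within one footnote."""
    tok = tokens[i]
    core = tok.strip(QUOTE_CHARS)
    return {
        "is_number": core.isdigit(),
        "is_year": bool(YEAR_RE.match(core)),
        "is_delimiter": tok in DELIMITERS,
        "is_capitalized": core[:1].isupper(),
        "is_quoted": len(tok) > 1 and tok[0] in QUOTE_CHARS and tok[-1] in QUOTE_CHARS,
        # Neighbor types; BOS/EOS mark the start and end of the footnote.
        "prev_type": token_type(tokens[i - 1]) if i > 0 else "BOS",
        "next_type": token_type(tokens[i + 1]) if i < len(tokens) - 1 else "EOS",
    }
```

Calling `token_features` for every position in a tokenized footnote yields one feature sequence per note, which can then be paired with the hand-assigned labels from the training set.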
Updates, 2017-2019
Corpus Prep Workflow
- Use probabilistic modeling (specifically, Python CRFSuite) to assign each microcomponent a label based on its similarity to components of labeled training data.
- Results: some improvement:
- Good in separating journal articles from books
- Enables us to grab subsequent citations in single note number
- But continued problems:
- Separate and identify constituents in multi-author works
- Ignore non-source components, e.g. direct-source quotations being confused for journal article titles (need to implement with further training data)
Updates, 2017-2019
Recap: 2017 Findings
Newspapers & Periodicals
"First-position," full-citation footnotes, name-like tokens
n = 21,850 footnotes
Name | Number |
---|---|
Irish Times | 100 |
Irish Independent | 54 |
Freeman's Journal | 50 |
The Nation | 43 |
United Irishman | 32 |
The Toiler | 32 |
Recap: 2017 Findings
Name | Number |
---|---|
Maria Edgeworth | 30 |
Sean O'Casey | 27 |
Ernie O'Malley | 23 |
Eamon de Valera | 23 |
John Mitchel | 17 |
Seamus Heaney | 14 |
James Joyce | 14 |
Primary Source Names
Name | Number | Average Fn Location |
---|---|---|
Garret FitzGerald | 23 | 70.0 |
David Fitzpatrick | 13 | 42.9 |
Seamus Deane | 13 | 21.23 |
Tom Garvin | 11 | 35.5 |
Terence Brown | 10 | 32.4 |
James Kelly | 9 | 50.3 |
Scholars: By Number of Full-Citation, Head-of-Note
Recap: 2017 Findings
Name | Number | Average Fn Location |
---|---|---|
Edward Said | 4 | 5.75 |
Joel Mokyr | 4 | 6.0 |
Joanna Bourke | 5 | 6.6 |
Michel Foucault | 5 | 7.0 |
Kerby Miller | 4 | 11.5 |
Kevin Whelan | 6 | 12.1 |
Scholars: By Average Placement of First Footnote
Recap: 2017 Findings
Name | Number |
---|---|
The Nation | 83 |
Times (Irish?) | 42 |
Cork Examiner | 39 |
United Irishmen | 31 |
The Toiler | 31 |
Belfast News-Letter | 20 |
Newspapers & Periodicals
CRF-Labeled Data
Citation components labeled "authors"
n = 21,850 footnotes; ~9,000 "authors"
Name | Number |
---|---|
John Mitchel | 51 |
W B Yeats | 41 |
Jonathan Swift | 39 |
Seamus Heaney | 30 |
James Joyce | 26 |
Brendan Behan | 17 |
Subject-Authors
CRF-Labeled Data
Name | Number |
---|---|
Roy Foster | 51 |
Seamus Deane | 45 |
Alvin Jackson | 40 |
Garret FitzGerald | 31 |
David Fitzpatrick | 24 |
FSL Lyons | 20 |
Scholars
CRF-Labeled Data
Library of Congress
- MARC XML "Open Access" Distribution, "Book Files" (see http://loc.gov/cds/products/marcDist.php)
- Number of records: 10 million +
- LoC: This is an undercount
- Extract built on any title field (MARC 245) with "Irish" or "Ireland", n = 18,576
- Considered only scholar-authors here
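An extract like the one described above could be built along these lines. This is a sketch under stated assumptions rather than the method actually used: it assumes the files follow the standard MARCXML "slim" namespace, takes the author from field 100 subfield a, and matches "Irish" or "Ireland" as a plain case-sensitive substring anywhere in field 245. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

NS_URI = "http://www.loc.gov/MARC21/slim"  # standard MARCXML namespace
NS = {"marc": NS_URI}
TITLE_RE = re.compile(r"Irish|Ireland")  # assumption: simple substring match


def field_text(record, tag, codes=None):
    """Join the subfields of every datafield with the given tag."""
    parts = []
    for df in record.findall(f"marc:datafield[@tag='{tag}']", NS):
        for sf in df.findall("marc:subfield", NS):
            if sf.text and (codes is None or sf.get("code") in codes):
                parts.append(sf.text.strip())
    return " ".join(parts)


def irish_titles(source):
    """Yield (author, title) for records whose 245 field matches TITLE_RE.

    source: a filename or binary file object containing MARCXML.
    """
    # iterparse streams the file, so millions of records need not fit in memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != f"{{{NS_URI}}}record":
            continue
        title = field_text(elem, "245")
        if TITLE_RE.search(title):
            yield field_text(elem, "100", codes={"a"}), title
        elem.clear()  # free the record once it has been inspected
```

Counting the yielded author strings (for example with `collections.Counter`) would give a books-per-author tally, though name forms would likely need normalizing first.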
Library of Congress
Name | Number of Books |
---|---|
Edward MacLysaght | 18 |
Peter Harbison | 18 |
Michael C. O'Laughlin | 17 |
Padraic O'Farrell | 15 |
Morgan Llywelyn | 14 |
Donald Akenson | 14 |
Book Holdings w/ title "Irish" or "Ireland", Scholar-Authors
HathiTrust Digital Library
- Dominated by R1 (especially U Michigan, University of California System) holdings
- Disproportionate representation of pre-1924 holdings (libraries initially afraid to digitize in-copyright holdings)
- Number of records: 16 million +
- No authors(!) in public use Hathi files
- Extract built on any title field (HT takes from MARC 245) with "Irish" or "Ireland", n = 35,937
Name | # Volumes |
---|---|
Leaders of Public Opinion in Ireland , W.E.H. Lecky (1903) | 12 |
History of Ireland : from the Anglo-Norman invasion till the union of the country with Great Britain, W.C. Taylor (1833) | 8 |
Outlines of the history of Ireland from the earliest times..., P.W. Joyce (1904) | 5 |
The Course of Irish History, ed. Moody/Martin | 5 |
Book Holdings w/ title "Irish" or "Ireland", by Title
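A by-title volume count like the one above could be produced with a short pass over a Hathi file. This is only a sketch: it assumes the file is tab-delimited with no header row, and the default title column index (11) is an assumption that should be checked against the Hathi Files field description. The function name and parameters are hypothetical.

```python
import csv
import re
from collections import Counter

TITLE_RE = re.compile(r"Irish|Ireland")  # assumption: simple substring match


def count_volumes_by_title(lines, title_col=11):
    """Count volumes (rows) per matching title in a tab-delimited Hathi file.

    lines:     an open text file or any iterable of tab-delimited lines
    title_col: zero-based index of the title column (assumed, verify first)
    """
    counts = Counter()
    # QUOTE_NONE: treat quotation marks in titles as ordinary characters.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) <= title_col:
            continue  # skip short or malformed rows
        title = row[title_col].strip()
        if TITLE_RE.search(title):
            counts[title] += 1
    return counts
```

`count_volumes_by_title(f).most_common(10)` would then list the most heavily held titles. Because each row is a volume, multi-volume sets and duplicate scans from different libraries both raise a title's count.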
HathiTrust Digital Library
Takeaways
- Cited scholar-authors are less diverse than library holdings would allow.
- Who gets cited is dominated by a shorter list of male historians and literary critics
- ...but, the proportion of footnotes covered by each of these authors is still small (~1.25%)
- Generalist histories and key edited collections (e.g. Field Day Anthology) predominate
- Over-reliance on Dublin-based national newspapers
- Predominance of author-subjects of usual suspects (Yeats, Swift, Joyce) but also some that overlap with historians' interests (Mitchel)
Tracing Irish Studies through Citations and Library Holdings, ACIS 2019
By Nicholas Wolf