Understanding Trove

https://slides.com/wragge/aha-2024

please steal these slides!

https://headlineroulette.net/?id=213102227

2009–2012

2013–2015

2016–

Trove has a history

Single Business Discovery Project

Trove

structures change

https://tdg.glam-workbench.net/what-is-trove/categories-and-zones.html

2008

2024

content changes

2022

2011

Trove is constructed

using critically

context?

content?

https://tdg.glam-workbench.net/

Trove Data Guide content

Explanation – why is Trove like this?
Documentation – what you need to know
How to – complete a specific task
Tutorials – learn methods, develop skills
inspired by Diátaxis

https://ardc.edu.au/services/ardc-community-data-lab/

https://glam-workbench.net/

Community Data Lab

Trove Data Guide

GLAM Workbench

architectures

standards

technologies

principles

context & content

https://tdg.glam-workbench.net/what-is-trove/links-and-identifiers.html

how many digitised newspaper articles are currently in Trove?

https://tdg.glam-workbench.net/understanding-search/search-hacks.html

try it!

go to Trove's newspapers category
enter any keyword (it doesn't matter what it is)
look at the URL in your browser's location bar and find the part of the URL that looks like:
?keyword=[your keyword]
delete the part after the = sign and hit enter

bonus points!

add &pageSize=100 to the URL in your browser's location bar and hit enter
what happens?
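
The URL edits in the steps above can also be scripted. This is a minimal sketch using only Python's standard library. The example search URL is an assumption about how Trove's web interface formats a newspaper search, and `hack_search_url` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def hack_search_url(url, page_size=None):
    """Blank the keyword in a search URL and optionally set pageSize."""
    parts = urlsplit(url)
    # keep_blank_values so an already-empty keyword survives parsing
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params["keyword"] = ""  # delete the part after the = sign
    if page_size is not None:
        params["pageSize"] = str(page_size)
    return urlunsplit(parts._replace(query=urlencode(params)))


# Assumed URL pattern, as copied from a browser location bar after a search
example = "https://trove.nla.gov.au/search/category/newspapers?keyword=wragge"
print(hack_search_url(example))
print(hack_search_url(example, page_size=100))
```

Paste either printed URL back into the browser to see what the web interface does with it.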

official
Trove
hacker

handy with lists

an aggregation of collection metadata
a repository of digitised content
an archive of Australian web content from 1996 onwards
aggregated identity records for people and organisations
born-digital publications submitted via eLegal Deposit
a platform for user engagement
a series of APIs for delivering machine-actionable data

Trove is not one thing...

Trove's categories

starting at the top!

Books & Libraries
Diaries, Letters & Archives
Images, Maps & Artefacts
Lists
Magazines & Newsletters
Music, Audio & Video
Newspapers & Gazettes
People & Organisations
Research & Reports
Websites

separate systems /

specific types of things

Books & Libraries
Diaries, Letters & Archives
Images, Maps & Artefacts
Lists
Magazines & Newsletters
Music, Audio & Video
Newspapers & Gazettes
People & Organisations
Research & Reports
Websites

aggregated metadata
&
digitised resources

Books & Libraries
Diaries, Letters & Archives
Images, Maps & Artefacts
Lists
Magazines & Newsletters
Music, Audio & Video
Newspapers & Gazettes
People & Organisations
Research & Reports
Websites

like newspapers, but not...

formats by category

https://tdg.glam-workbench.net/what-is-trove/categories-and-zones.html

categories are containers

categories are contexts for discovery

Trove is designed for discovery not analysis

works and versions

https://tdg.glam-workbench.net/what-is-trove/works-and-versions.html

https://trove.nla.gov.au/work/158465667

the wrong Wiggles

https://trove.nla.gov.au/work/195172587

one work, 106 different press conferences

https://trove.nla.gov.au/work/10431978

the same, but different...

collection items as 'versions'

https://trove.nla.gov.au/work/163048354

collections within collections

https://tdg.glam-workbench.net/what-is-trove/collections.html

https://nla.gov.au/nla.obj-147116770

https://nla.gov.au/nla.obj-147116890

https://nla.gov.au/nla.obj-140670968

does this matter?

using critically

context?

content?

understanding search

https://tdg.glam-workbench.net/understanding-search/index.html

search is a research method

Understand the technical context — How does it work? Consult the documentation (and this Guide) to understand your options
Be creative and strategic — Solve your puzzle by experimenting and looking for clues in the results
Stay critical — Always assume that Trove isn’t telling you everything

https://tdg.glam-workbench.net/understanding-search/index.html

simple search isn't...

de-fuzzify searches

https://tdg.glam-workbench.net/understanding-search/simple-search-options.html

"isPartOf": [
  {
    "value": "Australian ephemera collection (Programs and invitations)",
    "type": "series"
  }
]

using indexes

https://tdg.glam-workbench.net/what-is-trove/collections.html

search the isPartOf values for "ephemera"
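
The same lookup can be run over records you have already downloaded. This is a minimal sketch, assuming each record carries an `isPartOf` list shaped like the fragment above. The sample records and the `in_series` helper are hypothetical, and the filter runs locally; it is not Trove's own index search.

```python
def in_series(record, term):
    """Return True if any isPartOf value contains term (case-insensitive)."""
    term = term.lower()
    return any(
        term in part.get("value", "").lower()
        for part in record.get("isPartOf", [])
    )


# Hypothetical records; only the isPartOf structure comes from the fragment above
records = [
    {
        "title": "Sample program",
        "isPartOf": [
            {
                "value": "Australian ephemera collection (Programs and invitations)",
                "type": "series",
            }
        ],
    },
    {"title": "Sample book"},  # no isPartOf field at all
]

matches = [r["title"] for r in records if in_series(r, "ephemera")]
print(matches)
```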

https://tdg.glam-workbench.net/understanding-search/simple-search-options.html

https://tdg.glam-workbench.net/understanding-search/date-searches.html

date searches

what are we searching?

https://tdg.glam-workbench.net/newspapers-and-gazettes/newspaper-corpus.html

change over time

https://wragge.github.io/trove-newspaper-totals/

7,518,764 articles added in 2023

https://updates.timsherratt.org/2024/01/02/trove-newspapers-in.html

https://wragge.github.io/trove-newspaper-totals/

https://troveplaces.herokuapp.com/map/

newspaper locations

what's missing?

https://tdg.glam-workbench.net/newspapers-and-gazettes/newspaper-corpus.html

https://glam-workbench.net/trove-newspapers/Analysing_OCR_corrections/

OCR corrections

https://gist.github.com/wragge/9aa385648cff5f0de0c7d4837896df97

non-English language newspapers

not just newspapers

20,000 books (and ephemera)
900 periodicals containing 37,000 issues
30,000 maps
24,000 Parliamentary Papers
6,000 oral histories
85,000 web page titles
7,000 born-digital periodicals containing 150,000 issues

more than...

where are they?

https://tdg.glam-workbench.net/other-digitised-resources/index.html

try it!

go to the Images, Maps & Artefacts category
search for "nla.obj" (with the quotes)
select 'Online' from the 'Access' facet
add additional keywords or facets!
for example, here are digitised posters

books

21,218 'books'
17,695 with OCR
1,473,339 pages

https://tdg.glam-workbench.net/other-digitised-resources/books/overview.html

🔭 explore

periodicals

https://tdg.glam-workbench.net/other-digitised-resources/periodicals/overview.html

908 titles
37,015 issues

🔭 explore

oral histories

6,202 online
1,781 transcripts
15,107 hours

https://tdg.glam-workbench.net/other-digitised-resources/oral-histories/overview.html

🔭 explore

Parliamentary Papers

24,990 publications
2,448,522 pages
4 GB of OCR'd text

https://tdg.glam-workbench.net/other-digitised-resources/parliamentary-papers/overview.html

🔭 explore

Finding Parliamentary Papers

https://tdg.glam-workbench.net/other-digitised-resources/parliamentary-papers/finding-pp.html

maps

35,042 'single' maps
30,344 high-res TIFFs
14.41 TB of images
28,205 with coordinates

https://glam-workbench.net/trove-maps/

🔭 explore

NED periodicals

7,973 periodicals
156,151 issues
154,976 PDFs
138,557 full access

🔭 explore

https://glam-workbench.net/trove-journals/harvest-ned-periodicals/

websites

> 8 billion pages
87,757 selected titles
149 subjects
1,920 collections

🔭 explore

https://glam-workbench.net/trove-web-archives/

BREAK

what data?

metadata
text
images
sound
born digital objects
user generated
system statistics

what data?

metadata

catalogue entries
authority records
library holdings
results of processing (eg OCR coordinates)

text

created by OCR / HTR
corrected by users
extracted from web pages
oral history transcripts
titles, abstracts

images

created by digitisation (photos, maps, book pages, manuscripts)
born digital (via Flickr)

sound

digitised and born digital oral history recordings

born digital

web pages (including images, PDFs, videos)
web harvest metadata
ePubs (via legal deposit)

user generated

tags
comments
lists
corrections

system stats

infer totals from search results
contributors
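
Inferring a total from search results means asking for a search and reading only the count it reports. This is a rough sketch, assuming a Trove API v3 `/result` endpoint whose JSON response holds a `category` list with a `records.total` value in each entry. The endpoint path, the parameter names, the `X-API-KEY` header, and the response shape are all assumptions to check against the API documentation, and the sample response is made up.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "https://api.trove.nla.gov.au/v3/result"  # assumed endpoint


def build_total_request(api_key, query, category="newspaper"):
    """Build (but do not send) a search request that asks for zero records."""
    # Parameter names are assumptions, not confirmed by the slides
    params = urlencode({"q": query, "category": category, "n": 0, "encoding": "json"})
    return Request(f"{API_URL}?{params}", headers={"X-API-KEY": api_key})


def extract_totals(response):
    """Map each category code to its reported total, given decoded JSON."""
    return {
        cat["code"]: cat["records"]["total"]
        for cat in response.get("category", [])
    }


# Made-up response in the assumed shape; the number is not a real total
sample = {"category": [{"code": "newspaper", "records": {"total": 1234}}]}

request = build_total_request("YOUR_API_KEY", "wragge")
print(request.full_url)
print(extract_totals(sample))
```

Nothing is sent here; passing `request` to `urllib.request.urlopen` would make the actual call.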

exploring scale
analysing content
annotation and enrichment
creating collections

beyond Trove's web interface 🚀

why data?

https://glam-workbench.net/trove-newspapers/querypic/

Querypic

19 million articles

https://updates.timsherratt.org/2023/08/08/exploring-the-front.html

https://tdg.glam-workbench.net/pathways/text/newspapers-keywords.html

https://tdg.glam-workbench.net/pathways/images/examples.html

image workspaces

https://wragge.github.io/federation-papers/

try it!

https://tdg.glam-workbench.net/pathways/geospatial/maps-to-ghap.html

https://tdg.glam-workbench.net/pathways/collections/collectionbuilder.html

accessing data

https://tdg.glam-workbench.net/accessing-data/using-web-interface.html

https://www.zotero.org/

data from the web interface

downloading as 'image' delivers an HTML page

limit of 20
backs missing
(no sub-collections)

low resolution (1000px x 1588px)

missing metadata

limited metadata
no full text
< 1 million results

Scaling up?

text from all articles in a newspaper search
all covers from a journal
all images from a finding aid
text from all issues in a journal
all digitised maps of Australia
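
Each task in the list above comes down to the same loop: request a page of results, keep the records, and repeat until nothing is left. This is a minimal sketch of that loop, assuming a cursor-style API in which each page names the start value for the next one. `fetch_page`, the page shape, and the fake data are hypothetical stand-ins, not the Trove API itself.

```python
def harvest(fetch_page, start="*"):
    """Yield every record, following nextStart cursors until none is returned."""
    cursor = start
    while cursor is not None:
        page = fetch_page(cursor)
        yield from page.get("records", [])
        cursor = page.get("nextStart")  # a missing cursor ends the loop


# Fake three-page result set standing in for real API responses
FAKE_PAGES = {
    "*": {"records": ["a", "b"], "nextStart": "page2"},
    "page2": {"records": ["c", "d"], "nextStart": "page3"},
    "page3": {"records": ["e"]},  # last page: no nextStart
}


def fake_fetch(cursor):
    return FAKE_PAGES[cursor]


harvested = list(harvest(fake_fetch))
print(harvested)
```

Keeping the fetch step as a separate function means the loop can be tested offline and then pointed at a real request function later.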

creating datasets

{
  "id": "61389505",
  "url": "https://api.trove.nla.gov.au/v3/newspaper/61389505",
  "heading": "MR. WRAGGE'S \"WRAGGE.\"",
  "category": "Article",
  "title": {
    "id": "64",
    "title": "Clarence and Richmond Examiner (Grafton, NSW : 1889 - 1915)"
  },
  "date": "1902-07-15",
  "page": "4",
  "pageSequence": "4",
  "troveUrl": "https://nla.gov.au/nla.news-article61389505"
}
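
A record like the one above becomes a dataset row once its nested fields are flattened. This is a minimal sketch using Python's standard library. The column names and the `flatten` helper are illustrative choices, not part of Trove's output.

```python
import csv
import io

# The article record shown above, as a Python dictionary
article = {
    "id": "61389505",
    "url": "https://api.trove.nla.gov.au/v3/newspaper/61389505",
    "heading": 'MR. WRAGGE\'S "WRAGGE."',
    "category": "Article",
    "title": {
        "id": "64",
        "title": "Clarence and Richmond Examiner (Grafton, NSW : 1889 - 1915)",
    },
    "date": "1902-07-15",
    "page": "4",
    "pageSequence": "4",
    "troveUrl": "https://nla.gov.au/nla.news-article61389505",
}


def flatten(record):
    """Pull the nested newspaper title up into flat columns."""
    return {
        "id": record["id"],
        "heading": record["heading"],
        "newspaper_id": record["title"]["id"],
        "newspaper": record["title"]["title"],
        "date": record["date"],
        "page": record["page"],
        "url": record["troveUrl"],
    }


rows = [flatten(article)]
buffer = io.StringIO()  # swap for open("articles.csv", "w", newline="") to save a file
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```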