please steal these slides!

2009–2012

2013–2015

2016–

Trove has a history

Single Business Discovery Project

Trove

structures change

2008

2024

content changes

2022

2011

Trove is constructed

using critically

context?

content?

Trove Data Guide content

  • Explanation – why is Trove like this?
  • Documentation – what you need to know
  • How to – complete a specific task
  • Tutorials learn methods, develop skills
  • inspired by Diátaxis

Community Data Lab

Trove Data Guide

GLAM Workbench

architectures

standards

technologies

principles

context & content

how many digitised newspaper articles are currently in Trove?

try it!

  1. go to Trove's newspapers category
  2. enter any keyword (it doesn't matter what it is)
  3. look at the url in your browser's location bar and find the part of the url that looks like:
    ?keyword=[your keyword]
  4. delete the part after the = sign and hit enter

bonus points!

  1. add &pageSize=100 to the url in your browser's location bar and hit enter
  2. what happens?

official
Trove
hacker

handy with lists

  • an aggregation of collection metadata

  • a repository of digitised content

  • an archive of Australian web content from 1996 onwards

  • aggregated identity records for people and organisations

  • born-digital publications submitted via eLegal Deposit

  • a platform for user engagement

  • a series of APIs for delivering machine-actionable data

Trove is not one thing...

Trove's categories

starting at the top!

  • Books & Libraries

  • Diaries, Letters & Archives

  • Images, Maps & Artefacts

  • Lists

  • Magazines & Newsletters

  • Music, Audio & Video

  • Newspapers & Gazettes

  • People & Organisations

  • Research & Reports

  • Websites

separate systems /

specific types of things

  • Books & Libraries

  • Diaries, Letters & Archives

  • Images, Maps & Artefacts

  • Lists

  • Magazines & Newsletters

  • Music, Audio & Video

  • Newspapers & Gazettes

  • People & Organisations

  • Research & Reports

  • Websites

aggregated metadata
&
digitised resources

  • Books & Libraries

  • Diaries, Letters & Archives

  • Images, Maps & Artefacts

  • Lists

  • Magazines & Newsletters

  • Music, Audio & Video

  • Newspapers & Gazettes

  • People & Organisations

  • Research & Reports

  • Websites

like newspapers, but not...

formats by category

categories are containers

 

categories are contexts for discovery

Trove is designed for discovery not analysis

works and versions

the wrong Wiggles

one work, 106 different press conferences

the same, but different....

collection items as 'versions'

collections within collections

does this matter?

using critically

context?

content?

understanding search

search is a research method

  • Understand the technical context — How does it work? Consult the documentation (and this Guide) to understand your options

  • Be creative and strategic — Solve your puzzle by experimenting and looking for clues in the results

  • Stay critical — Always assume that Trove isn’t telling you everything

simple search isn't...

de-fuzzify searches

"isPartOf": [
  {
    "value": "Australian ephemera collection (Programs and invitations)",
    "type": "series"
  }
]

using indexes

search the isPartOf values for "ephemera"

date searches

what are we searching?

change over time

7,518,764 articles added in 2023

newspaper locations

what's missing?

OCR corrections

non-English language newspapers

not just newspapers

  • 20,000 books (and ephemera)
  • 900 periodicals containing 37,000 issues
  • 30,000 maps
  • 24,000 Parliamentary Papers
  • 6,000 oral histories
  • 85,000 web page titles
  • 7,000 born-digital periodicals containing 150,000 issues

more than...

where are they?

try it!

books

  • 21,218 'books'
  • 17,695 with OCR
  • 1,473,339 pages

periodicals

  • 908 titles
  • 37,015 issues
  • 6,202 online
  • 1,781 transcripts
  • 15,107 hours

oral histories

Parliamentary Papers

  • 24,990 publications
  • 2,448,522 pages
  • 4 gb of OCRd text

Finding Parliamentary Papers

maps

  • 35,042 'single' maps
  • 30,344 high-res TIFFs
  • 14.41 TB of images
  • 28,205 with coordinates

NED periodicals

  • 7,973 periodicals
  • 156,151 issues
  • 154,976 PDFs
  • 138,557 full access

🔭 explore

websites

  • > 8 billion pages
  • 87,757 selected titles
  • 149 subjects
  • 1,920 collections

🔭 explore

BREAK

what data?

  • metadata
  • text
  • images
  • sound
  • born digital objects
  • user generated
  • system statistics

{

what data?

metadata

{

  • catalogue entries
  • authority records
  • library holdings
  • results of processing (eg OCR coordinates)

text

{

  • created by OCR / HTR
  • corrected by users
  • extracted from web pages
  • oral history transcripts
  • titles, abstracts

images

{

  • created by digitisation  (photos, maps, book pages, manuscripts)
  • born digital (via Flickr)

sound

{

  • digitised and born digital oral history recordings

born digital

{

  • web pages  (including images, PDFs, videos)
  • web harvest metadata
  • ePubs (via legal deposit)

user generated

{

  • tags
  • comments
  • lists
  • corrections

system stats

{

  • infer totals from search results
  • contributors
  • exploring scale
  • analysing content
  • annotation and enrichment
  • creating collections

beyond Trove's web interface 🚀

why data?

Querypic

19 million articles

image workspaces

try it!

accessing data

data from the web interface

downloading as 'image' delivers an HTML page

  • limit of 20

  • backs missing
    (no sub-collections)

low resolution (1000px x 1588px)

missing metadata

  • limited metadata
  • no full text
  • < 1 million results

Scaling up?

  • text from all articles in a newspaper search
  • all covers from a journal
  • all images from a finding aid
  • text from all issues in a journal
  • all digitised maps of Australia

 

creating datasets

{

      "id": "61389505",
      "url": "https://api.trove.nla.gov.au/v3/newspaper/61389505",
      "heading": "MR. WRAGGE'S \"WRAGGE.\"",
      "category": "Article",
      "title": {
            "id": "64",
            "title": "Clarence and Richmond Examiner (Grafton, NSW : 1889 - 1915)"
      },
      "date": "1902-07-15",
      "page": "4",
      "pageSequence": "4",
      "troveUrl": "https://nla.gov.au/nla.news-article61389505"

}

Trove API

use the API, but...

  • skills?
  • documentation?
  • examples?
  • tools?

🤯

API limits

  • items in a digitised collection
  • links to download images
  • text from books or periodical issues
  • text from Australian Women's Weekly

🐞 and bugs!

🪠 data plumbing

gaps & blockages

researchers 👩‍🔧

researchers 👩‍🔧

collaboration

articles as images

try it!

  1. find a newspaper article (here's one)
  2. copy the url of the newspaper article from your browser's location bar
  3. go to the newspaper image app in the GLAM Workbench
  4. paste the url into the box
  5. click on Get images
  6. select 'Save image as...'
    from right click menu

check 'mask image' and try again – what changes?

save
high-res images

try it!

  1. search for a digitised photo (remember the "nla.obj" trick)
  2. here's an example if you can't think of anything to search for
  3. click on the View link to load the image in the digitised item viewer
  4. change the url suffix from /view to /image in your browser's location bar

  5. click enter to view the image

  6. choose 'Save image as...' from the right click menu to download the image

harvest newspaper articles

text from books & periodicals

images in a collection

illustrations from books & periodicals

maps by location

archived websites by topic

harvesting digitised resources

more data sources

🔭 explore

  • 3,471 cartoons
  • 1886 to 1962

you can help!

add notes, tags, links....

add ideas!

report problems!

share everything!

  • openly licensed to encourage reuse
  • use for your own research
  • use in teaching & research training
  • no need to ask for permission

steal everything!

pay it forward!

stay connected!

stay connected!

Tim Sherratt

 

email: tim@timsherratt.au

web: timsherratt.au

mastodon: @wragge@hcommons.social

updates: https://updates.timsherratt.org/

Understanding Trove workshop

By Tim Sherratt

Understanding Trove workshop

Presented at the Australian Historical Association Annual Conference, 1 July 2024

  • 853