Semi-Automated FAST Reconciliation for Hyrax Repositories

 

Tiffany Chan

Senior Developer/Analyst | University of Victoria Libraries

14 June 2023 | Open Repositories Conference

Link to Slides: uviclibrary.github.io/OR2023Dev

Some Background

Migrated from CONTENTdm to Hyku (Hyrax-based)

Decided to use OCLC FAST (Faceted Application of Subject Terminology), a controlled vocabulary, for several metadata fields

Converted textual values to URIs whenever possible

More details on our metadata migration

Textual Values

Human-readable

Can be ambiguous, inconsistent: the same person might be recorded as "Zenzile Miriam Makeba", "Makeba, M. (Miriam)", or "Makeba, Miriam"

URI

Uniquely identifiable: http://id.worldcat.org/fast/201144

URIs with Labels

Machine-readable URIs and human-readable labels

URI: http://id.worldcat.org/fast/201144

Label: Makeba, Miriam

How Hyrax handles URIs/Labels in Metadata

"subject_tesim":["http://id.worldcat.org/fast/1050538",
  "http://id.worldcat.org/fast/78887",
  "http://id.worldcat.org/fast/2012667",
  "http://id.worldcat.org/fast/1050534"],

"subject_label_tesim":["Painters--Correspondence",
  "Orpen, William, Sir, 1878-1931",
  "Glenavy, Beatrice Moss Campbell, Baroness, 1883-1970",
  "Painters--Biography"]

URIs and labels are both indexed on the backend

Labels are displayed in the interface
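The relationship between the two parallel fields can be modelled in a few lines. This is a simplified sketch, not Hyrax's actual indexer: it assumes each stored value is either a FAST URI or a plain textual value, that `lookup_label` (a hypothetical stand-in for the linked-data lookup Hyrax and its associated gems perform) returns the current label for a URI, and that textual values are copied to the label field unchanged, as in the Spreitz example.

```python
from typing import Callable, Dict, List

FAST_PREFIX = "http://id.worldcat.org/fast/"


def build_subject_fields(
    values: List[str], lookup_label: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Build parallel URI and label fields for one item's subject values.

    FAST URIs are resolved to labels via the (hypothetical) lookup_label
    callable; textual values not yet reconciled to a URI are copied as-is.
    """
    labels = [
        lookup_label(value) if value.startswith(FAST_PREFIX) else value
        for value in values
    ]
    return {"subject_tesim": list(values), "subject_label_tesim": labels}


# Usage: a dict stands in for the remote label lookup.
known_labels = {"http://id.worldcat.org/fast/837358": "Boys"}
fields = build_subject_fields(
    ["Spreitz, Karl, 1927-", "http://id.worldcat.org/fast/837358"],
    known_labels.__getitem__,
)
print(fields["subject_label_tesim"])  # ['Spreitz, Karl, 1927-', 'Boys']
```

Because labels are derived from URIs at index time, a changed FAST label only shows up in the interface after the item is reindexed.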

FAST is constantly evolving...

New URIs are being created, and some that we need don't exist yet

Many of our subjects are of particular local interest, so they don't already exist in FAST

We enter data as textual values first, then submit a request to OCLC to create/assign URIs

When FAST approves our request, we need to replace the textual values with the new URI(s)

FAST is constantly evolving...

Existing URIs can still change

Examples of Changes

Dates of birth or death are added to the label: "Pelé, 1940-" becomes "Pelé, 1940-2022"

Headings can be deprecated or split, e.g., "Fatigue Testing" split into "Fatigue Testing Machines" and "Materials Fatigue"

URI Maintenance in Hyrax

No easy way to update URIs automatically

Hyrax cannot use FAST's MARC files to apply updates

OCLC doesn't notify you when it adds a URI that you requested

It's hard to search for the same value across multiple fields using the Hyrax interface

There is no FAST changes API (at least not yet)

FAST Updater

Periodic automatic updates via Sidekiq

Download the most recent FAST change spreadsheet(s) (Microsoft Excel)

(You can specify a date range)

Parse the spreadsheets and categorize the changes by type: new heading, modified heading, obsolete/deprecated heading, or split heading
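The categorization step could look roughly like this. The slides don't describe the spreadsheet columns, so this sketch assumes each row has already been read (for example with a library such as openpyxl) into a dict with a hypothetical "type" string and a "date" value; the wordings in `CHANGE_TYPES` are placeholders as well. It is not the fast_update code.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List

# Hypothetical mapping from the change-type wording in a spreadsheet row
# to the category the updater acts on.
CHANGE_TYPES = {
    "new": "new",
    "modified": "modified",
    "obsolete": "deprecated",
    "deprecated": "deprecated",
    "split": "split",
}

# Categories the updater can apply without staff review.
AUTOMATIC = {"new", "modified"}


def categorize_changes(
    rows: Iterable[dict], start: date, end: date
) -> Dict[str, List[dict]]:
    """Group change rows dated within [start, end] (inclusive) by category.

    Unrecognized change types are grouped under "unknown".
    """
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        if not start <= row["date"] <= end:
            continue  # outside the requested date range
        category = CHANGE_TYPES.get(row["type"].strip().lower(), "unknown")
        grouped[category].append(row)
    return dict(grouped)


def needs_review(category: str) -> bool:
    """Deprecated, split, and unrecognized changes go to staff instead."""
    return category not in AUTOMATIC
```

Keeping the type wording in a lookup table means a change in the spreadsheet vocabulary only touches one dict, and anything unexpected falls through to manual review rather than being applied automatically.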

Automatic Changes

New Headings

Search the repository for items with fields containing the textual value, then replace the value with the URI

Modified Headings

Search the repository for the URI and reindex affected items, using Hyrax (and associated gems) to fetch and index the updated label

New Headings

Before

"subject_tesim":[
  "Spreitz, Karl, 1927-",
  "http://id.worldcat.org/fast/837358"
],

"subject_label_tesim":[
  "Spreitz, Karl, 1927-",
  "Boys"
],

After

"subject_tesim":[
  "http://id.worldcat.org/fast/2013151",
  "http://id.worldcat.org/fast/837358"
],

"subject_label_tesim":[
  "Spreitz, Karl, 1927-",
  "Boys"
],
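The new-heading replacement in the Spreitz example can be sketched as follows. Plain dicts stand in for repository records here (in Hyrax the edit would go through the work model and then be reindexed), the record id is hypothetical, and only whole values equal to the textual heading are replaced, which is one reading of "fields containing the textual value".

```python
from typing import Iterable, List, Sequence


def apply_new_heading(
    docs: Iterable[dict],
    text_value: str,
    uri: str,
    fields: Sequence[str] = ("subject_tesim",),
) -> List[str]:
    """Replace an exact textual value with its newly minted FAST URI.

    A longer heading that merely contains the text is left alone. Returns
    the ids of changed records, which would then be saved and reindexed.
    """
    changed = []
    for doc in docs:
        touched = False
        for field in fields:
            values = doc.get(field, [])
            if text_value in values:
                doc[field] = [uri if v == text_value else v for v in values]
                touched = True
        if touched:
            changed.append(doc["id"])
    return changed


item = {
    "id": "item-1",  # hypothetical record id
    "subject_tesim": ["Spreitz, Karl, 1927-", "http://id.worldcat.org/fast/837358"],
}
apply_new_heading([item], "Spreitz, Karl, 1927-", "http://id.worldcat.org/fast/2013151")
print(item["subject_tesim"])
# ['http://id.worldcat.org/fast/2013151', 'http://id.worldcat.org/fast/837358']
```

Passing several field names in `fields` covers the case that is hard to handle through the interface: the same value appearing in more than one metadata field.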

Modified Headings

Before

"subject_tesim":[
  "http://id.worldcat.org/fast/2013151",
  "http://id.worldcat.org/fast/837358"
],

"subject_label_tesim":[
  "Spreitz, Karl, 1927-",
  "Boys"
],

After

"subject_tesim":[
  "http://id.worldcat.org/fast/2013151",
  "http://id.worldcat.org/fast/837358"
],

"subject_label_tesim":[
  "Spreitz, Karl, 1927-2016",
  "Boys"
],
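A modified heading leaves the stored URI untouched; only the indexed label changes. A minimal sketch of that reindex step, assuming a hypothetical `fetch_label` callable in place of the remote lookup that Hyrax and its associated gems perform, and plain dicts in place of Solr documents:

```python
from typing import Callable, Iterable, List

FAST_PREFIX = "http://id.worldcat.org/fast/"


def reindex_items_with_uri(
    docs: Iterable[dict],
    uri: str,
    fetch_label: Callable[[str], str],
    uri_field: str = "subject_tesim",
    label_field: str = "subject_label_tesim",
) -> List[str]:
    """Rebuild the label field of every document that references `uri`.

    fetch_label is a hypothetical stand-in for the remote lookup. Every URI
    on an affected document is re-resolved; textual values are copied as-is.
    """
    reindexed = []
    for doc in docs:
        values = doc.get(uri_field, [])
        if uri not in values:
            continue  # unaffected by this change, so skip the reindex
        doc[label_field] = [
            fetch_label(v) if v.startswith(FAST_PREFIX) else v for v in values
        ]
        reindexed.append(doc["id"])
    return reindexed


current_labels = {
    "http://id.worldcat.org/fast/2013151": "Spreitz, Karl, 1927-2016",
    "http://id.worldcat.org/fast/837358": "Boys",
}
item = {
    "id": "item-1",  # hypothetical record id
    "subject_tesim": ["http://id.worldcat.org/fast/2013151", "http://id.worldcat.org/fast/837358"],
    "subject_label_tesim": ["Spreitz, Karl, 1927-", "Boys"],
}
reindex_items_with_uri([item], "http://id.worldcat.org/fast/2013151", current_labels.__getitem__)
print(item["subject_label_tesim"])  # ['Spreitz, Karl, 1927-2016', 'Boys']
```

Rebuilding the whole label field, rather than patching one position, avoids assuming that the URI and label lists stay aligned index by index.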

Deprecated or Split Headings

Require manual intervention

Send an email to a configured address describing the changes and any suggestions

Staff can use a custom interface to search for the old URI and replace it with the new one(s)
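The manual path has two parts: a notification and a replacement. This sketch is illustrative only; it assumes each change is a dict with hypothetical "type", "old_uri", and "suggested" keys, builds the message with the standard library's `EmailMessage` (actually sending it, e.g. with smtplib, is omitted), and shows the list operation a replace step would perform once staff pick the new URI(s).

```python
from email.message import EmailMessage
from typing import List


def build_review_email(changes: List[dict], to_addr: str) -> EmailMessage:
    """Summarize deprecated/split headings for staff review.

    Each change uses hypothetical keys: "type", "old_uri", and a possibly
    empty list of "suggested" replacement URIs.
    """
    lines = []
    for change in changes:
        suggestion = ", ".join(change["suggested"]) or "no suggestion"
        lines.append(f"{change['type']}: {change['old_uri']} -> {suggestion}")
    msg = EmailMessage()
    msg["To"] = to_addr  # the configured address (hypothetical here)
    msg["Subject"] = f"FAST changes needing review ({len(changes)})"
    msg.set_content("\n".join(lines))
    return msg


def replace_uri(values: List[str], old_uri: str, new_uris: List[str]) -> List[str]:
    """Swap one URI for its replacement(s), keeping order and dropping duplicates."""
    result: List[str] = []
    for value in values:
        for candidate in (new_uris if value == old_uri else [value]):
            if candidate not in result:
                result.append(candidate)
    return result
```

For a split heading, `new_uris` holds several URIs; for a deprecated heading with a single successor it holds one, and an empty list simply drops the old value.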

What's Next?

Package it as a gem and Valkyrize it (support Hyrax's Valkyrie persistence layer)

FAST Changes API Survey

A convenient way to add or remove URIs/textual values across the repository

Thanks!

Tiffany Chan (tjychan@uvic.ca)

Code: github.com/UVicLibrary/fast_update

Link to slides: uviclibrary.github.io/OR2023Dev

 

Slides made with slides.com / reveal.js
