Semi-Automated FAST Reconciliation for Hyrax Repositories
Tiffany Chan
Senior Developer/Analyst | University of Victoria Libraries
14 June 2023 | Open Repositories Conference
Link to Slides: uviclibrary.github.io/OR2023Dev
Some Background
Migrated from CONTENTdm to Hyku (Hyrax-based)
Decided to use OCLC FAST (controlled vocabulary) for several metadata fields
Textual values converted to URIs whenever possible
Textual Values
Zenzile Miriam Makeba
Makeba, M. (Miriam)
Makeba, Miriam
Human-readable
Can be ambiguous, inconsistent
URI
http://id.worldcat.org/fast/201144
Machine-readable URIs and human-readable labels
Uniquely identifiable
Label
Makeba, Miriam
How Hyrax handles URIs/Labels in Metadata
URIs and labels are indexed on the backend
Labels are displayed in the interface
"subject_tesim":["http://id.worldcat.org/fast/1050538",
"http://id.worldcat.org/fast/78887",
"http://id.worldcat.org/fast/2012667",
"http://id.worldcat.org/fast/1050534"],
"subject_label_tesim":["Painters--Correspondence",
"Orpen, William, Sir, 1878-1931",
"Glenavy, Beatrice Moss Campbell, Baroness, 1883-1970",
"Painters--Biography"]
FAST is constantly evolving...
New URIs are being created
New URIs need to be created
We enter data as textual values first, then submit a request to OCLC to create/assign URIs
Many of our subjects/metadata are of especially local interest, so they don't already exist in FAST
When FAST approves our request, we need to replace the textual values with the new URI(s)
FAST is constantly evolving...
Existing URIs can still change
Examples of Changes
Dates of birth or death are added to the label
Pelé, 1940- becomes Pelé, 1940-2022
Headings can be deprecated or split
Fatigue Testing
Fatigue Testing Machines
Materials Fatigue
URI Maintenance in Hyrax
No easy way to update automatically
Hyrax cannot use the MARC file to update
OCLC doesn't contact you when they add a URI that you requested
Hard to search for the same value in multiple fields using the interface
There is no API (at least not yet)
FAST Updater
Periodic automatic updates via Sidekiq
Download recent Microsoft Excel spreadsheet(s)
(You can specify a date range)
Parse the spreadsheets and categorize changes according to type (New heading, modified heading, obsolete/deprecated, split)
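The categorization step can be sketched roughly as follows. This is a minimal illustration, not the FAST Updater's actual code: each parsed spreadsheet row is assumed to be a plain Hash, and the `:change_type` key and its values ("new", "modified", "deprecated", "split") are hypothetical names chosen to mirror the four categories listed above.

```ruby
# Hypothetical sketch: group parsed spreadsheet rows by change type.
# The :change_type key and its values are assumptions, not the real
# column names in OCLC's spreadsheets.
MANUAL_TYPES = %w[deprecated split].freeze

# Normalize the change type (case, surrounding whitespace) and group rows by it.
def categorize_changes(rows)
  rows.group_by { |row| row.fetch(:change_type).to_s.strip.downcase }
end

# Deprecated and split headings are the ones that need a person to decide.
def manual_review?(change_type)
  MANUAL_TYPES.include?(change_type)
end

rows = [
  { change_type: "New",        label: "Spreitz, Karl, 1927-" },
  { change_type: "modified",   label: "Spreitz, Karl, 1927-2016" },
  { change_type: "Deprecated", label: "Example heading" }
]

categorize_changes(rows).each do |type, group|
  handling = manual_review?(type) ? "manual" : "automatic"
  puts "#{type}: #{group.size} row(s), #{handling}"
end
```

Grouping first keeps the later steps simple: the "new" and "modified" groups can be processed automatically, while the rest are set aside for staff.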
Automatic Changes
New Headings
Search the repository for items with fields containing the textual value, then replace them with the URI
Modified Headings
Search the repository for the URI and reindex affected items, using Hyrax (and associated gems) to fetch and index the updated label
New Headings
Before
"subject_tesim":[
"Spreitz, Karl, 1927-",
"http://id.worldcat.org/fast/837358"
],
"subject_label_tesim":[
"Spreitz, Karl, 1927-",
"Boys"
],
After
"subject_tesim":[
"http://id.worldcat.org/fast/2013151",
"http://id.worldcat.org/fast/837358"
],
"subject_label_tesim":[
"Spreitz, Karl, 1927-",
"Boys"
],
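The New Headings replacement can be illustrated with a small sketch. It assumes an item's metadata is available as a Hash of field names to arrays of values; the method name `replace_label_with_uri` is hypothetical, and the real updater works through Hyrax and the repository rather than on bare hashes.

```ruby
# Hypothetical sketch: swap an exact textual value for its newly assigned URI
# in every field of an item. Only exact matches are replaced, and duplicates
# are dropped in case the URI was already present in the field.
def replace_label_with_uri(fields, label, uri)
  fields.transform_values do |values|
    values.map { |value| value == label ? uri : value }.uniq
  end
end

before = {
  "subject_tesim" => [
    "Spreitz, Karl, 1927-",
    "http://id.worldcat.org/fast/837358"
  ]
}

after = replace_label_with_uri(
  before,
  "Spreitz, Karl, 1927-",
  "http://id.worldcat.org/fast/2013151"
)

puts after["subject_tesim"]
```

Applying the replacement to every field, not just the subject field, addresses the case where the same value appears in several fields of one item.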
Modified Headings
Before
"subject_tesim":[
"http://id.worldcat.org/fast/2013151",
"http://id.worldcat.org/fast/837358"
],
"subject_label_tesim":[
"Spreitz, Karl, 1927-",
"Boys"
],
After
"subject_tesim":[
"http://id.worldcat.org/fast/2013151",
"http://id.worldcat.org/fast/837358"
],
"subject_label_tesim":[
"Spreitz, Karl, 1927-2016",
"Boys"
],
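For Modified Headings the URI stays the same and only the label is refreshed. The sketch below imitates that reindexing on plain hashes. The `CURRENT_LABELS` table is a stand-in for the label lookup that Hyrax and its associated gems perform, and the method names are hypothetical.

```ruby
# Stand-in for fetching the current label of a FAST URI. In the real workflow
# Hyrax (and associated gems) fetch the label during reindexing; a Hash is
# used here only so the sketch runs on its own.
CURRENT_LABELS = {
  "http://id.worldcat.org/fast/2013151" => "Spreitz, Karl, 1927-2016",
  "http://id.worldcat.org/fast/837358" => "Boys"
}.freeze

# True if any field of the document contains the given URI.
def affected?(doc, uri)
  doc.values.any? { |values| values.include?(uri) }
end

# Rebuild the label field from the URI field. Values with no known label
# (for example, plain textual values) are kept as they are.
def reindex_labels(doc, labels)
  fresh = doc.fetch("subject_tesim").map { |value| labels.fetch(value, value) }
  doc.merge("subject_label_tesim" => fresh)
end

doc = {
  "subject_tesim" => [
    "http://id.worldcat.org/fast/2013151",
    "http://id.worldcat.org/fast/837358"
  ],
  "subject_label_tesim" => ["Spreitz, Karl, 1927-", "Boys"]
}

modified_uri = "http://id.worldcat.org/fast/2013151"
doc = reindex_labels(doc, CURRENT_LABELS) if affected?(doc, modified_uri)
puts doc["subject_label_tesim"]
```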
Deprecated or Split Headings
Require manual intervention
Email a configured address describing changes and suggestions (if any)
Staff can use a custom interface to search for and replace the old URI with new one(s)
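A notification for deprecated or split headings might be assembled along these lines. This is only a sketch of building the message text: the Hash keys (`:type`, `:label`, `:uri`, `:suggested_uris`) and the example.org URIs are placeholders, and actual delivery (for example through the application's mailer) is left out.

```ruby
# Hypothetical sketch: turn one deprecated/split change into a readable entry.
def describe_change(change)
  header = "#{change.fetch(:type).to_s.capitalize} heading: " \
           "#{change.fetch(:label)} (#{change.fetch(:uri)})"
  suggestions = Array(change[:suggested_uris])
  body =
    if suggestions.empty?
      ["  No suggested replacement; please review manually."]
    else
      ["  Suggested replacement(s):"] + suggestions.map { |uri| "    #{uri}" }
    end
  ([header] + body).join("\n")
end

# Join all entries into one plain-text report for the configured address.
def build_report(changes)
  changes.map { |change| describe_change(change) }.join("\n\n")
end

# Placeholder data; the labels and URIs are not real FAST headings.
changes = [
  { type: :split, label: "Example heading", uri: "http://example.org/fast/old",
    suggested_uris: ["http://example.org/fast/a", "http://example.org/fast/b"] },
  { type: :deprecated, label: "Another heading", uri: "http://example.org/fast/gone" }
]

puts build_report(changes)
```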
What's Next?
Make it a gem, valkyrize it
FAST Changes API Survey
Convenient way to add or remove URIs/textual values from the repository
Thanks!
Tiffany Chan (tjychan@uvic.ca)
Code: github.com/UVicLibrary/fast_update
Link to slides: uviclibrary.github.io/OR2023Dev
Slides made with slides.com / reveal.js