Assessing OCLC Reclamation Results with pymarc
John Dingle, Brock University
Code4Lib North 2019
OCLC Reclamation
LOCAL MARC RECORD → OCLC MATCHING ALGORITHM → OCLC MASTER RECORD
~600,000 records
(Mis)Match Detection
MARC Records Quality Assessment
(Mis)Match Detection
Is the OCLC record a true match for our original record?
- Large percentage difference in title string
- Difference of > 1 year in 008 date value
- If present, no matching ISBNs or ISSNs
- Presence of electronic resource fields in a print record, and vice versa
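The heuristics above can be sketched with the standard library alone; in practice the title, 008, and ISBN values would be pulled from pymarc field objects (e.g. `record['245']`, `record['008']`, `record.get_fields('020')`). Function names and the similarity threshold here are illustrative assumptions, not the exact script used:

```python
import difflib

def title_mismatch(local_title, oclc_title, threshold=0.6):
    """Flag a large percentage difference between the two title strings."""
    ratio = difflib.SequenceMatcher(
        None, local_title.lower(), oclc_title.lower()).ratio()
    return ratio < threshold

def date_mismatch(local_008, oclc_008):
    """Flag a difference of more than 1 year in Date 1 (008 bytes 07-10)."""
    try:
        local_year = int(local_008[7:11])
        oclc_year = int(oclc_008[7:11])
    except ValueError:
        return False  # non-numeric dates: can't compare
    return abs(local_year - oclc_year) > 1

def isbn_mismatch(local_isbns, oclc_isbns):
    """Flag record pairs that both carry ISBNs but share none of them."""
    if not local_isbns or not oclc_isbns:
        return False  # "if present" -- skip when either side has no ISBNs
    return not set(local_isbns) & set(oclc_isbns)
```

For example, `title_mismatch("Hamlet", "King Lear")` flags the pair, while two 008 fields whose Date 1 bytes read 1999 and 2000 pass the date check.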
(Mis)Match Detection
Use pymarc and the OCLC Metadata API to compare record fields
Potential mismatches
Results
Sample of 976 random records: no issues found
~2% error rate for minimal-level records
Quality Assessment
How much "better" is the OCLC record than our original?
Quality Assessment
Use pymarc to calculate Thompson-Traill scores for each pair of records
See also: https://github.com/pkiraly/metadata-qa-marc
Thompson, Kelly, and Stacie Traill. "Leveraging Python to Improve Ebook Metadata Selection, Ingest, and Management." The Code4Lib Journal, no. 38 (2017). http://journal.code4lib.org/articles/12828.
Thompson-Traill Scores
ISBNs
Subject Headings
Additional Authors/Contributors
Description and TOC
RDA fields
Dates
Language and country codes
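Thompson and Traill's method awards points per record across the categories above; a toy, stdlib-only sketch of the scoring idea (the per-category caps are illustrative assumptions, not the published rubric):

```python
def tt_score(record_counts):
    """Toy Thompson-Traill-style score: one point per occurrence in each
    quality category, capped so no single category dominates the total.
    Caps are illustrative, not the published rubric."""
    caps = {
        "isbn": 10,
        "subject_headings": 10,
        "added_entries": 10,
        "description_toc": 5,
        "rda_fields": 5,
        "dates": 2,
        "lang_country_codes": 2,
    }
    return sum(min(record_counts.get(cat, 0), cap)
               for cat, cap in caps.items())

# Hypothetical field counts for one local/OCLC record pair:
local = {"isbn": 1, "subject_headings": 2, "dates": 1}
oclc = {"isbn": 2, "subject_headings": 5, "added_entries": 3,
        "description_toc": 1, "rda_fields": 2, "dates": 2,
        "lang_country_codes": 2}
```

Summing capped category counts like this for every local/OCLC pair is what produces the averages on the next slide: the richer OCLC record scores higher.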
Results
Local Records = 11.15
OCLC Records = 15.62
Potential Applications
Find all 1XX/6XX/7XX fields in a local record that are NOT in the OCLC record
Identify OCLC numbers matched to incorrect format
Find BRX-catalogued records that shouldn't be
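The first application above can be sketched as a set difference, assuming the heading fields have already been extracted as (tag, text) pairs (e.g. via pymarc's `record.get_fields('100', '600', '700', ...)`); the tag list and normalization here are illustrative:

```python
# 1XX main entries, 6XX subjects, 7XX added entries (partial, illustrative list)
HEADING_TAGS = {"100", "110", "111", "130",
                "600", "610", "611", "650", "651",
                "700", "710", "711", "730"}

def missing_headings(local_fields, oclc_fields):
    """Return (tag, value) heading pairs present in the local record but
    absent from the OCLC record. Fields are (tag, text) tuples; values
    are compared case-insensitively with trailing punctuation stripped."""
    def norm(value):
        return value.lower().rstrip(" .")
    oclc_set = {(tag, norm(value)) for tag, value in oclc_fields
                if tag in HEADING_TAGS}
    return [(tag, value) for tag, value in local_fields
            if tag in HEADING_TAGS and (tag, norm(value)) not in oclc_set]
```

Normalizing before comparing keeps trivial punctuation differences (e.g. a trailing period) from being reported as missing headings.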
Scripts and slides
https://github.com/johnadingle/c4l19_
https://slides.com/jdingle/c4l19n