Assessing OCLC Reclamation Results with pymarc

John Dingle, Brock University

Code4Lib North 2019

OCLC Reclamation

LOCAL MARC RECORD → OCLC MATCHING ALGORITHM → OCLC MASTER RECORD

~600,000 records

(Mis)Match Detection

MARC Records

Quality Assessment

(Mis)Match Detection

Is the OCLC record a true match for our original record?

  • Large percentage difference between the title strings
  • Difference of more than 1 year in the 008 date value
  • If present, no matching ISBNs or ISSNs
  • Presence of electronic resource fields in a print record, and vice versa
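
The four checks above can be sketched in plain Python. This is a minimal sketch, not the script from the talk. It assumes each record has already been flattened (for example with pymarc) into a hypothetical dict with 'title', '008', 'ids', '006' and '007' keys, and the 30% title threshold is an assumed value. Date 1 is read from 008/07-10, and a record counts as electronic when a 006 begins with 'm' or a 007 begins with 'c'.

```python
import difflib
import re
from typing import Iterable, List, Optional

# Hypothetical cut-off: flag titles whose normalized strings differ by more than 30%.
TITLE_THRESHOLD = 0.30


def normalize_title(title: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", title.lower()).split())


def title_difference(local: str, oclc: str) -> float:
    """Fraction (0.0 to 1.0) by which two title strings differ."""
    matcher = difflib.SequenceMatcher(None, normalize_title(local), normalize_title(oclc))
    return 1.0 - matcher.ratio()


def date1_from_008(f008: str) -> Optional[int]:
    """Date 1 from 008/07-10; None when blank or partly unknown (e.g. '19uu')."""
    date1 = f008[7:11]
    return int(date1) if len(date1) == 4 and date1.isdecimal() else None


def normalize_id(value: str) -> str:
    """First token of an 020/022 $a without hyphens, e.g. '978-0-306-40615-7 (pbk.)'."""
    parts = value.split()
    return parts[0].replace("-", "").upper() if parts else ""


def ids_conflict(local_ids: Iterable[str], oclc_ids: Iterable[str]) -> bool:
    """True only when both records carry identifiers and none of them match."""
    local = {normalize_id(v) for v in local_ids} - {""}
    oclc = {normalize_id(v) for v in oclc_ids} - {""}
    return bool(local and oclc and not (local & oclc))


def looks_electronic(f006s: Iterable[str], f007s: Iterable[str]) -> bool:
    """Electronic if any 006 starts with 'm' or any 007 starts with 'c'."""
    return any(f.startswith("m") for f in f006s) or any(f.startswith("c") for f in f007s)


def mismatch_reasons(local: dict, oclc: dict) -> List[str]:
    """Compare two records flattened into hypothetical dicts with the keys
    'title', '008', 'ids', '006' and '007'; return the checks that failed."""
    reasons = []
    if title_difference(local["title"], oclc["title"]) > TITLE_THRESHOLD:
        reasons.append("title")
    d1, d2 = date1_from_008(local["008"]), date1_from_008(oclc["008"])
    if d1 is not None and d2 is not None and abs(d1 - d2) > 1:
        reasons.append("date")
    if ids_conflict(local["ids"], oclc["ids"]):
        reasons.append("identifier")
    if looks_electronic(local["006"], local["007"]) != looks_electronic(oclc["006"], oclc["007"]):
        reasons.append("format")
    return reasons
```

A pair is worth a manual look whenever the returned list is non-empty. Note that this sketch does not reconcile ISBN-10 and ISBN-13 forms of the same number, so it may over-report identifier conflicts.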

(Mis)Match Detection

Use pymarc and the OCLC Metadata API to compare record fields

Potential mismatches
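
Comparing a local record with its OCLC counterpart first requires the OCLC number stored in the local record. A minimal sketch, assuming the number sits in 035 $a with the usual "(OCoLC)" prefix and an optional ocm/ocn/on prefix. The API request itself is not shown, since it needs OCLC credentials.

```python
import re
from typing import Iterable, Optional

# "(OCoLC)" followed by an optional ocm/ocn/on prefix and the number itself;
# leading zeros are dropped so "ocm00012345" and "12345" compare equal.
OCLC_035A = re.compile(r"^\(OCoLC\)(?:ocm|ocn|on)?0*(\d+)$")


def oclc_number(values_035a: Iterable[str]) -> Optional[str]:
    """Return the first OCLC number found among a record's 035 $a values."""
    for value in values_035a:
        match = OCLC_035A.match(value.strip())
        if match:
            return match.group(1)
    return None
```

With pymarc, the input list could be gathered as `[v for f in record.get_fields("035") for v in f.get_subfields("a")]`. The normalized number is then the key for looking up the OCLC record.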

Results

976 randomly selected records: no issues

~2% error rate for minimal records

Quality Assessment

How much "better" is the OCLC record than our original?

Quality Assessment

Use pymarc to calculate Thompson-Traill scores for each pair of records

See also: https://github.com/pkiraly/metadata-qa-marc

Thompson, Kelly, and Stacie Traill. “Leveraging Python to Improve Ebook Metadata Selection, Ingest, and Management.” The Code4Lib Journal, no. 38 (2017). http://journal.code4lib.org/articles/12828.

Thompson-Traill Scores

  • ISBNs
  • Subject Headings
  • Additional Authors/Contributors
  • Description and TOC
  • RDA fields
  • Dates
  • Language and country codes
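
A score of this kind can be computed for each record in a pair and then compared. The sketch below follows the categories listed above, but every point value and cap is a placeholder, not the published Thompson-Traill rubric. `SimpleField` is a hypothetical flattened field, not a pymarc class. Subject headings are limited to LCSH (6XX with second indicator 0), which is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SimpleField:
    """Hypothetical flattened MARC field (not a pymarc class)."""
    tag: str
    ind2: str = " "
    subfields: Dict[str, str] = field(default_factory=dict)
    data: str = ""  # control-field content, e.g. the 40-character 008


def _count(fields: List[SimpleField], *tags: str) -> int:
    return sum(1 for f in fields if f.tag in tags)


def score_record(fields: List[SimpleField]) -> int:
    """Sum points per category; every weight and cap below is a placeholder."""
    score = 0
    # ISBNs: one point per 020, capped at 3
    score += min(_count(fields, "020"), 3)
    # Subject headings: LCSH only (6XX with second indicator 0), capped at 5
    score += min(sum(1 for f in fields if f.tag.startswith("6") and f.ind2 == "0"), 5)
    # Additional authors/contributors: 700/710/711, capped at 3
    score += min(_count(fields, "700", "710", "711"), 3)
    # Description and TOC: one point each for a 520 summary and a 505 contents note
    score += int(_count(fields, "520") > 0) + int(_count(fields, "505") > 0)
    # RDA fields: one point each for 336, 337 and 338
    score += sum(int(_count(fields, tag) > 0) for tag in ("336", "337", "338"))
    # Dates: Date 1 in 008/07-10, plus a 260/264 carrying a $c date
    f008 = next((f.data for f in fields if f.tag == "008"), "")
    score += int(f008[7:11].isdecimal())
    score += int(any(f.tag in ("260", "264") and "c" in f.subfields for f in fields))
    # Language (008/35-37) and country (008/15-17) codes, ignoring 'und' and 'xx'
    language = f008[35:38]
    country = f008[15:18].strip()
    score += int(language.isalpha() and language != "und")
    score += int(country.isalpha() and country != "xx")
    return score
```

Running `score_record` on both records of each reclaimed pair gives the per-pair comparison. With pymarc, the flattened fields could be built from `Field.tag`, `Field.indicators` and `Field.data`.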

Results

Local Records = 11.15

OCLC Records = 15.62
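
Reading the two figures above as mean scores (an assumption), the gap can be expressed as a worked calculation. Because the mean of per-pair gains equals the difference of the two means, the reported values imply an average gain of 4.47 points, or roughly 40%. The helper below assumes two equal-length lists of per-record scores in matching order. The share of improved pairs is an extra, hypothetical metric that does not appear on the slide.

```python
from statistics import mean
from typing import Dict, List


def compare_scores(local: List[float], oclc: List[float]) -> Dict[str, float]:
    """Summarise paired scores: both means, mean gain and share of improved pairs."""
    if len(local) != len(oclc) or not local:
        raise ValueError("need two equal-length, non-empty score lists")
    gains = [o - l for l, o in zip(local, oclc)]
    return {
        "local_mean": mean(local),
        "oclc_mean": mean(oclc),
        "mean_gain": mean(gains),
        "improved_share": sum(g > 0 for g in gains) / len(gains),
    }


# Relative increase implied by the reported means: (15.62 - 11.15) / 11.15
relative_gain = (15.62 - 11.15) / 11.15  # about 0.401, i.e. roughly 40%
```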

Potential Applications

  • Find all 1XX/6XX/7XX fields in a local record that are NOT in the OCLC record
  • Identify OCLC numbers matched to an incorrect format
  • Find BRX-catalogued records that shouldn't be
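
The first two applications can be sketched with the same kind of flattened data. This sketch assumes several things. Fields are hypothetical (tag, text) pairs, which could be built with pymarc's `Field.tag` and `Field.value()`. Headings are compared after casefolding and stripping punctuation. A format mismatch is taken to mean a difference in Leader/06 (type of record), Leader/07 (bibliographic level) or form of item (008/29 for maps and visual materials, 008/23 otherwise).

```python
import re
from typing import Iterable, List, Tuple

Field = Tuple[str, str]  # hypothetical (tag, text) pair, e.g. ("650", "Cataloging.")


def normalize_heading(text: str) -> str:
    """Casefold and drop punctuation/extra spaces so trivial differences don't count."""
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    return " ".join(text.split())


def missing_access_points(local: Iterable[Field], oclc: Iterable[Field]) -> List[Field]:
    """Local 1XX/6XX/7XX fields with no normalized equivalent (same tag) in the OCLC record."""
    wanted = ("1", "6", "7")
    oclc_keys = {(tag, normalize_heading(text)) for tag, text in oclc if tag.startswith(wanted)}
    return [
        (tag, text)
        for tag, text in local
        if tag.startswith(wanted) and (tag, normalize_heading(text)) not in oclc_keys
    ]


def format_signature(leader: str, f008: str) -> Tuple[str, str, str]:
    """(type of record, bibliographic level, form of item) from the Leader and 008."""
    rec_type, blvl = leader[6], leader[7]
    # Form of item is 008/29 for maps (e, f) and visual materials (g, k, o, r), else 008/23
    form_pos = 29 if rec_type in "efgkor" else 23
    form = f008[form_pos] if len(f008) > form_pos else " "
    return rec_type, blvl, form
```

A pair whose signatures differ, such as a print monograph matched to an online one, is a candidate for the "incorrect format" list.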

Scripts and slides

https://github.com/johnadingle/c4l19_

https://slides.com/jdingle/c4l19n
