Adam Retter

adam@evolvedbinary.com
 

The National Archives
2020-01-28


@adamretter

Pan-Archival Cataloguing

You look familiar!

  • Implementation Lead for DRI @ TNA

    • Digital Preservation and Archiving

    • Hardware and Software Architecture

    • 2011 - 2014

  • Developer / Consultant

    • XML / XQuery / XSLT / Schema / RelaxNG

    • Scala / Java / C++

    • Concurrency and Scalability

  • Open Source

    • Facebook's RocksDB K/V database

    • Creator of FusionDB Multi-model database

    • Lead developer for eXist-db XML database

Nov 2019: Project Initiation

  • Systems that depend on PROCat/ILDB:

    • SAR (System for Access Regulation)

    • DORIS (Document Ordering and Reader Information Service)

    • DRI (Digital Records Infrastructure)

    • Discovery

  • PROCat/ILDB depend on:

    • SAR

    • Transfers (e.g. Transfer Form / AA2 / Blue Form)

    • HMC (Historical Manuscripts Commission)

  • Integration is predominantly by directly replicating data between databases

Discovery Phase - Integration

Discovery Phase - N Catalogues!

  • HMC (Historical Manuscripts Commission)

    • Authoritative for Papers in other archives

    • Authoritative for MDR (Manorial Documents Register)

    • Describes 900,000 records

    • Authoritative for 20,000 Authority Files

  • MYC (Manage Your Collections)

    • Information about Records held by other archives

    • Mostly non-authoritative

    • Authoritative for some smaller Archives

    • Describes 10,000,000 records

Discovery Phase - N Catalogues!

  • UK Government Web Archive

    • Authoritative for Snapshots of Gov Websites

    • Either "One-Off event" or Accumulation

    • Organised in Collection(s) which have PROCat Series Reference(s)

    • Records for over 5,000 domains (6 billion resources)

  • Digital Surrogate Systems

    • Docs Online

      • Describes 9,000,000 records

    • Record Copying

      • 35,363 orders processed (2020-01-09)

    • Image Library

      • Consisting of 80,000 digital images

Discovery Phase - N Catalogues!

  • Proposing a Pan-Archival Catalogue

    • One Data Model

      • Medium Independent - e.g. Physical and Digital

      • Multiple Arrangements of Records

      • Holistic - e.g. Surrogates, Retained, etc.

      • Extensible for Description

    • One (Logical) Authoritative System

      • Reduce duplication and replication of data

      • Reduce inconsistencies across systems

      • Each Record has a persistent and unique identifier

Catalogue Data Model

TNA Cataloguing Standard

  • ISAD(G) Derived Hierarchical Arrangement

    • Description is inherited

  • Only 3 Possible (Mono-Hierarchical) Arrangements
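
To illustrate what "Description is inherited" means in a mono-hierarchical arrangement, here is a minimal sketch. The `Unit` class, its field names, and the sample values are hypothetical and are not taken from the TNA Cataloguing Standard; the sketch only assumes that a lower level may override what it inherits from the levels above it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Unit:
    """One level of a mono-hierarchical arrangement (hypothetical structure)."""
    reference: str
    description: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Unit"] = None  # at most one parent: mono-hierarchical

    def effective_description(self) -> Dict[str, str]:
        """Merge descriptions from the root down, so nearer levels override."""
        inherited = self.parent.effective_description() if self.parent else {}
        return {**inherited, **self.description}


# Illustrative values only; not taken from any real catalogue entry.
series = Unit("SERIES-1", {"creator": "Example Department", "language": "English"})
piece = Unit("SERIES-1/1", {"title": "Example piece"}, parent=series)
item = Unit("SERIES-1/1/1", {"language": "French"}, parent=piece)

print(item.effective_description())
```

Here `item` states only its own language, yet its effective description also carries the creator from the series and the title from the piece.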

TNA Cataloguing Standard

  • Attempted to Model Transfer of Records

DRI Catalogue Model

  • Derived from ISAD(G), PREMIS, and XIP

DRI Catalogue Model

  • Any (Mono-Hierarchical) Arrangement

DRI Catalogue Model

Choosing a New Model

  • Define a Common Vocabulary (for our Conceptual Model)

  • Try not to re-invent any wheels / Square peg vs round hole

  • Analyse Existing Models and Standards, key requirements:

    • Independent of Record Medium

    • Flexibility of Arrangement

      • Non/Mono/Poly-Hierarchical

      • Multiple Arrangements

    • Abstract Record, Concrete Manifestation(s)

      • Redaction

      • Surrogates

    • Provenance

    • Extensible / Open World
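
The requirements above can be sketched roughly as follows: an abstract record that is independent of medium, any number of concrete manifestations (original, redaction, surrogate), and several arrangements in which a record may have zero, one, or many parents. All class names, field names, and identifiers here are hypothetical; this is not the model the project selected.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Manifestation:
    """A concrete form of a record, e.g. the original, a redaction or a surrogate."""
    identifier: str
    kind: str    # "original", "redacted" or "surrogate" (hypothetical labels)
    medium: str  # "physical" or "digital"


@dataclass
class Record:
    """The abstract record, independent of any medium."""
    identifier: str
    title: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Arrangement:
    """One named arrangement, mapping each child identifier to a set of parents.

    An empty set is non-hierarchical, one parent is mono-hierarchical,
    and several parents are poly-hierarchical.
    """
    name: str
    parents: Dict[str, Set[str]] = field(default_factory=dict)

    def place(self, child: str, parent: str) -> None:
        self.parents.setdefault(child, set()).add(parent)

    def parents_of(self, child: str) -> Set[str]:
        return self.parents.get(child, set())


# One abstract record with a physical original and a digital surrogate.
record = Record("rec-1", "Example record")
record.manifestations.append(Manifestation("rec-1/m1", "original", "physical"))
record.manifestations.append(Manifestation("rec-1/m2", "surrogate", "digital"))

# The same record placed in two independent arrangements.
by_provenance = Arrangement("provenance")
by_provenance.place("rec-1", "series-A")

by_subject = Arrangement("subject")
by_subject.place("rec-1", "topic-X")
by_subject.place("rec-1", "topic-Y")  # second parent: poly-hierarchical
```

Keeping arrangements outside the record means a new arrangement can be added without touching the record or its manifestations.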

Existing Models and Standards

  1. TNA-CS13 (TNA Cataloguing Standard 2013)

  2. DRI Catalogue Model

  3. BIA (Business Information Architecture)

  4. EAD (Encoded Archival Description)

  5. DCAT (Data Catalog Vocabulary)

  6. FRBR (Functional Requirements for Bibliographic Records)

  7. RDA (Resource Description and Access)

  8. BIBFRAME

  9. Europeana

  10. RiC (Records in Contexts)

  11. Matterhorn RDF Model

Top 3 Models

  1. BIBFRAME Lite + Archive

    • Against - bibliographic/library centric domain

    • Against - single custom ontology

  2. RiC-O

    • For - ICA implementation of RiC-CM

    • Against - single custom ontology

  3. Matterhorn RDF

Project Omega, Next Steps...

  • Define URI for Records

    • RDF Unique Persistent Identifiers. Resolvable ???

    • Unlikely to replace CCR / GCR ???

  • Export data into Matterhorn RDF

    • ILDB

    • SAR

  • Implement (partial) PROCat Replacement

    • Backend is an API fronting the Database

    • Front-end only speaks to API

  • Document our Catalogue Model and Guidelines

  • Stretch goal: import some DRI Born Digital records
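
As a rough sketch of the first two steps (defining a URI per record, then exporting data as RDF), the following mints a URI from a local identifier and writes N-Triples by hand. The base URI is a placeholder, and the property choices (Dublin Core Terms `title` and `isPartOf`) are assumptions for illustration only; they are not taken from the Matterhorn RDF model, and the real URI scheme was still an open question at this point.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical base URI; the real scheme is undecided in the slides.
BASE_URI = "https://catalogue.example.org/record/"

# Dublin Core Terms properties, chosen here purely for illustration.
DCTERMS_TITLE = "http://purl.org/dc/terms/title"
DCTERMS_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"


def record_uri(local_id: str) -> str:
    """Mint a URI by percent-encoding the local identifier onto the base."""
    return BASE_URI + quote(local_id, safe="")


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(local_id: str, title: str, parent_id: Optional[str] = None) -> str:
    """Serialise one record as N-Triples: a title, plus an optional parent link."""
    subject = f"<{record_uri(local_id)}>"
    lines = [f'{subject} <{DCTERMS_TITLE}> "{escape_literal(title)}" .']
    if parent_id is not None:
        lines.append(f"{subject} <{DCTERMS_IS_PART_OF}> <{record_uri(parent_id)}> .")
    return "\n".join(lines)


print(to_ntriples("ABC 1/2", "Example record", parent_id="ABC 1"))
```

In practice an RDF library would handle serialisation; the hand-written version only shows how little is needed once every record has a stable URI.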

Ultimate Goal for World Archives Domination:

A linked-data knowledge graph of the entire organisation's assets.

Questions?