radio archives

& the Semantic Web

Anna-Sophia Zingarelli-Sweet
LIS 2975: Digital Scholarship
University of Pittsburgh
December 4, 2013

POP UP ARCHIVE

  • Began at UC Berkeley School of Information
  • Anne Wootton & Bailey Smith
  • Information Management & Systems master's students
        • SoundCloud Community Fellowship
        • Knight News Challenge: Data category


  • Knight Foundation, NEH, PRX, NDSA, Internet Archive
  • Launched mid-November 2013

COLLECTING


            • Studs Terkel Archive
            • Pacifica Radio Archives
            • Illinois Public Media
            • Center for Investigative Reporting
            • The Kitchen Sisters
            • Mule Radio Syndicate
            • Also open for public upload

features


  • automated upload from servers, SoundCloud, etc.
  • hypermedia API so clients can automate workflows
  • multi-user accounts w/ layered access
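
A hypermedia API lets a client discover URLs by following named links in each response instead of hard-coding them. The sketch below illustrates that idea only: the HAL-style `_links` layout, the URLs, and the `FAKE_API` stand-in for HTTP are all assumptions made for illustration, not the service's documented interface.

```python
# Hypothetical HAL-style responses keyed by URL.
# A real client would fetch these over HTTP instead.
FAKE_API = {
    "/": {"_links": {"collections": {"href": "/collections"}}},
    "/collections": {
        "items": [
            {"title": "Oral histories",
             "_links": {"self": {"href": "/collections/1"}}},
        ],
    },
    "/collections/1": {
        "title": "Oral histories",
        "_links": {"items": {"href": "/collections/1/items"}},
    },
    "/collections/1/items": {"items": [{"title": "Interview, 1975"}]},
}


def get(url):
    """Stand-in for an HTTP GET that returns parsed JSON."""
    return FAKE_API[url]


def follow(start, *rels):
    """Walk from a starting document by following named link relations."""
    doc = get(start)
    for rel in rels:
        doc = get(doc["_links"][rel]["href"])
    return doc


# The client knows only the root URL and the link names, never the paths.
collections = follow("/", "collections")
first = get(collections["items"][0]["_links"]["self"]["href"])
items = get(first["_links"]["items"]["href"])
titles = [item["title"] for item in items["items"]]
```

Because the client navigates by link name, the server could change its URL layout without breaking an automated workflow written this way.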

Transcription


  • auto transcription & keywords for core languages
  • crowdsourced transcription for other languages
  • automated transcription can be edited with Amara
Preservation & Access


    • Partnership w/ Internet Archive
    • File format maintenance
    • Digital preservation

              • Metadata consultation services
              • Creation of ontologies
              • SEO
              • automated keywords
              • tagging
              • timestamping

    BBC World Service Archive Prototype


    Yves Raimond & Tristan Ferne


    2013 Semantic Web Challenge

    Description & Semantic Web


    • 45 years of BBC World Service recordings
    • Digitized between 2005 and 2008
    • text about recordings (transcription, metadata)
    • connected to Wikipedia & DBpedia
    • 20 million RDF triples
    • registered users can then correct & augment
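
An RDF triple is a (subject, predicate, object) statement, and linking a recording to a DBpedia URI is one such statement. The sketch below shows the shape of that data and a user correction using plain Python tuples. The `example.org` programme namespace, the `taggedWith` predicate, and the programme IDs are hypothetical; only the `http://dbpedia.org/resource/` prefix follows DBpedia's real URI pattern.

```python
DBPEDIA = "http://dbpedia.org/resource/"
PROGRAMME = "http://example.org/programmes/"         # hypothetical namespace
TAGGED_WITH = "http://example.org/terms/taggedWith"  # hypothetical predicate

# A tiny in-memory triple store: a set of (subject, predicate, object).
triples = {
    (PROGRAMME + "p001", TAGGED_WITH, DBPEDIA + "Brian_Eno"),
    (PROGRAMME + "p001", TAGGED_WITH, DBPEDIA + "Ambient_music"),
    (PROGRAMME + "p002", TAGGED_WITH, DBPEDIA + "Brian_Eno"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def correct_tag(store, programme, wrong, right):
    """Model a registered user's correction: swap one tag for another."""
    store.discard((programme, TAGGED_WITH, wrong))
    store.add((programme, TAGGED_WITH, right))


# All programmes tagged with a given DBpedia resource.
eno_programmes = [t[0] for t in match(triples, o=DBPEDIA + "Brian_Eno")]
```

A production system would more likely use a dedicated triple store and SPARQL; the tuple set here only illustrates the data model.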


    • Used Wikipedia Miner for first pass at structuring text
      • "learns from the structure of links between Wikipedia pages and uses the resulting model to provide a service detecting potential Wikipedia links"
    • Used Amazon Web Services cloud 
    • Used Elasticsearch for the search function
    • emphasis on usability and design to encourage data curation

    Transparency


  • detailed paper about the algorithms they used to connect their database to Wikipedia and DBpedia
  • also use GitHub for transparency
  • specifically details Python program written to disambiguate terms
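
To show what disambiguating a term involves, here is a deliberately simple sketch that picks the candidate whose context words overlap most with the transcript. This is a swapped-in toy technique, not the algorithm described in the authors' paper or code; the `CANDIDATES` table and its context words are hypothetical.

```python
import re

# Hypothetical candidate senses for an ambiguous term,
# each with a few context words that suggest that sense.
CANDIDATES = {
    "mercury": {
        "Mercury_(planet)": {"planet", "orbit", "sun", "solar"},
        "Mercury_(element)": {"metal", "liquid", "thermometer", "toxic"},
        "Freddie_Mercury": {"queen", "singer", "band", "rock"},
    },
}


def disambiguate(term, transcript):
    """Pick the sense sharing the most context words with the transcript.

    Returns None when the term is unknown or no sense has any overlap.
    """
    words = set(re.findall(r"[a-z]+", transcript.lower()))
    options = CANDIDATES.get(term.lower(), {})
    scored = [(len(words & context), name) for name, context in options.items()]
    if not scored:
        return None
    best_score, best_name = max(scored)
    return best_name if best_score > 0 else None
```

Returning `None` when nothing overlaps leaves the term unlinked, which suits a workflow where users later correct and augment the tags.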
    Applications


    • augment real time BBC News subtitles with links to archival material

    • URIs for individuals (e.g., Brian Eno) that display all programs involving them

    • could also display networks (e.g., links to pages of people who collaborated with Brian Eno)
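
The per-person pages and collaborator networks described above amount to two indexes over the programme data. The sketch below builds both, assuming each programme carries a list of people tags. The programme IDs and the `Person_A` / `Person_B` names are placeholders, not real archive data.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical programme-to-people tags.
PROGRAMMES = {
    "p001": ["Brian_Eno", "Person_A"],
    "p002": ["Brian_Eno", "Person_B"],
    "p003": ["Person_A"],
}


def build_indexes(programmes):
    """Return (programmes by person, collaborators by person)."""
    by_person = defaultdict(list)
    collaborators = defaultdict(set)
    for programme_id, people in sorted(programmes.items()):
        for person in people:
            by_person[person].append(programme_id)
        # Everyone tagged in the same programme counts as a collaborator.
        for a, b in permutations(people, 2):
            collaborators[a].add(b)
    return by_person, collaborators


by_person, collaborators = build_indexes(PROGRAMMES)
```

A page for one person's URI would list `by_person[person]`, and the network view would link to each entry in `collaborators[person]`.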