radio archives
& the Semantic Web
Anna-Sophia Zingarelli-Sweet
LIS 2975: Digital Scholarship
University of Pittsburgh
December 4, 2013
- Began at UC Berkeley School of Information
- Anne Wootton & Bailey Smith
- Information Management & Systems master's students
- SoundCloud Community Fellowship
- Knight News Challenge: Data category
- Knight Foundation, NEH, PRX, NDSA, Internet Archive
- Launched mid-November 2013
COLLECTING
- Studs Terkel Archive
- Pacifica Radio Archives
- Illinois Public Media
- Center for Investigative Reporting
- The Kitchen Sisters
- Mule Radio Syndicate
- Also open for public upload
features
- automated upload from servers, SoundCloud, etc.
- hypermedia API so clients can automate workflows
- multi-user accounts w/ layered access
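As a rough illustration of what a hypermedia API means for automation, the sketch below walks a chain of links by relation name instead of hard-coding URLs. Everything in it is hypothetical: the `FAKE_API` dictionary, the paths, and the relation names `collections`, `items`, and `transcript` are invented stand-ins, not the project's actual API.

```python
# Hypothetical in-memory stand-in for an HTTP API; every path and link
# relation below is invented for illustration.
FAKE_API = {
    "/": {"_links": {"collections": "/collections"}},
    "/collections": {"_links": {"items": "/collections/1/items"}},
    "/collections/1/items": {
        "_links": {"transcript": "/items/7/transcript"},
        "title": "Sample interview",
    },
    "/items/7/transcript": {"text": "hello world", "_links": {}},
}


def fetch(path):
    """Pretend to GET a resource; a real client would issue an HTTP request."""
    return FAKE_API[path]


def follow(start, relations):
    """Follow named link relations, one hop at a time, from a starting resource."""
    resource = fetch(start)
    for rel in relations:
        resource = fetch(resource["_links"][rel])
    return resource


transcript = follow("/", ["collections", "items", "transcript"])
print(transcript["text"])
```

Because the client only knows relation names, the server could move a resource to a new path without breaking an automated workflow built this way.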
Transcription
- Automated transcription & keywords for core languages
- Crowdsourced transcription for other languages
- Automated transcripts can be edited with Amara
Preservation & Access
- Partnership w/ Internet Archive
- File format maintenance
- Digital preservation
- Metadata consultation services
- Creation of ontologies
- SEO
- automated keywords
- tagging
- timestamping
Yves Raimond & Tristan Ferne
2013 Semantic Web Challenge
Description & Semantic Web
- 45 years of BBC World Service recordings
- Digitized between 2005 and 2008
- text about recordings (transcription, metadata)
- connected to Wikipedia & DBpedia
- 20 million RDF triples
- registered users can then correct & augment
- Used Wikipedia Miner for first pass at structuring text
- "learns from the structure of links between Wikipedia pages and uses the resulting model to provide a service detecting potential Wikipedia links"
- Used Amazon Web Services cloud
- Used Elasticsearch for search function
- emphasis on usability and design to encourage data curation
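To make the linking step more concrete, here is a minimal sketch of how text about a recording might be turned into triples that point at DBpedia. It is a simplified stand-in and does not reproduce Wikipedia Miner's model: the counts in `ANCHOR_STATS`, the threshold, the `example.org` program URI, and the choice of predicate are all assumptions made for illustration.

```python
# Invented counts: how often a phrase appears as link text versus how often
# it appears at all. A real system would derive these from a Wikipedia dump.
ANCHOR_STATS = {
    "brian eno": {"as_link": 90, "total": 100, "target": "Brian_Eno"},
    "world service": {"as_link": 40, "total": 80, "target": "BBC_World_Service"},
    "the record": {"as_link": 1, "total": 500, "target": "The_Record"},
}

DBPEDIA = "http://dbpedia.org/resource/"


def detect_links(text, threshold=0.3):
    """Return (phrase, DBpedia URI) pairs whose link probability passes the threshold."""
    lowered = text.lower()
    found = []
    for phrase, stats in ANCHOR_STATS.items():
        probability = stats["as_link"] / stats["total"]
        if phrase in lowered and probability >= threshold:
            found.append((phrase, DBPEDIA + stats["target"]))
    return found


def to_triples(program_uri, links):
    """Express each detected link as a (subject, predicate, object) triple."""
    # Predicate chosen for illustration; the project's vocabulary may differ.
    predicate = "http://purl.org/dc/terms/subject"
    return [(program_uri, predicate, uri) for _, uri in links]


transcript = "Brian Eno joins the World Service to set the record straight."
links = detect_links(transcript)
triples = to_triples("http://example.org/programs/1", links)
for triple in triples:
    print(triple)
```

The phrase "the record" occurs in the sample text but is rarely a link in the invented statistics, so it is filtered out; that is the intuition behind using link structure to decide which phrases are worth connecting.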
Transparency
- Detailed paper about the algorithms used to connect their database to Wikipedia and DBpedia
- Also use GitHub for transparency
- Specifically details the Python program written to disambiguate terms
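For a sense of what term disambiguation involves, the sketch below picks a sense for an ambiguous term by counting overlap between each candidate's vocabulary and the surrounding words. This is a generic overlap heuristic, not the authors' program; the `CANDIDATES` table and its vocabularies are invented.

```python
# Hypothetical candidate senses with invented context vocabularies.
CANDIDATES = {
    "mercury": {
        "Mercury_(planet)": {"orbit", "sun", "planet", "nasa"},
        "Mercury_(element)": {"metal", "toxic", "thermometer", "liquid"},
        "Freddie_Mercury": {"queen", "singer", "band", "album"},
    }
}


def disambiguate(term, context_words):
    """Pick the candidate whose vocabulary overlaps most with the context."""
    senses = CANDIDATES[term]
    context = {word.lower() for word in context_words}
    scores = {sense: len(vocab & context) for sense, vocab in senses.items()}
    best = max(scores, key=scores.get)
    # Return None when nothing in the context supports any sense.
    return best if scores[best] > 0 else None


context = "the singer recorded the album with his band".split()
print(disambiguate("mercury", context))
```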
applications
- augment real time BBC News subtitles with links to archival material
- URIs for individuals (e.g. Brian Eno) that display all programs involving that person
- could also display networks (e.g. links to pages of people who collaborated with Brian Eno)
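The person pages and collaborator networks described above could be derived from simple program credits, as in this sketch. The data is entirely made up: the `example.org` URIs, the guest identifiers, and the credits themselves are placeholders, not records from the archive.

```python
from collections import defaultdict

# Invented (program, contributor) pairs using placeholder URIs.
CREDITS = [
    ("http://example.org/programs/1", "http://example.org/people/brian_eno"),
    ("http://example.org/programs/1", "http://example.org/people/guest_a"),
    ("http://example.org/programs/2", "http://example.org/people/brian_eno"),
    ("http://example.org/programs/2", "http://example.org/people/guest_b"),
    ("http://example.org/programs/3", "http://example.org/people/guest_a"),
    ("http://example.org/programs/3", "http://example.org/people/guest_c"),
]


def programs_for(person, credits):
    """List every program credited to the given person URI."""
    return sorted({program for program, who in credits if who == person})


def collaborators_of(person, credits):
    """List everyone who shares at least one program with the given person."""
    by_program = defaultdict(set)
    for program, who in credits:
        by_program[program].add(who)
    others = set()
    for people in by_program.values():
        if person in people:
            others |= people - {person}
    return sorted(others)


eno = "http://example.org/people/brian_eno"
print(programs_for(eno, CREDITS))
print(collaborators_of(eno, CREDITS))
```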