Dr James Cummings
 

Adventures in Hosting and Storage

James.Cummings@newcastle.ac.uk

@jamescummings

CC BY

 “Endings” Project Symposium on 'Project Resiliency in the Digital Humanities', University of Victoria, 2021

Overview

  • Looking at two legacy projects:
    • CURSUS: An Online Resource of Medieval Liturgical Texts
    • William Godwin’s Diary
  • Both had problems that we can learn from with regard to:
    • Hosting problems
    • Maintenance problems
    • Backup problems
    • Long-term storage problems
  • Building Resilience: Some common-sense lessons

(I've many more examples, but can't fit them into this talk!)

CURSUS:
An Online Resource of Medieval Liturgical Texts

About the Cursus Project

  • AHRB-funded project (2000-2003) at the University of East Anglia to produce a resource of medieval liturgical texts and explore XML publication possibilities
  • Principal Investigator Professor David Chadd and Research Assistant Dr James Cummings produced editions of 12 medieval manuscripts
  • The research aim was to investigate and compare the order of antiphons, responds, and prayers in these manuscripts, which detail the order of service in different places in England
  • Project produced full XML copies of the Corpus Antiphonalium Officii, the Vulgate Bible, and other supplementary information

Cursus Project Challenges

  • 2000-3 – Main Cursus project completed at UEA School of Music
  • 2003 – I moved to Oxford; project continues with Richard Lewis as developer to keep it running
  • 2006 – Sadly, in November 2006 the Principal Investigator Professor David Chadd died
  • 2009 – ‘Climategate’ (hacking of emails relating to climate change data) caused UEA to close all off-campus access
  • 2010 – Richard and I are unable to access the departmental server; it is later replaced without the Cursus project website
  • 2014 – UEA School of Music is closed
  • 2016 – After 6 years of negotiation I get confirmation of a CC BY-NC license for the data, allowing Richard and me to put it up elsewhere

Cursus Project Challenges

  • Hosting problems
    • Using a departmental server rather than centralised institutional VMs (but these were in short supply in 2000-3)
  • Maintenance problems
    • Climategate and loss of external SSH access to campus
  • Backup problems
    • PI continued to work on his laptop; his updates were never added to the site after he died
  • Long-term storage problems
    • TEI P4 XML data was always safe but (until 2016) not stored in an open repository; although declared 'freely available' on the original site, it had not been explicitly licensed
    • The closure of the university department made it hard to get 'approval'

William Godwin's Diary

About the Godwin's Diary Project

  • Godwin was a political philosopher and writer, Mary Wollstonecraft’s husband and Mary Shelley’s father
  • University of Oxford project (2007-2010), funded by the Leverhulme Trust, to create a digital edition of William Godwin’s Diary
  • Diaries purchased as part of the Abinger Collection with funding from the National Heritage Memorial Fund and donations
  • 48 years of diaries in 32 octavo notebooks, written in highly abbreviated daily entries
    • People’s names often given as initials
    • Little detail of the substance of meetings
    • Networks of relationships between people, and aggregate lists of information, can be extracted from the richly encoded TEI

Project work

  • I trained the PI, RA, and 2 PhD students in TEI in 1.5 days
    • But I had customised the TEI down to about 15 custom elements in total
    • These were automatically converted back to 'pure' TEI for display and dissemination
  • Bespoke website built on top of an early version of eXist-db (a native XML database)
  • Encoders worked in phases: adding structural markup, then meetings, then names, etc.
  • In each phase they started with a diary year they had not seen before, proofreading each other's work
  • As technical consultant I was on hand to answer any and all technical questions
  • As it was to be hosted by the Bodleian, I asked to build it on one of their VMs but was refused

Godwin Diary Project Challenges

  • Hosting problems:
    • Not adopted into Bodleian VM infrastructure during project development
    • Hosted on old VM infrastructure with an ancient software version that needs occasional restarts; potential security problems
  • Maintenance problems:
    • No funding given directly to the Bodleian Library for support; only funding/donations to purchase the Abinger Collection
    • Single developer (me) who continued to support on best-effort basis after project ended in 2010
    • Entire Project: Developer, PI, Research Associates, etc. all now at other institutions
    • Did not use IIIF (or related standards) but created a bespoke pan/zoom image browser using a now-dated Google Maps API
  • Backup problems / Long-term storage problems:
    • Until November 2019, underlying data and code were not in an open repository (now in my GitHub account)

Building Resilience: Problems

  • CURSUS: Death of the PI, then loss of the server; Climategate; lack of clear licensing or institutional support
  • William Godwin’s Diary: Lack of integration of support by institution, lack of sustainability funding, closed development, departure of staff

Some possible mitigations:

  • Cloud hosting -- possible today, but not really an option in 2000-3
  • Remove need to get permission -- clear open licensing from the start
  • Work in the light -- today, GitHub or similar would mean the code is always available; use standard technologies
  • Lots of copies keep stuff safe -- but also make regular releases on GitHub and deposit backups in places like Figshare and Zenodo
  • Plan for the bus factor -- assume the PI/developer/server is going to get hit by a bus: then what happens?