Dr James Cummings
Adventures in Hosting and Storage
James.Cummings@newcastle.ac.uk
@jamescummings
CC+BY
“Endings” Project Symposium on 'Project Resiliency in the Digital Humanities', University of Victoria, 2021
Overview
- Looking at two legacy projects:
- CURSUS: An Online Resource of Medieval Liturgical Texts
- William Godwin’s Diary
- Both had problems that we can learn from, with regard to:
- Hosting problems
- Maintenance problems
- Backup problems
- Long-term storage problems
- Building Resilience: Some common-sense lessons
(I've many more examples, but can't fit them into this talk!)
CURSUS:
An Online Resource of Medieval Liturgical Texts
Original URL: http://www.cursus.uea.ac.uk/
Working URL: http://www.cursus.org.uk/
About the Cursus Project
- AHRB-funded project (2000-2003) at the University of East Anglia to produce a resource of medieval liturgical texts and explore XML publication possibilities
- Principal Investigator Professor David Chadd and Research Assistant Dr James Cummings produced editions of 12 medieval manuscripts
- The research aim was to investigate and compare the order of antiphons, responds, and prayers in these manuscripts, which detail the order of service in different places in England
- The project also produced full XML copies of the Corpus Antiphonalium Officii, the Vulgate Bible, and other supplementary information
Cursus Project Challenges
- 2000-3 – Main Cursus project completed at the UEA School of Music
- 2003 – I moved to Oxford; the project continues, with Richard Lewis as developer keeping it running
- 2006 – Sadly, in November 2006 the Principal Investigator Professor David Chadd died
- 2009 – ‘Climategate’ (hacking of emails relating to climate change data) caused UEA to close all off-campus access
- 2010 – Richard and I unable to access departmental server; it is later replaced without Cursus project website.
- 2014 – UEA School of Music is closed.
- 2016 – After 6 years of negotiation, I get confirmation of a CC+BY+NC license for the data, allowing Richard and me to put it up elsewhere
Cursus Project Challenges
Hosting problems
- Using a departmental server rather than centralised institutional VMs (but these were in short supply in 2000-3)
Maintenance problems
- Climategate and loss of external SSH access to campus
Backup problems
- The PI continued to work on his laptop; his updates were never added to the site after he died
Long-term storage problems
- TEI P4 XML data was always safe but (until 2016) was not stored in an open repository; although declared 'freely available' on the original site, it had not been explicitly licensed
- The closure of the university department made it hard to get 'approval'.
William Godwin's Diary
About the Godwin Diary Project
- Godwin was a political philosopher and writer, Mary Wollstonecraft’s husband and Mary Shelley’s father
- University of Oxford project (2007-2010), funded by the Leverhulme Trust, to create a digital edition of William Godwin’s Diary
- Diaries purchased with the Abinger Collection, using the National Heritage Memorial Fund and donations
- 48 years of diaries in 32 octavo notebooks, written in highly abbreviated daily entries
- People’s names often given as initials
- Little detail of the substance of meetings
- Networks of relationships between people, and aggregate lists of information, are able to be extracted from richly encoded TEI
Project work
- I trained the PI, RA, and 2 PhD students in TEI in 1.5 days
- But I had customised the TEI down to about 15 custom elements in total
- These were automatically converted back to 'pure' TEI for display and dissemination
- Bespoke website built on top of an early version of eXist-DB (a native XML database)
- Encoders worked in phases, adding structural markup, then meetings, then names, etc.
- In each phase they started with a diary year they had not seen before, proofreading each other's work
- As technical consultant, I was on hand to answer any and all technical problems
- As it was to be hosted by the Bodleian, I asked to build it on one of their VMs but was refused
Godwin Diary Project Challenges
Hosting problems:
- Not adopted into Bodleian VM infrastructure during project development
- Hosted on old VM infrastructure with an ancient version of the software, which needs an occasional restart and has potential security problems
Maintenance problems:
- No funding went directly to the Bodleian Library for support, only funding/donations to purchase the Abinger Collection
- Single developer (me) who continued to support on best-effort basis after project ended in 2010
- Entire project team (developer, PI, Research Associates, etc.) now at other institutions
- Did not use IIIF (or related standards) but created a bespoke pan/zoom image browser using a dated Google Maps API
Backup problems / Long-term storage problems:
- Until November 2019, the underlying data and code were not in an open repository (now in my GitHub account)
Building Resilience: Problems
- CURSUS: Death of the PI and then of the server, Climategate, lack of clear licensing or institutional support
- William Godwin’s Diary: Lack of institutional integration and support, lack of sustainability funding, closed development, departure of staff
Some possible mitigations:
- Cloud hosting -- possible today but wasn't really a thing in 2000-3.
- Remove need to get permission -- clear open licensing from the start
- Work in the light -- today, GitHub or similar would mean the code was always available; use standard technologies
- Lots of copies keep stuff safe -- but also make regular releases on GitHub and backups in places like Figshare and Zenodo
- Plan for the bus factor -- assume the PI/developer/server is going to get hit by a bus: then what happens?
"Adventures in Hosting and Storage" -- slides for a video talk given for Project “Endings” Symposium on 'Project Resiliency in the Digital Humanities', University of Victoria, 2021