Dr James Cummings
http://slides.com/jamescummings/behind-the-curtain
Behind the curtain
Probing the inner workings of
digital humanities projects
James.Cummings@newcastle.ac.uk
@jamescummings
CC+BY
Overview
- Case studies of (otherwise excellent) projects with some challenges to overcome:
- CatCor: Correspondence of Catherine the Great
- William Godwin’s Diary
- CURSUS: An Online Resource of Medieval Liturgical Texts
- Poetic Forms Online: Renaissance to Modern
- LEAP: Livingstone Online Enhancement and Access Project
- SRO: Stationers Register Online
- ATNU: Animating Text Newcastle University (a new project hoping to learn from these problems)
- Behind the curtain: lessons learned
CatCor:
Correspondence of
Catherine the Great
http://catcor-dev.oucs.ox.ac.uk
(password protected)
About the CatCor Project
- Catherine the Great: Empress of Russia 1762 – 1796, prolific letter writer in Russian, French, German, and English
- University of Oxford project by Professor Ian Kahn and Dr Kelsey Rubin, internally funded from a research pump-priming fund to create a proof-of-concept site in the hope of securing large research council funding
- A couple of hundred letters (of around 5000 possible) edited and translated in TEI
- Detailed editorial links from any person / place / work / event to local metadata about these
- TEI customization, consultation, DOCX to TEI conversion scripts, etc. provided free; web developers charged a low rate to produce the proof-of-concept site, doing most of the work in front-end JavaScript
CatCor Challenges
- Although pilot project was successful, it did not receive full AHRC funding
- Not fully launched: website behind username / password, access only given to friends of the project
- Website code not available since it was stored in private GitHub repository
- Web developers moved on to different projects, no active support for the project
- Lots of bugs which would have been fixed in a full project; no planning for sustainability if not funded
- Did not use new TEI Correspondence <correspDesc> element because this was just under development at the time. (This would have been corrected in a full project)
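For readers unfamiliar with the element, the following is a minimal sketch of what `<correspDesc>` metadata makes possible. It assumes the standard TEI structure of `<correspAction type="sent">` and `<correspAction type="received">` children; the sample letter, its date, and the function name `summarise_correspondence` are hypothetical illustrations, not CatCor data or code.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
TEI_NS = {"tei": TEI}

# Hypothetical sample for illustration only; not taken from the CatCor files.
SAMPLE = """<correspDesc xmlns="http://www.tei-c.org/ns/1.0">
  <correspAction type="sent">
    <persName>Catherine II</persName>
    <date when="1765-01-01"/>
  </correspAction>
  <correspAction type="received">
    <persName>Voltaire</persName>
  </correspAction>
</correspDesc>"""


def summarise_correspondence(xml_text):
    """Map each correspAction type to its first person name and date."""
    root = ET.fromstring(xml_text)
    summary = {}
    for action in root.iter(f"{{{TEI}}}correspAction"):
        person = action.find("tei:persName", TEI_NS)
        date = action.find("tei:date", TEI_NS)
        summary[action.get("type")] = {
            "person": person.text if person is not None else None,
            "when": date.get("when") if date is not None else None,
        }
    return summary


if __name__ == "__main__":
    print(summarise_correspondence(SAMPLE))
```

Because sender, recipient, and date sit in predictable places, letters encoded this way can be indexed or exchanged without project-specific code.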
William
Godwin’s
Diary
About the Godwin’s Diary Project
- Godwin was a political philosopher and writer, Mary Wollstonecraft’s husband and Mary Shelley’s father
- University of Oxford project (2007-2010) to create digital edition of William Godwin’s Diary with funding for project from Leverhulme Trust
- Diaries purchased as part of the Abinger Collection, funded by the National Heritage Memorial Fund and donations
- 48 years of diaries in 32 octavo notebooks, written in highly abbreviated daily entries
- People’s names often given as initials
- Little detail of the substance of meetings
- Networks of relationships with people, and aggregate lists of information, able to be extracted from richly encoded TEI
Godwin Diary Project Challenges
- No funding direct to library to support / host, only funding/donations to purchase Abinger Collection
- Not adopted into Bodleian project infrastructure during project development (even though requested)
- Single developer (me) who continued to support on best-effort basis after project ended in 2010
- Developer, PI, Research Associates, etc. all now at other institutions
- Hosted on old virtual machine infrastructure, software needs occasional restart
- Did not use IIIF (or related standards) for image serving and created bespoke pan/zoom image browser using dated Google Maps API
- XML and images available from site, but website code is not in an open repository
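As a point of comparison with the bespoke image browser, here is a minimal sketch of how an image request is formed under the IIIF Image API, whose URLs follow the pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The server address, the identifier, and the helper name `iiif_image_url` are hypothetical; this is not code from the Godwin project.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL from its path components."""
    # Identifiers must be percent-encoded, including any slashes they contain.
    encoded_id = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


if __name__ == "__main__":
    # Hypothetical server and identifier: request the top-left 1024x1024 pixels
    # of a page, scaled to 512 pixels wide.
    print(iiif_image_url("https://example.org/iiif", "diary/page 1",
                         region="0,0,1024,1024", size="512,"))
```

Any IIIF-compliant viewer can generate requests like this for panning and zooming, so no project-specific image browser needs to be maintained.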
CURSUS:
An Online Resource of Medieval Liturgical Texts
Original URL: http://www.cursus.uea.ac.uk/
Working URL: http://www.cursus.org.uk/
About the Cursus Project
- AHRB-funded project (2000-2003) at University of East Anglia to produce resource of medieval liturgical texts and explore XML publication possibilities
- Principal Investigator Professor David Chadd and Dr James Cummings produced editions of 12 medieval manuscripts
- Research aim was to investigate and compare the order of antiphons, responds, and prayers in these manuscripts, which detail the order of service in different places in England
- Project produced full copy of Corpus Antiphonalium Officii, Vulgate Bible, and other supplementary information
Cursus Project Challenges
- 2000-3 – Main Cursus project completed
- 2003 – I moved to Oxford, project continues with Richard Lewis taking over technical development for 3 years
- 2006 – Sadly, in November the Principal Investigator, Professor David Chadd, died
- 2009 – ‘Climategate’ (hacking of emails relating to climate change data) caused UEA to close all off-campus access
- 2010 – Richard and I were unable to access the server when it went down; the server was later replaced and the website was gone
- 2016 – After 6 years of negotiation I get confirmation of CC+BY+NC license of data, allowing Richard and me to put it up elsewhere
- TEI P4 XML data was safe but (until 2016) not stored in an open repository; although freely available on the original site, it had not been explicitly licensed
Poetic Forms Online:
Renaissance to Modern
About the Poetic Forms Online Project
- University of Oxford minimally funded pilot project by Dr Elizabeth Scott-Baumann and Dr Ben Burton
- Repurposed EEBO-TCP texts converted to TEI P5 XML
- Produced a browsable, searchable database of verse focusing on poetic form, especially:
- rhyme (including rhyme scheme, rhyme words, rhyme type)
- metre and syllabification
- overall genre
- Starting with Renaissance texts, it planned to cover exemplary texts from the Renaissance to the modern day
- In the production view of the XML, every line is tagged with detailed information about rhyme and metrical structure, enabling a powerful faceted search
Poetic Forms Online Challenges
- Proof-of-concept internal pump-priming funding meant limited time/resources/support
- One of the researchers departed to another institution, the other departed from higher education
- Although developers offered to move hosting, no further work has been done on it, so it contains only two texts: Shakespeare's Sonnets and Venus and Adonis
- Given limited funding, team used Drupal for frontend presentation rather than an XML-based solution
- Data not stored in public repository, but private repository owned by individuals in the institution (now departed)
- Use of Drupal Feeds module for reading XML files meant redundant generation of large files duplicating all possible information for every single line
LEAP:
Livingstone Online Enhancement
and Access Project
About the LEAP Project
- Project, led by Dr Adrian Wisnicki (UNL), 2013-2017 to:
- re-develop the Livingstone Online website,
- update all underlying materials to TEI P5 XML under a single TEI customization, and
- produce critical edition of David Livingstone’s final manuscripts (1865-73), including multi-witness texts
- Created detailed project documentation, including full TEI P5 ODD customization, information about funding, including project difficulties and lessons learned
- Multi-spectral imaging of difficult-to-read texts
- All materials released openly: more than just a digital edition, it is an archive of all related material including project materials and reports
LEAP Project Challenges
- Planned alpha launch (March 2015) plagued with problems (UCLA developers’ difficulties in implementing their chosen solution of Islandora in conjunction with a Fedora backend)
- Other project partners did additional work before beta launch; development proceeded in halting fashion, with lots of missed deadlines, and failed to meet expectations of the agreed specification
- After beta launch LEAP team made hard decision to ask UCLA to leave the project, with departure negotiated over the end of 2015
- LEAP reached agreement with MITH (Maryland Institute for Technology in the Humanities) at University of Maryland for hosting and additional developers
- Transfer of project in 2016 only possible because of detailed documentation of materials, project specifications, and project reports mentioning these problems
- More agile weekly development gave a faster pace of feedback on screenshots
SRO:
Stationers
Register
Online
http://stationersregister.online
(password protected)
About the SRO Project
- A project originally from University of Oxford and Bath Spa University led by Giles Bergel and Ian Gadd to transcribe the first three Stationers’ Registers (1557 – 1620)
- These are an invaluable source for English book history and central to the development of copyright.
- They record the right to print from 1557 until the modern day
- Minimal funding to create the underlying data in phase 1 (2013) meant the keying company introduced many inconsistencies when creating the TEI files
- A phase 2 project (2016) sought major AHRC funding but was unsuccessful; it scraped together minimal funding from CREATe: the RCUK Centre for Copyright and New Business Models in the Creative Economy
- CREATe also provided in-kind contributions of a developer responsible for creating a new website.
SRO Project Challenges
- SRO has had a number of problems, mostly relating to under-funding. Only having a minimal budget meant it could not pay for proper quality control in Phase 1.
- The Phase 2 project only employed editors & proofreaders for a short period.
- Web development is being provided by CREATe as an in-kind contribution by a PhD student
- Unfortunately they did not have experience of the appropriate technology (eXist-db XML Database) and so did not fully exploit its potential; thus faceted browsing is done in JavaScript and is slow to use.
- The developer has now completed their PhD and so is no longer working for CREATe, but is donating time pro bono
- The project has pushed back its launch date to summer this year, almost a year overdue
ATNU:
Animating Text
Newcastle University
(A new project hoping to learn
from some of these challenges)
About the ATNU Project
- ATNU is a new project that has just started at Newcastle University trying to learn from some of these mistakes
- It is exploring new frontiers at the cross-roads between traditional scholarly textual editing, digital editing, digital humanities and computer science
- It is involving computer scientists from the very beginning as full partners in the project, not just solution providers.
- It is running several pilot projects across multiple departments and is just as interested in solutions and methodologies that do not work as those which do work.
ATNU Pilots
- Manuscript and Print:
- Digital Edition of the Sarum Hymnal
- Visualization over time of a MS notebook (Shelley)
- Performance:
- ‘Polyphonic Player’
- Early Modern Ballot, with interactive animation
- Text to speech for Early Modern texts
- Translation:
- Differences between Early Modern translations
- Social translation research
- Translation networks
Some Lessons
From Backstage
Lessons: Documentation
- Create detailed internal project documentation and share this openly including documenting all project working practices and assumptions, desired technical specifications, agendas/minutes of meetings
- Always have memorandum of understanding with institutions and other partners (such as developers) with clear milestones and responsibilities on both sides
- Document use of international standards and variation from them (e.g. TEI ODD Customization), technical frameworks, software dependencies
- Make open records of worst-case scenario planning and ensure all partner institutions understand them (i.e. the institution understands them, not just the partner representative)
Lessons: Working In The Light
Projects tend to hide away their work, not wanting to show work-in-progress until it is finished. It is better in the long run if they work in the light: work openly, making as many internal project materials as possible available to the greater community. Where feasible, the minimal requirements are:
- Always give access to the underlying data
- Pre-license all outputs with open licenses
- Provide data and website code in open repositories (e.g. GitHub)
- Work in inter-institutional collaborative manner, not relying on single institution’s policies but joint agreements
- Give a method for community to provide feedback, improvements, or make derivative works
Lessons: Technical Infrastructure
- Ensure technical decisions use open international standards and popular community supported open source software
- Base development on open, documented APIs (application programming interfaces)
- Do not implement quick workarounds (or if you must, document them in detail)
- Have multiple technical partners overseeing / validating each other’s work through review of pull-requests and regular reporting
- Integrate into institutional (or multi-institutional) infrastructural support so servers will go on running, and be updated, for many years
- If feasible, research software developers should be partners in project, not just solution providers
Happy to answer any questions!
Or later:
james.cummings@newcastle.ac.uk
tw: @jamescummings
http://slides.com/jamescummings/behind-the-curtain
Behind the curtain: probing the inner workings of digital humanities projects
By James Cummings
Behind the curtain: probing the inner workings of digital humanities projects. Talk for REED:London / DH@UGuelph, Thursday 22 March 2018. CC+BY