DASCH is Broken

But what does that mean?

  • The servers storing the DASCH data were out of warranty and inevitably crashed
  • The servers were replaced, but the scanner and photography station have not been reconnected to the new servers
  • There may be other breaks in the pipeline software

Scanning and photography have stopped

Diagram: the Scanner (through the Scanning Pipeline) and the Photography Station (through Image Upload) both feed the DASCH Database; both links are marked X (broken).

  • Inadequate data backups

  • No photography backups

  • No database backups

  • Lossy format photography

  • No archival plan

  • No maintenance plan

  • Closed development practices

  • Limited data export capabilities

  • Unindexed metadata

  • Inefficient database protocols

  • Out-of-date documentation

  • Limited knowledge transfer

Technical problems unrelated to the servers

We cannot just focus on getting the scanning done

 

We need to get the scanning done while fixing the bigger problems

Part B:
digital preservation

Part C:
functional capabilities

Part A:
basic operations

  • Inadequate data backups

  • No photography backups

  • No database backups

  • Lossy format photography

  • No archival plan

  • No maintenance plan

  • Closed development practices

  • Limited data export capabilities

  • Unindexed metadata

  • Inefficient database protocols

  • Out-of-date documentation

  • Limited knowledge transfer

Proposed three-part plan

Part A

step 1: establish an administrative connection between the Library and the Plate Stacks/DASCH

 

  • Long-term responsibility for DASCH and the Plates needs to be explicitly associated with the Library
    • We will then need to determine whether and how this changes the reporting structure and finances

Part A

step 2: re-establish connections between pipelines and servers

 

  • Scanner pipeline: internal funding needed
    • Research Software Engineering support via FAS RC
      • Debug problems caused by the move to the new servers
      • There may be bugs unrelated to the server crash
    • Mechanical engineering support: consult with Tony Lowe
      • We are not confident the scanner will function once the software is fixed
      • The rack in the scanning room is not properly ventilated, so temperature affects its functionality...


  • Photography pipeline
    • Library staff will attempt to debug it and, if necessary, set up an external storage and backup workaround
      • Move forward with archival-quality imaging
        • lossless compression (TIFF rather than JPEG)
      • Internal funding may be required to purchase drives

 

 

Part A

step 3: establish basic integrity of existing systems

 

  • Determine the extent of potential data loss from the crash
    • Did everything get moved over?
    • Has everything on the rack in the scanning room been uploaded to Cannon?
  • Define backup procedure for photography
    • Why was this never done? Concern about storage?
  • Define backup procedure for scan data
    • When was the last time Ed backed up the data?
    • How do we access the tapes at Iron Mountain?
  • Discuss database backup service with FAS RC
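Both "Did everything get moved over?" and a backup procedure that can actually be verified come down to fixity checking: record a checksum for every file, then compare checksums between copies. The sketch below is a minimal illustration, assuming the pre-crash data (or a copy restored from the rack, Cannon, or the Iron Mountain tapes) and the new servers' copy can each be mounted as a directory. The function names are hypothetical, not part of any existing DASCH tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path relative to `root` (POSIX style) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify files as missing from the new copy, unexpected in it, or changed."""
    return {
        "missing": sorted(old.keys() - new.keys()),
        "extra": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python fixity.py /path/to/old_copy /path/to/new_copy
    report = compare_manifests(build_manifest(Path(sys.argv[1])), build_manifest(Path(sys.argv[2])))
    for category, paths in report.items():
        print(f"{category}: {len(paths)}")
```

Saving each manifest alongside the backups it describes also lets later checks detect silent corruption, not just missing files. Hashing every plate scan is I/O-heavy, so in practice this would likely run as a batch job rather than interactively.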

Part B

Digital Preservation

 

step 1: determine a realistic staffing plan and timeline so it can be incorporated into the development plan

 

  • At minimum, we need one FTE to focus on DASCH specifically
    • This person will be needed for at least two years, but because DASCH will become a data archive, digital preservation is an ongoing need
  • Ideally we would have two FTEs, so one person can focus on DASCH while the other develops general services to support data-intensive research

Part B

Digital Preservation

 

step 2: initiate conversations with FAS RC and major astronomy archives about the DASCH data

 

  • FAS RC and HCO's perspectives will be crucial in determining DASCH's financial future and will shape the digital preservation and maintenance strategy
  • If it becomes clear that the data needs to move, we will need to initiate formal discussions with other archives
    • DASCH data is currently not interoperable with the systems used by any other archive


Part C

Functional capabilities
 

step 1: API design

  • Enable use of the DASCH data in external analyses
    • DASCH is currently designed in a way that requires scientists to use tools built specifically to support Josh's research (the light curve generator)
      • This tooling was developed without open protocols and is essentially a black box
      • It severely limits the usefulness of the DASCH data
  • Enable interoperability between DASCH and other systems


We plan on seeking external funding to do this work


  • Determining API use cases will require input from scientists who want to use DASCH data in their research
    • We should bring on board a staff scientist to advise on the development plan, prototype tooling, and interface with stakeholders in other domains
    • We would need internal funding for this
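The goal of the API work is that DASCH data can be used outside the light curve generator. One small ingredient is an export whose columns are named, typed, and documented, so any external tool can parse it. The sketch below assumes a hypothetical light-curve record (the field names and units are illustrative, not DASCH's actual schema) and uses plain CSV only for brevity; an interoperable astronomy API would more likely follow IVOA standards such as VOTable or TAP, which this sketch does not implement.

```python
import csv
import io
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LightCurvePoint:
    """One photometric measurement. Field names and units are hypothetical."""
    plate_id: str      # identifier of the plate the measurement came from
    jd: float          # time of exposure, as a Julian Date
    magnitude: float   # calibrated magnitude
    mag_error: float   # 1-sigma magnitude uncertainty


COLUMNS = [f.name for f in fields(LightCurvePoint)]


def to_csv(points: list[LightCurvePoint]) -> str:
    """Serialize points to CSV with a header row naming every column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for point in points:
        writer.writerow({name: getattr(point, name) for name in COLUMNS})
    return buffer.getvalue()


def from_csv(text: str) -> list[LightCurvePoint]:
    """Parse CSV produced by to_csv (or any tool using the same header) back into records."""
    return [
        LightCurvePoint(
            plate_id=row["plate_id"],
            jd=float(row["jd"]),
            magnitude=float(row["magnitude"]),
            mag_error=float(row["mag_error"]),
        )
        for row in csv.DictReader(io.StringIO(text))
    ]


if __name__ == "__main__":
    sample = [LightCurvePoint("a1", 2415020.5, 12.3, 0.15)]
    print(to_csv(sample), end="")
```

The design point is the round trip: because the format is self-describing, a scientist's own scripts can consume it without the DASCH-specific tooling.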

 

Part C

Functional capabilities


step 2: Database optimization

  • The databases are inefficient and sprawling
    • There are actually multiple databases running on FAS RC resources
    • Data-processing workflows aren't described using Common Workflow Language (CWL), an open standard for describing computational data-analysis workflows

We plan on seeking external funding to do this work
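"Unindexed metadata" is one of the problems listed above, and it is a common cause of slow queries. The sketch below illustrates the effect using SQLite from Python's standard library. These slides do not name DASCH's actual database engine, and the table and column names are hypothetical. SQLite's EXPLAIN QUERY PLAN shows a full table scan before an index exists on the filtered column and an index search afterwards. Other relational databases apply the same principle, but they inspect query plans with different commands.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple) -> list[str]:
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
# Hypothetical plate-metadata table; not the real DASCH schema.
conn.execute(
    "CREATE TABLE plates (plate_id TEXT PRIMARY KEY, series TEXT, exposure_date TEXT)"
)
conn.executemany(
    "INSERT INTO plates VALUES (?, ?, ?)",
    [("a1", "A", "1901-03-04"), ("mc2", "MC", "1925-07-19"), ("a3", "A", "1902-11-30")],
)

lookup = "SELECT plate_id FROM plates WHERE series = ?"
before = query_plan(conn, lookup, ("A",))  # no index on series yet, so the planner scans

conn.execute("CREATE INDEX idx_plates_series ON plates (series)")
after = query_plan(conn, lookup, ("A",))   # the planner can now search via the index

print("before:", before)
print("after: ", after)
```

An index speeds up reads on the indexed column at the cost of extra storage and slower writes, so which metadata fields to index should follow from the API use cases identified in step 1.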


DASCH

Ed Los

By Daina Bouquin

A new path forward with DASCH