Digital Preservation at Dumbarton Oaks

Or How I Stopped Worrying and 
Learned to Love Technology at Work

How to Navigate this Course


Use the down arrow first to get to the end of each section

From there, you can either 
hit the right or down key to move on

Try it now: navigate down to go to the next slide.


Great!

Now you know how to navigate within each section. 

You're ready to move on to the next section -->


What is Digital Preservation?


The active management of digital content 
over time to ensure ongoing access. 

Why?


Just as physical archives, libraries, and museums must undertake steps to preserve and protect their collections, digital repositories must ensure that files remain readable 
so that future generations can 
access the information that they contain.


Why Does This Matter to Me?


As nearly everyone is now working in the digital realm, it is important to work as both an individual and an institution to keep files and their stored information accessible.  These files are often key to maintaining a cultural or institutional memory.  On an individual level, digital preservation also prevents the need for redundant work.

Digital preservation awareness is especially important to staff who regularly seek outside support from funding agencies.  These agencies want the work that they support to be sustained into the future, so they pay close attention to digital preservation planning.


What are the challenges in digital preservation?



Media Failure

As anyone who has ever had a computer die or has accidentally dropped an external hard drive knows, digital storage media do not last forever.  In the case of media failure, any information that is not backed up to another location is likely to be lost forever.

The Lifespan of Storage Media

  • CDs/DVDs: 2-20 years(!) - in good conditions
  • Floppy disks: obsolete - most computers no longer have readers
  • USBs: 10,000 writes
  • Hard Drives: 3 years, then 12% annual failure rate
  • External Hard Drives: Less time due to wear and tear
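
To get a feel for what the hard drive figure above implies, here is a rough back-of-the-envelope sketch. It assumes (as a simplification of the slide, not a claim about real drives) that no drive fails in its first three years and that each later year carries an independent 12% chance of failure. The function name is made up for illustration.

```python
def survival_probability(age_years, safe_years=3, annual_failure_rate=0.12):
    """Chance that a single drive is still working at a given age.

    Assumption: no failures during the first `safe_years`, then an
    independent `annual_failure_rate` chance of failure each year.
    """
    years_at_risk = max(0, age_years - safe_years)
    return (1 - annual_failure_rate) ** years_at_risk


# Print the estimated chance of survival for drives of various ages.
for age in (3, 5, 8, 10):
    print(f"{age} years old: {survival_probability(age):.0%} chance of still working")
```

Under these assumptions, a drive has roughly a 41% chance of still working at ten years old, which is one more reason not to rely on a single copy.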

Physical Loss


Physical threats can occur without warning, and can have a huge impact on an institution that is not adequately prepared.  Some particular challenges include:

An example of physical loss by natural disaster.


  • Poor storage environment
  • Overuse
  • Infrastructure failure / Inadequate maintenance
  • Hardware failure or malfunction
  • Natural disaster
  • Human error
  • Sabotage

File Format Obsolescence

Often a format can cause preservation challenges, such as:

  • Proprietary software upgrades lead to a new format version, and the old format is no longer fully supported
  • The software supporting a file format is no longer widely available
  • The market is dominated by a new format (e.g., Microsoft Word's .doc/.docx has made other word processing formats obsolete)
  • The format simply fails on the market, due to any number of reasons (e.g., unfixed bugs, poor usability)

Media Format Obsolescence


As technology evolves, old media - and often their players - are used less and less.  Users who want larger storage capacities and smaller media are less likely to continue using old technologies, and eventually migrate to new ones.

An example of this is the move from VHS tapes to DVDs, and most recently to digital video.


What YOU Can Do







Ensure that all of your files are stored in an area that is regularly backed up.  Important files should be stored with several redundancies, preferably one of which should be in a separate geographic location.

Migrate Files from Old Media

While it may be important to retain the original file and obsolete media (mainly in cases of digital asset preservation), it is recommended to migrate files from old media.  
These should be moved to a stable storage area 
with redundant copies.
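
As one possible way to carry this out, the sketch below copies a file from old media into several storage folders and confirms that each copy matches the original byte for byte. The function name and any paths passed to it are hypothetical; it uses only Python's standard library and leaves the original file untouched.

```python
import filecmp
import shutil
from pathlib import Path


def migrate_file(source, destination_dirs):
    """Copy one file into every destination folder and verify each copy.

    Returns the list of copies that match the source byte for byte.
    Raises an error if any copy differs, so a bad copy is never trusted.
    """
    source = Path(source)
    verified = []
    for folder in destination_dirs:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy = folder / source.name
        shutil.copy2(source, copy)  # copy2 also keeps timestamps where possible
        # shallow=False forces a full content comparison, not just size and date
        if not filecmp.cmp(source, copy, shallow=False):
            raise IOError(f"Copy does not match original: {copy}")
        verified.append(copy)
    return verified
```

A call might look like `migrate_file("E:/report.doc", ["G:/migrated", "H:/migrated"])`, where the drive letters and file name are placeholders.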

Choose the Right File Format

While there is no perfect format, there are a variety of factors that should be considered as you select the 
format that you will use:

  • Is it proprietary or open source? Proprietary formats often require software purchases, and are not updated as often to fix bugs.  Open source formats are generally supported by freely-available software and are updated by a community of users.
  • Is the format widely used?
  • Are older versions of the format still readable?  This suggests that a format has backward compatibility.
  • Does it support your needs?

Make Your Digital Files Findable


In order to ensure continued access to digital information, it must be located easily.  The first and most important step is to create a system for naming files.  This system should address specific needs, but general suggestions include:

  1. Keep it short.
  2. Add structural elements that will help you locate the file
  3. Use only underscores (_) and hyphens (-) as separators; avoid spaces and other special characters
  4. Order information by importance
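
The suggestions above can be sketched as a small helper that assembles a file name from its parts. This is only an illustration: the function name, the 50-character limit, and the example elements are assumptions, and each department would adapt them to its own naming system.

```python
import re


def make_filename(elements, extension, max_length=50):
    """Build a file name from descriptive elements, most important first.

    Spaces become hyphens, any character other than letters, digits,
    hyphens, and underscores is dropped, and elements are joined with
    underscores.
    """
    cleaned = []
    for element in elements:
        text = re.sub(r"\s+", "-", element.strip())
        text = re.sub(r"[^A-Za-z0-9_-]", "", text)
        if text:
            cleaned.append(text)
    name = "_".join(cleaned)
    if len(name) > max_length:
        raise ValueError(f"Name is {len(name)} characters; the limit is {max_length}")
    return f"{name}.{extension.lstrip('.')}"


print(make_filename(["garden survey", "2014-03-15", "v02"], "tif"))
# prints: garden-survey_2014-03-15_v02.tif
```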

Archive Your Email


Email messages often contain pertinent information about departmental and institutional work.  They can contain project- or collection-related information that needs to be preserved.

  • Identify messages that have long-term value
  • Export selected messages: save in a recommended format such as PDF
  • Organize messages for findability: save with related project or collection if possible

Ensure the Security of Your Information

Storage areas that are accessible to multiple users should be regularly monitored in order to prevent any data or file loss.

  • Ensure that no single user has access to all redundant copies of important digital files
  • Immediately remove users after staff turnover
  • Maintain a comprehensive list of users and permissions
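
If the list of users and permissions is kept in a structured form, the first point above can be checked automatically. The sketch below is a minimal illustration; the function name, storage locations, and user names are all hypothetical.

```python
def users_with_access_to_every_copy(access_by_location):
    """Return the users who can reach all redundant copies of an asset.

    `access_by_location` maps each storage location to the set of users
    who can modify files there. An empty result is the goal.
    """
    user_sets = [set(users) for users in access_by_location.values()]
    if not user_sets:
        return set()
    return set.intersection(*user_sets)


# Hypothetical permissions list for three copies of one asset.
access = {
    "departmental drive": {"archivist", "curator", "it_admin"},
    "backup tape": {"it_admin"},
    "offsite copy": {"it_admin", "registrar"},
}
print(users_with_access_to_every_copy(access))
# prints: {'it_admin'}
```

Here the account named `it_admin` could delete or alter every copy, so its access to at least one location would need to be reviewed.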


Managing Digital Assets


What are Digital Assets?


Digital assets are files of enduring value that should be preserved for the foreseeable future.

Generally, digital assets are items owned by the institution that have no copyright limitations.  Within a cultural heritage institution, these are usually items in the permanent collection, though loaned or donated items with restrictions may be digital assets in certain cases.  They can be either digital surrogates of physical objects or born digital items; the determination should be made by trained staff members.

Digital Assets vs. Everyday Files


Just as you title your files to ensure findability, it is important to also separate the most important files, digital assets, from the everyday files with no enduring value.  This is because digital assets require different digital preservation steps and need to be easily identified.  It is also important to separate them in order to prevent any accidental deletion of important files.

 

Create a Comprehensive Inventory



In order to keep track of your digital assets, it is important to create a comprehensive inventory of the files, their formats, and their locations.
This can be done manually, but there are also specific tools, such as DROID, that will carry out the inventory much more quickly.  The inventory should be carried out yearly (or as needed within the institution/department).  
A file-level inventory will assist in carrying out other activities related to digital asset preservation.
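
For a sense of what a file-level inventory records, here is a minimal sketch using Python's standard library. It is not a substitute for a dedicated tool: it guesses the format from the file extension alone, and the function and column names are assumptions chosen for illustration.

```python
import csv
from pathlib import Path


def inventory(root):
    """List every file under `root` with its format, size, and location.

    The format is guessed from the file extension only; tools such as
    DROID inspect the file contents, which is more reliable.
    """
    rows = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            rows.append({
                "filename": path.name,
                "format": path.suffix.lower().lstrip(".") or "(none)",
                "size_bytes": path.stat().st_size,
                "location": str(path.parent),
            })
    return rows


def save_inventory(rows, csv_path):
    """Write the inventory to a CSV file that opens in any spreadsheet program."""
    fields = ["filename", "format", "size_bytes", "location"]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

Saving the result as CSV keeps the inventory itself in a simple, widely readable format.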

Implement File Format Restrictions

To make your digital assets easily manageable, it is important to maintain them in only a few file formats.  By implementing file format restrictions, you can also ensure that your collections follow the best practices suggested for each media type.  For more information, see JISC's <Digital Media guides>.

To determine the best file formats to use, you can also refer to the criteria explained in the previous section.

Implement Format Restrictions

An example of file format restrictions, as defined in a simple TXT document, called a README file, saved within the project folder.
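
Once restrictions like these are written down, checking a folder against them can be automated. The sketch below is one possible approach; the list of allowed formats is a placeholder rather than a recommendation, and the function name is hypothetical.

```python
from pathlib import Path

# Hypothetical restriction list; a real one would come from the project's README file.
ALLOWED_FORMATS = {".tif", ".pdf", ".txt"}


def files_outside_restrictions(folder, allowed=ALLOWED_FORMATS):
    """Return files under `folder` whose extension is not on the approved list."""
    return [
        path
        for path in sorted(Path(folder).rglob("*"))
        if path.is_file() and path.suffix.lower() not in allowed
    ]
```

Any files returned are candidates for migration into one of the approved formats.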

Migrate Files into Selected Formats


For any digital assets that are currently held in a format other than those you have selected for your collections, it is important to establish a format migration plan.  

  

This may take longer than expected, so it is best to plan and execute as quickly as possible.  

Migrate Files into Selected Formats, Cont.

In order to ensure the continued preservation of your collection, it is important to migrate into sustainable formats strategically.

Begin with formats that are most at risk of obsolescence (e.g., those that are only supported by outdated software).  Your plan should also address whether the institution will keep the digital asset in its original format.  In that case, it is recommended to maintain a copy of the corresponding program and/or technology in order to ensure continued access.

Maintain Redundant Copies

In order to ensure the safety of digital assets, it is important to maintain several copies.  As described in a previous section, data loss is one wrong click or one fried PC away.

Departments and institutions should establish routines to ensure that storage backup schedules fit the needs of the various content creators and managers who work with digital assets.  If possible, redundant copies of digital assets should also be stored in a separate geographic location.

Maintain Redundant Copies, Cont.

One solution to the necessity of storing multiple copies in geographically diverse locations is to join a consortium of institutions working in digital preservation.

An example of this is LOCKSS, which distributes library-owned digital materials throughout a network, thus ensuring their preservation even in the event of a large-scale disaster.


Check File Fixity


To ensure the continued readability of files, a checksum should be generated on ingest (addition to the repository) of each digital asset.  A checksum is a short value computed from the contents of a file by an algorithm such as MD5 or SHA-256.  Later, the checksum can be regenerated and compared with the original in order to detect any data loss or change.
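
As an illustration of what generating a checksum involves, the sketch below uses Python's standard `hashlib` module. The function name and the choice of SHA-256 are assumptions made for this example; a dedicated fixity tool would normally handle this step.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Compute a checksum for a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting string would be recorded alongside the asset at ingest so that it can be compared against later.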


Potential fixity tools can be found here:
<http://digitalpowrr.niu.edu/tool-grid/>

Check File Fixity, Cont.


Because fixity checking (regenerating an asset's checksum and comparing it to the original to detect any data loss) requires a fair amount of computing power, it should only be carried out at specified intervals.  

Fixity checks should be run no more than once per year; for most digital assets, a check every three to five years should suffice.  Individual schedules should be developed based on the institution's infrastructure and the digital asset's value and state of preservation.
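
A scheduled fixity check could look something like the sketch below, which regenerates each file's checksum and compares it with the value recorded on ingest. It assumes the recorded checksums are SHA-256 values held in a simple mapping of file path to checksum; the function name and that storage layout are hypothetical.

```python
import hashlib
from pathlib import Path


def check_fixity(recorded_checksums):
    """Compare each file's current checksum with the one recorded on ingest.

    `recorded_checksums` maps file paths to SHA-256 checksums.
    Returns a dict of problems: "missing" or "changed" for each failing file.
    """
    problems = {}
    for path, recorded in recorded_checksums.items():
        file = Path(path)
        if not file.is_file():
            problems[str(path)] = "missing"
            continue
        # Reads the whole file into memory; fine for a sketch, not for huge files.
        current = hashlib.sha256(file.read_bytes()).hexdigest()
        if current != recorded:
            problems[str(path)] = "changed"
    return problems
```

An empty result means every file still matches its recorded checksum.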

Metadata

In order to be usable and useful, a digital asset should include as much descriptive information, or metadata, as possible.  This includes the checksum described in the previous slide, as well as information like:


  • original filename, location, and other administrative metadata; 
  • technical metadata describing the file structure; and 
  • copyright information.

Some administrative metadata can be automatically generated using settings on the original device.  Pertinent metadata should be saved redundantly to a project log.
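
Some of this administrative metadata is also available from the file system itself. The sketch below gathers a few such values and appends them to a CSV project log; the function names, column names, and log layout are assumptions for illustration, not an established standard.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def administrative_metadata(path):
    """Collect basic administrative metadata that the file system already knows."""
    file = Path(path)
    info = file.stat()
    return {
        "original_filename": file.name,
        "location": str(file.parent.resolve()),
        "size_bytes": info.st_size,
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
    }


def append_to_project_log(metadata, log_path):
    """Add one row to a CSV project log, writing the header if the log is new."""
    log = Path(log_path)
    is_new = not log.exists()
    with open(log, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(metadata))
        if is_new:
            writer.writeheader()
        writer.writerow(metadata)
```

Descriptive, technical, and copyright metadata would still need to be supplied by staff who know the asset.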

Document Your Storage Plan

In order to ensure that future staff and users can continue to access the digital assets that you are working with, all storage plans and decisions should be documented.  This can be done in a variety of ways, but it is recommended to use popular software like Microsoft Word or Excel, both to ensure readability for future users and to make it easy on yourself.

One way to document your storage plan is to create a retention schedule, which defines each digital file type within a collection, the file formats, and their frequency of use.

Document Your Storage Plan, Cont.

An example of a retention schedule.


Retention schedules can also include more detail, depending on the collection and its potential users.


National Digital Stewardship Alliance 

Levels of Preservation


http://www.digitalpreservation.gov/ndsa/activities/levels.html

The NDSA Levels of Preservation

The <levels> are a tiered set of recommendations for how organizations should begin to build or enhance their digital preservation activities.

They allow institutions to assess the level of preservation achieved for specific materials or the entire preservation infrastructure.  The guidelines are organized into five functional areas that are at the heart of digital preservation systems: storage and geographic location, file fixity and data integrity, information security, metadata, and file formats.

Self Assessment Example


 

How Do You Measure Up?


The Current Landscape at Dumbarton Oaks


Storage and Geographic Location

Backups of all networked storage areas are made nightly or monthly, based on the individual drive.  These backups are made manually to LTO tape and stored in another building on the institution's grounds.

There is an overabundance of external storage media: there are over 1,500 CDs and DVDs, and even floppy disks are still floating around the institution.  This is likely due to the rapid growth of the institution's digital storage needs, without a strategic vision as to how to meet those needs.

Currently, Dumbarton Oaks' storage capacity is somewhere around 25 TB.  This chart indicates that digital storage needs could potentially outgrow the institution's capacity within 3-5 years if non-strategic growth continues.
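
The kind of projection behind such an estimate can be sketched as a simple compound-growth calculation. Only the 25 TB capacity comes from the text above; the amount currently used and the growth rate in the example are invented placeholders, not Dumbarton Oaks data, and the function name is hypothetical.

```python
import math


def years_until_full(used_tb, capacity_tb, annual_growth_rate):
    """Years until storage use reaches capacity, assuming compound annual growth."""
    if used_tb >= capacity_tb:
        return 0.0
    if annual_growth_rate <= 0:
        return math.inf
    return math.log(capacity_tb / used_tb) / math.log(1 + annual_growth_rate)


# Placeholder inputs for illustration only; only the 25 TB capacity comes from the text.
print(round(years_until_full(used_tb=12.5, capacity_tb=25, annual_growth_rate=0.2), 1))
# prints: 3.8
```

Substituting real usage figures from the IT Department would give a more meaningful estimate.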

File Formats


This information comes from a file-level inventory that was undertaken on most of the institution's networked storage in Fall 2013.


There are currently 274 file formats in use around the institution.  While the most-used formats, like TIFF and JPG, are non-proprietary and considered best practice, some of the other formats are proprietary and/or outdated.


File Fixity and Data Integrity


There are currently no fixity checks in place in any department.  Based on the Fall 2013 inventory, there are currently 56 unreadable files.  While this is small in comparison to the overall number of files inventoried, it suggests that the threat of data loss is very real.

Metadata

There are different standards in use across the institution; most are determined based on established best practices within each field (e.g., museum, archives, libraries).  That said, some do not adhere to field-specific best practices and most are not fully documented in their use.


Information Security


User restrictions are not fully documented, which could contribute to data loss or sabotage.  The system should be regularly culled to remove users after staff turnover.
It is considered best practice to ensure that no single user has access to all copies of a digital asset (i.e., the original creator cannot delete or alter all of the redundant copies); this does not currently seem to be in place.

Specific Recommendations for Dumbarton Oaks


Back Up Your Data!



Shared (H:/) Drive - Nightly
User (G:/) Drive - Nightly
Email Archive Drive - Nightly
Departmental Drives - Generally Monthly*





*Consult the IT Department for information on specific drives.

Backups In the Cloud...

  • Distributed copies
  • Easy access to files anywhere
  • NOT good for primary long-term preservation storage
  • Regularly weed unnecessary files to prevent overflow
 



Contact webhelp [at] doaks [dot] org for a user account

Sharing on the Intranet


The Dumbarton Oaks Intranet is a great place to store files that you would like to share with colleagues outside of your department, without needing to modify them. However, you should also maintain a copy of the file somewhere on your user (G:/) or departmental drive for safekeeping. The Intranet can be accessed by authorized users at <http://www.doaks.org/login>.

Examples of items to be uploaded on the Intranet include: finalized versions of policies, presentations, and information sheets about specific collections.

Address File Format Issues


While there are some special projects happening at the institution that require the use of multiple specialized formats (e.g., GIS research), most departments do not need to employ as many file formats as are currently being used.

Each department should review its current practices based on the information provided in previous sections, and consider migrating information into more sustainable formats.

For more information, here are some of the most commonly used formats at DO:

Top Formats at Dumbarton Oaks

Fix File Fixity


Most of the departments at Dumbarton Oaks are working with digital assets to some degree; however, there is still no way of automatically checking the long-term fixity of files.


Based on research at the institution, it is recommended that staff in a department with a long-term stake in digital asset preservation (e.g., the library or archives) begin to implement file fixity checks.  This could be carried out in one project, and then scaled out based on available staff and training.

Standardize Metadata Within Each Department


While metadata standards vary greatly based on the specific field, it is important to choose, adhere to, and document a specific standard (or standards, if necessary) so that future staff and content users can fully understand the system and easily locate items.

Within the chosen metadata standard, it is also important to utilize controlled vocabularies (such as <those created by the Getty>) and to document use of those as well.

Secure Your Data

Information security is truly an institution-wide challenge, and so the IT Department should take the lead in managing user accounts and ensuring that the right users have access to the right items.


However, each department should also work to assist IT.  When there is departmental staff turnover, be sure to inform IT immediately in order to revoke all digital access.

In Sum...









Digital preservation is not a job to be left to the experts.  
It should be part of everyone's daily work, to ensure that the digital objects created today will still be around tomorrow.

 

Image Credit

Most of the images used in this workshop came from the <Digital Preservation Business Case Toolkit>. 

All images are licensed under Creative Commons.

 

Digital Preservation at Dumbarton Oaks

By Heidi Dowding

Slides adapted from a March 2014 workshop at Dumbarton Oaks.
