Digital Preservation

Or How I Stopped Worrying and 
Learned to Love Technology at Work

How to Navigate this Course


Use the down arrow first to get to the end of each section

From there, you can either 
hit the right or down key to move on

Try it now: navigate down to go to the next slide.


Great!

Now you know how to navigate within each section. 

You're ready to move on to the next section -->


What is Digital Preservation?


The active management of digital content 
over time to ensure ongoing access. 

Why?


Just as physical archives, libraries, and museums must undertake steps to preserve and protect their collections, digital repositories must ensure that files remain readable so that future generations can access the information that they contain.


Why Does This Matter to Me?


Because nearly everyone now works in the digital realm, individuals and institutions alike need to keep files and the information they store accessible. These files are often key to maintaining cultural or institutional memory. On an individual level, digital preservation also prevents the need for redundant work.

Digital preservation awareness is especially important for staff who regularly seek outside support from funding agencies. These agencies want the work they support to be sustained in the future, so they pay close attention to digital preservation planning.


What are the challenges in digital preservation?



Media Failure

As anyone who has ever had a computer die or has accidentally dropped an external hard drive knows, digital storage media do not last forever.  In the case of media failure, any information that is not backed up to another location is likely to be lost forever.

The Lifespan of Storage Media

  • CDs/DVDs: 2-20 years(!) - in good conditions
  • Floppy disks: obsolete - most computers no longer have readers
  • USBs: 10,000 writes
  • Hard Drives: 3 years, then 12% annual failure rate
  • External Hard Drives: Less time due to wear and tear
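
As a rough illustration of what the hard drive figure implies, the sketch below assumes that the 12% annual failure rate stays constant from year to year and that drives fail independently of one another. Both are simplifying assumptions rather than claims made by the figures above, and the function names are hypothetical.

```python
def survival_probability(annual_failure_rate: float, years: int) -> float:
    """Chance that one drive is still working after `years` more years,
    assuming the same failure rate applies every year."""
    return (1 - annual_failure_rate) ** years


def all_copies_lost_probability(annual_failure_rate: float, copies: int) -> float:
    """Chance that every copy is lost within one year, assuming each copy
    sits on its own drive and the drives fail independently."""
    return annual_failure_rate ** copies


RATE = 0.12  # annual failure rate listed above for hard drives after year 3

for years in (1, 3, 5):
    chance = survival_probability(RATE, years)
    print(f"Still working after {years} more year(s): {chance:.0%}")

for copies in (1, 2, 3):
    chance = all_copies_lost_probability(RATE, copies)
    print(f"All {copies} copy/copies lost within a year: {chance:.2%}")
```

Under these assumptions, each additional copy on a separate drive sharply reduces the chance of losing everything at once.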

Physical Loss


An example of physical loss by natural disaster.

Physical threats can occur without warning, and can have a huge impact on an institution that is not adequately prepared. Some particular challenges include:

  • Poor storage environment
  • Overuse
  • Infrastructure failure / Inadequate maintenance
  • Hardware failure or malfunction
  • Natural disaster
  • Human error
  • Sabotage

File Format Obsolescence

A file format can cause preservation challenges in several ways:

  • Proprietary software upgrades lead to a new format version, and the old format is no longer fully supported
  • The software supporting a file format is no longer widely available
  • The market is dominated by a new format (e.g., Microsoft Word's .doc and .docx formats have pushed many other word processing formats toward obsolescence)
  • The format simply fails on the market, due to any number of reasons (e.g., unfixed bugs, poor usability)

Media Format Obsolescence


As technology evolves, old media - and often their players - are used less and less. Users who want larger storage capacities and smaller media are less likely to continue using old technologies, and eventually migrate to new ones.

An example of this is the move from VHS tapes to DVDs, and most recently to digital video.


What YOU Can Do







Ensure that all of your files are stored in an area that is regularly backed up. Important files should be stored as several redundant copies, preferably with at least one copy in a separate geographic location.
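
For files that are not covered by an institutional backup, the idea can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a managed backup system: the function name is hypothetical, and the backup folders are assumed to be locations you have chosen yourself (for example, a network share and an external drive kept elsewhere).

```python
import filecmp
import shutil
from pathlib import Path


def back_up(source: Path, backup_dirs: list[Path]) -> list[Path]:
    """Copy `source` into every folder in `backup_dirs` and confirm that
    each copy is byte-for-byte identical to the original."""
    copies = []
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copy2 also keeps timestamps
        # shallow=False forces a comparison of the actual file contents
        if not filecmp.cmp(source, target, shallow=False):
            raise OSError(f"Copy at {target} does not match the original")
        copies.append(target)
    return copies
```

Comparing each copy against the original right away catches a failed or partial copy while the original is still at hand.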

Migrate Files from Old Media

While it may be important to retain the original file and obsolete media (mainly in cases of digital asset preservation), it is recommended to migrate files from old media to a stable storage area with redundant copies.

Choose the Right File Format

While there is no perfect selection, there are a variety of factors that should be considered as you select the format that you will use:

  • Is it proprietary or open source? Proprietary formats often require software purchases, and are not updated as often to fix bugs.  Open source formats are generally supported by freely-available software and are updated by a community of users.
  • Is the format widely used?
  • Are older versions of the format still readable?  This suggests that a format has backward compatibility.
  • Does it support your needs?

Make Your Digital Files Findable


Digital information must be easy to locate if it is to remain accessible. The first and most important step is to create a system for naming files. This system should address specific needs, but general suggestions include:

  1. Keep it short
  2. Add structural elements that will help you locate the file
  3. Use only underscores (_) and hyphens (-) as separators, avoiding spaces and other special characters
  4. Order information by importance
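
The suggestions above can be sketched as a small helper. The function name, the 50-character limit, and the choice of underscores between parts and hyphens within a part are all assumptions made for illustration; an actual naming system should follow your department's own needs.

```python
import re


def build_filename(parts: list[str], extension: str, max_length: int = 50) -> str:
    """Join name parts, listed most important first, into one file name."""
    cleaned = []
    for part in parts:
        # Replace spaces and special characters with a single hyphen
        slug = re.sub(r"[^A-Za-z0-9-]+", "-", part.strip()).strip("-")
        if slug:
            cleaned.append(slug)
    # Underscores separate the parts; trim the result to keep it short
    stem = "_".join(cleaned)[:max_length].rstrip("_-")
    return f"{stem}.{extension.lstrip('.').lower()}"


print(build_filename(["2014-03", "Digital Preservation", "workshop slides"], "PDF"))
```

Listing the most important element first (here, a date) means that files sort together by that element in a folder listing.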

Archive Your Email


Email messages often contain pertinent information about departmental and institutional work.  They can contain project- or collection-related information that needs to be preserved.

  • Identify messages that have long-term value
  • Export selected messages: save in a recommended format such as PDF
  • Organize messages for findability: save with related project or collection if possible

Ensure the Security of Your Information

Storage areas that are accessible to multiple users should be regularly monitored in order to prevent any data or file loss.

  • Ensure that no single user has access to all redundant copies of important digital files
  • Immediately remove users after staff turnover
  • Maintain a comprehensive list of users and their permissions
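
If the list of users and permissions is kept in a structured form, the first two checks can be automated. The sketch below assumes a simple mapping from each storage location to the set of users who can access it; the location names, user names, and function names are hypothetical.

```python
def users_with_access_to_all_copies(access: dict[str, set[str]]) -> set[str]:
    """Return the users who can reach every storage location."""
    if not access:
        return set()
    return set.intersection(*(set(users) for users in access.values()))


def users_to_remove(access: dict[str, set[str]], current_staff: set[str]) -> set[str]:
    """Return users who still have access but are no longer on staff."""
    everyone = set().union(*access.values()) if access else set()
    return everyone - current_staff


access = {
    "primary_server": {"ana", "ben"},
    "offsite_copy": {"ben", "cy"},
    "external_drive": {"ana", "ben"},
}
print(users_with_access_to_all_copies(access))   # ben can reach every copy
print(users_to_remove(access, {"ana", "ben"}))   # cy has left the staff
```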


Managing Digital Assets


What are Digital Assets?


Digital assets are files of enduring value that should be preserved for the foreseeable future.

Generally, digital assets are items that are owned by the institution and have no copyright limitations. Within a cultural heritage institution, these are usually items in its permanent collection, though loaned or donated items with restrictions may be digital assets in certain cases. Digital assets can be either digital surrogates of physical objects or born digital items; the determination should be made by trained staff members.

Digital Assets vs. Everyday Files


Just as you title your files to ensure findability, it is important to also separate the most important files, or digital assets, from the everyday files with no enduring value.  This is because digital assets require different digital preservation steps and need to be easily identified.  It is also important to separate them in order to prevent any accidental deletion of important files.

 

Create a Comprehensive Inventory



In order to keep track of your digital assets, it is important to create a comprehensive inventory of the files, their formats, and their locations.
This can be done manually, but specific tools, such as DROID, will carry out the inventory much more quickly. The inventory should be carried out yearly (or as needed within the institution/department).
A file-level inventory will assist in carrying out other activities related to digital asset preservation.
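
To show what a file-level inventory records, here is a minimal sketch that lists every file under a folder with its location, format, and size, and saves the result as a CSV file. It guesses the format from the file extension alone, which is an assumption made to keep the example short; dedicated tools such as DROID identify formats far more reliably. The function and column names are hypothetical.

```python
import csv
from pathlib import Path

COLUMNS = ["location", "format", "size_bytes"]


def build_inventory(root: Path) -> list[dict]:
    """List every file under `root` with its location, format, and size."""
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rows.append({
                "location": str(path.relative_to(root)),
                # Extension only: a rough stand-in for real format identification
                "format": path.suffix.lower().lstrip(".") or "unknown",
                "size_bytes": path.stat().st_size,
            })
    return rows


def write_inventory(rows: list[dict], destination: Path) -> None:
    """Save the inventory as a CSV file that opens in any spreadsheet."""
    with destination.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```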

Implement File Format Restrictions

To make your digital assets easily manageable, it is important to maintain them in only a few file formats.  By implementing file format restrictions, you can also ensure that your collections follow the best practices suggested for each media type.  For more information, see JISC's <Digital Media guides>.

To determine the best file formats to use, you can also refer to the criteria explained in the previous section.

Implement File Format Restrictions, Cont.

An example of file format restrictions, as defined in a simple TXT document, called a README file, saved within the project folder.
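
Once restrictions are written down, a project folder can be checked against them. The following is a minimal sketch: the list of allowed extensions is hypothetical and stands in for whatever formats your README file defines, and the function name is likewise an assumption.

```python
from pathlib import Path

# Hypothetical restrictions; replace with the formats named in your README
ALLOWED_EXTENSIONS = {".tif", ".pdf", ".txt"}


def files_outside_policy(folder: Path, allowed: set[str]) -> list[Path]:
    """Return every file under `folder` whose extension is not allowed."""
    return sorted(
        path
        for path in folder.rglob("*")
        if path.is_file() and path.suffix.lower() not in allowed
    )
```

Any file the check returns is a candidate for migration into one of the selected formats.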

Migrate Files into Selected Formats


For any digital assets that are currently held in a format other than those you have selected for your collections, it is important to establish a format migration plan.  

  

This may take longer than expected, so it is best to plan and execute as quickly as possible.  

Migrate Files into Selected Formats, Cont.

In order to ensure the continued preservation of your collection, it is important to migrate into sustainable formats strategically.

Begin with formats that are most at risk of obsolescence (e.g., those that are only supported by outdated software). Your plan should also address whether the institution will keep the digital asset in its original format. In that case, it is recommended to maintain a copy of the corresponding program and/or technology as well, in order to ensure continued access.

Maintain Redundant Copies

In order to ensure the safety of digital assets, it is important to maintain several copies.  As described in a previous section, data loss is one wrong click or one fried PC away.

Departments and institutions should establish routines to ensure that storage backup schedules fit the needs of the various content creators and managers who work with digital assets. If possible, redundant copies of digital assets should also be stored in a separate geographic location.

Maintain Redundant Copies, Cont.

One solution to the necessity of storing multiple copies in geographically diverse locations is to join a consortium of institutions working in digital preservation.

An example of this is LOCKSS, which distributes library-owned digital materials throughout a network, thus ensuring their preservation even in the event of a large-scale disaster.


Check File Fixity


To ensure the continued readability of files, a checksum should be generated on ingest (addition to the repository) of each digital asset. A checksum is a short value calculated from the contents of a file; recalculating it later and comparing the result against the original reveals any data loss or change.
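
As a minimal sketch of generating a checksum, the function below uses the SHA-256 algorithm from Python's standard library. The choice of SHA-256 and the function name are assumptions made for illustration; the tools listed in this section may use other algorithms.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so that very large files do not fill up memory
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting string should be stored alongside the asset at ingest so that there is something to compare against later.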


Potential fixity tools can be found here:
<http://digitalpowrr.niu.edu/tool-grid/>

Check File Fixity, Cont.


Because fixity checking (regenerating an asset's checksum and comparing it to the original to detect any data loss) requires a fair amount of computing power, it should only be carried out at specified intervals.  

It is recommended to undertake fixity checks once per year at most, though a check every three to five years should suffice for most digital assets. Individual schedules should be developed based on the institution's infrastructure and the digital asset's value and state of preservation.
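
A scheduled fixity check can be sketched as follows. The example assumes that the checksums recorded at ingest are kept as a mapping from each file's relative path to its SHA-256 value; that layout, the algorithm, and the function names are assumptions for illustration rather than a description of any particular tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Recalculate the SHA-256 checksum of a file."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(recorded: dict[str, str], root: Path) -> dict[str, str]:
    """Compare each file under `root` with its recorded checksum and
    report it as 'ok', 'changed', or 'missing'."""
    results = {}
    for relative_path, original_checksum in recorded.items():
        path = root / relative_path
        if not path.is_file():
            results[relative_path] = "missing"
        elif sha256_of(path) == original_checksum:
            results[relative_path] = "ok"
        else:
            results[relative_path] = "changed"
    return results
```

Any file reported as changed or missing should be restored from one of its redundant copies.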

Metadata

In order to be usable and useful, a digital asset should include as much descriptive information, or metadata, as possible. This includes the checksum described in the previous slide, as well as information like:


  • original filename, location, and other administrative metadata; 
  • technical metadata describing the file structure; and 
  • copyright information.

Some administrative metadata can be automatically generated using settings on the original device.  Pertinent metadata should be saved redundantly to a project log.
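
The sketch below shows one way to gather basic metadata for a file and append it to a project log. The field names, the JSON log format, and the function names are hypothetical; technical metadata describing the file structure is left out because it depends on the format and usually requires specialized tools.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def describe(path: Path, copyright_note: str) -> dict:
    """Collect basic administrative metadata for one file."""
    info = path.stat()
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "original_filename": path.name,
        "original_location": str(path.resolve().parent),
        "size_bytes": info.st_size,
        "last_modified": modified.isoformat(),
        # Reads the whole file at once, which is fine for small files
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        "copyright": copyright_note,
    }


def append_to_project_log(record: dict, log_path: Path) -> None:
    """Add the record to the project log, one JSON entry per line."""
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
```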

Document Your Storage Plan

In order to ensure that future staff and users can continue to access the digital assets that you are working with, all storage plans and decisions should be documented. This can be done in a variety of ways, but popular software like Microsoft Word or Excel is recommended, both to ensure readability for future users and to make it easy on yourself.

One way to document your storage plan is to create a retention schedule, which defines each digital file type within a collection, the file formats, and their frequency of use.

Document Your Storage Plan, Cont.

An example of a retention schedule.


Retention schedules can also include more detail, depending on the collection and its potential users.


National Digital Stewardship Alliance 

Levels of Preservation


http://www.digitalpreservation.gov/ndsa/activities/levels.html

The NDSA Levels of Preservation

The <levels> are a tiered set of recommendations for how organizations should begin to build or enhance their digital preservation activities.

They allow institutions to assess the level of preservation achieved for specific materials or the entire preservation infrastructure.  The guidelines are organized into five functional areas that are at the heart of digital preservation systems: storage and geographic location, file fixity and data integrity, information security, metadata, and file formats.
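
A self-assessment against the five functional areas can be recorded in a very simple form. The sketch below assumes each area is rated from 1 to 4, matching the four levels in the NDSA grid, with 0 meaning that level 1 has not yet been reached. Reporting the lowest-rated areas as the priority is an illustrative choice, not part of the NDSA guidance, and the function name is hypothetical.

```python
FUNCTIONAL_AREAS = (
    "storage and geographic location",
    "file fixity and data integrity",
    "information security",
    "metadata",
    "file formats",
)


def summarize(assessment: dict[str, int]) -> dict:
    """Find the lowest level reached and the areas that sit at that level."""
    for area in FUNCTIONAL_AREAS:
        if area not in assessment:
            raise ValueError(f"No level recorded for: {area}")
        if not 0 <= assessment[area] <= 4:
            raise ValueError(f"Level for {area} must be between 0 and 4")
    lowest = min(assessment[area] for area in FUNCTIONAL_AREAS)
    weakest = [area for area in FUNCTIONAL_AREAS if assessment[area] == lowest]
    return {"lowest_level": lowest, "weakest_areas": weakest}


example = {
    "storage and geographic location": 2,
    "file fixity and data integrity": 1,
    "information security": 3,
    "metadata": 1,
    "file formats": 2,
}
print(summarize(example))
```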

Self Assessment Example


 

How Do You Measure Up?


In Sum...









Digital preservation is not a job to be left to the experts.  
It should be part of everyone's daily work, to ensure that the digital objects created today will still be around tomorrow.

Image Credit

Most of the images used in this workshop came from the <Digital Preservation Business Case Toolkit>.

All images are licensed under Creative Commons.

 

Workshop on Digital Preservation

By Heidi Dowding

Slides adapted from a March 2014 workshop at Dumbarton Oaks.