Data Preservation
July 21-22, 2016
Virginia Tech
Blacksburg, VA
Terrell Russell, Ph.D.
@terrellrussell
Acting Chief Technologist, iRODS Consortium
Backup vs. Replication
Backup | Replication |
---|---|
Prevents data loss | Enables swift recovery from disaster |
Snapshot in time | Up-to-date instance |
Cheap, slow storage | Storage nearly identical to the primary |
Many copies over time | 1-3 instances, all up to date |
Recovery to any point in time | Recovery only from the up-to-date instance |
Safe from user error | May also be affected by user error |
DR is tested regularly | Replicas are kept up to date |
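The recovery rows of the table mark the key practical difference: with a series of backups you can restore the newest snapshot taken at or before the moment you need, while a replica only offers its current state. Below is a minimal Python sketch of that lookup. It assumes snapshots are identified by plain `datetime` values, and `snapshot_for` is a hypothetical helper, not an iRODS API.

```python
from __future__ import annotations

from bisect import bisect_right
from datetime import datetime
from typing import Optional


def snapshot_for(snapshot_times: list[datetime], requested: datetime) -> Optional[datetime]:
    """Return the newest snapshot taken at or before `requested`, or None if none exists."""
    ordered = sorted(snapshot_times)
    # bisect_right places equal timestamps to the left, so an exact match is selected.
    idx = bisect_right(ordered, requested)
    return ordered[idx - 1] if idx else None
```

A replica corresponds to a list that always holds exactly one entry: the current state.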
Implementing Backup with iRODS
Goal - provide point-in-time snapshots of collections and data objects to enable disaster recovery
Given the requirements, we need:
- A way to identify collections to back up - e.g. user collections
- A copy of the data in reliable storage
- A timestamp of when the copy is made
- Metadata necessary for complete recovery at the point of the snapshot
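One way to keep these four requirements together is a single record per snapshot. The Python dataclass below is a hypothetical structure, not part of iRODS. Its field names, the example resource name, and the choice of UTC timestamps are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class SnapshotRecord:
    """One point-in-time copy of a collection (hypothetical structure)."""

    # Requirement 1: which collection was identified for backup.
    source_collection: str
    # Requirement 2: where the copy lives in reliable (backup) storage.
    backup_collection: str
    backup_resource: str
    # Requirement 3: when the copy was made; defaults to "now" in UTC.
    taken_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # Requirement 4: per-object details needed to rebuild the snapshot.
    manifest: list[dict] = field(default_factory=list)
```

Freezing the record reflects the intent that a snapshot, once taken, is not edited.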
Identifying Collections for Backup
Possible Options
- Metadata - tag collections for backup
  - Include frequency, priority, and storage tier
- White List - provide a manifest of the collections to back up
- Black List - it may be easier to list the collections that should not be backed up
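The three selection options can be compared side by side. This sketch assumes each collection's metadata is available as a simple attribute-to-value dictionary and that a tagged collection carries `backup=true`; the attribute name and `select_for_backup` are hypothetical, not iRODS conventions.

```python
from __future__ import annotations


def select_for_backup(
    collections: dict[str, dict[str, str]],
    mode: str,
    whitelist: set[str] | None = None,
    blacklist: set[str] | None = None,
    tag: str = "backup",
) -> list[str]:
    """Return the collection paths to back up under one of the three options."""
    if mode == "metadata":
        # Only collections explicitly tagged with backup=true (hypothetical attribute).
        return sorted(p for p, avus in collections.items() if avus.get(tag) == "true")
    if mode == "whitelist":
        # Only collections named in the manifest of collections to back up.
        return sorted(p for p in collections if p in (whitelist or set()))
    if mode == "blacklist":
        # Everything except the collections listed as excluded.
        return sorted(p for p in collections if p not in (blacklist or set()))
    raise ValueError(f"unknown mode: {mode}")
```

Note the different defaults: a new, untagged collection is skipped under the metadata and whitelist options but included under the blacklist option.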
Collection Snapshots
- Use a delayed execution rule to identify collections for backup, then push each new backup onto the queue
- Designate backup storage resources
- Collections are copied, not replicated - each copy gets new data_obj_ids, new logical paths, and new timestamps
- Leverage msiCollRsync
- Possibly use bundle operations to create a bzip2-compressed tar archive of each collection
- Snapshot frequency is a data management policy - based on data value, storage age, and user concerns
- Consider collection size - set a high-water mark for oversized collections
- Lock down permissions on the backups to protect them from user error
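Before any data moves, a scheduler has to decide what to copy and where. The sketch below models only that planning step in Python. It derives a new, timestamped logical path for each copy, skips collections above a high-water mark, and appends jobs to a queue. The path layout, the 500 GiB limit, and the function names are assumptions. In iRODS the queued copy itself would be done by a delayed rule calling something like `msiCollRsync`.

```python
from __future__ import annotations

from collections import deque
from datetime import datetime, timezone

# Hypothetical policy value: collections larger than 500 GiB are not queued.
HIGH_WATER_MARK_BYTES = 500 * 1024**3


def snapshot_path(source: str, backup_root: str, when: datetime) -> str:
    """Build a new logical path for the copy: <backup_root><source>/<UTC timestamp>."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{backup_root.rstrip('/')}{source}/{stamp}"


def enqueue_snapshots(
    sizes: dict[str, int],
    backup_root: str,
    when: datetime,
    queue: deque[tuple[str, str]],
    limit: int = HIGH_WATER_MARK_BYTES,
) -> list[str]:
    """Queue one (source, destination) copy job per collection.

    Returns the collections skipped because they exceed the high-water mark.
    """
    oversized: list[str] = []
    for source, size in sorted(sizes.items()):
        if size > limit:
            oversized.append(source)  # left for an administrator to review
            continue
        queue.append((source, snapshot_path(source, backup_root, when)))
    return oversized
```

Because every copy gets its own timestamped path, successive snapshots of the same collection never overwrite each other.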
Metadata for Recovery
- Consider a manifest file containing:
  - Fully qualified logical paths
  - Resources that held the data objects
  - Physical paths on those resources
  - Associated metadata for collections and data objects
  - Groups and ACLs for the data objects
- Use additional metadata for discovery
  - Time, location, user, project, etc.
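A manifest covering these items could be serialized as JSON. The layout below (key names, AVU triples as `[attribute, value, unit]`, the example paths and resource name) is a hypothetical format chosen for illustration; iRODS does not prescribe one.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class ManifestEntry:
    """Everything needed to put one data object back (hypothetical layout)."""

    logical_path: str   # fully qualified logical path
    resource: str       # resource that held the data object
    physical_path: str  # physical path on that resource
    acls: dict[str, str] = field(default_factory=dict)  # user or group -> access level
    metadata: list[tuple[str, str, str]] = field(default_factory=list)  # AVU triples


def build_manifest(
    snapshot_time: str,
    collections: dict[str, dict],
    objects: list[ManifestEntry],
    discovery: dict[str, str],
) -> str:
    """Serialize the snapshot's recovery and discovery metadata as JSON text."""
    doc = {
        "snapshot_time": snapshot_time,
        "discovery": discovery,      # e.g. user, project, location
        "collections": collections,  # path -> {"acls": ..., "metadata": ...}
        "data_objects": [asdict(o) for o in objects],
    }
    return json.dumps(doc, indent=2, sort_keys=True)
```

JSON has no tuple type, so AVU triples come back as three-element lists when the manifest is read.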
Disaster Recovery
Potentially create a tool (rule file) that will:
- Mount a structured file (e.g. a tgz archive)
- Read the manifest file
- Become the target user
- Create collections as necessary
- Apply ACLs to collections as necessary
- Apply metadata to the collection as necessary
- msiRsync the data object to its original logical path
- Apply ACLs to data objects as necessary
- Apply metadata to the data objects as necessary
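The recovery tool is described as a rule file. The Python below only mirrors the order of those steps so the flow is easy to follow. It reads a tgz archive with the standard `tarfile` module and applies changes to `FakeCatalog`, a hypothetical in-memory stand-in for the iRODS catalog. The manifest keys (`owner`, `archive_member`, and so on) are assumptions. A real tool would call iRODS microservices instead of updating dictionaries.

```python
from __future__ import annotations

import io
import json
import tarfile
from dataclasses import dataclass, field


@dataclass
class FakeCatalog:
    """Hypothetical in-memory stand-in for the iRODS catalog (illustration only)."""

    collections: set[str] = field(default_factory=set)
    objects: dict[str, bytes] = field(default_factory=dict)
    acls: dict[str, dict[str, str]] = field(default_factory=dict)
    avus: dict[str, list[list[str]]] = field(default_factory=dict)
    acting_user: str | None = None


def restore(archive: bytes, catalog: FakeCatalog) -> None:
    """Rebuild a snapshot from a tgz archive that contains manifest.json."""
    # Open the structured file (a gzip-compressed tar).
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        # Read the manifest file.
        manifest = json.load(tar.extractfile("manifest.json"))
        # Become the target user (recorded here; a rule would act as that user).
        catalog.acting_user = manifest["owner"]
        # Create collections, then apply their ACLs and metadata.
        # Sorting puts each parent path before its children.
        for path, info in sorted(manifest["collections"].items()):
            catalog.collections.add(path)
            catalog.acls[path] = dict(info["acls"])
            catalog.avus[path] = list(info["metadata"])
        # Copy each data object back to its original logical path,
        # then apply its ACLs and metadata.
        for obj in manifest["data_objects"]:
            path = obj["logical_path"]
            catalog.objects[path] = tar.extractfile(obj["archive_member"]).read()
            catalog.acls[path] = dict(obj["acls"])
            catalog.avus[path] = list(obj["metadata"])
```

Restoring collections before data objects matters because each object needs its parent collection to exist first.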
Virginia Tech - Data Preservation
By iRODS Consortium