Data Management
For Grown Ups
Terrell Russell, Ph.D.
@terrellrussell
Senior Data Scientist, iRODS Consortium
Renaissance Computing Institute (RENCI), UNC-Chapel Hill
iRODS Consortium
The iRODS Consortium was created to ensure the sustainability of iRODS and to further its adoption and continued evolution. To this end, the Consortium works to standardize the definition, development, and release of iRODS-based data middleware technologies, evangelize iRODS among potential users, promote new advances in iRODS, and expand the adoption of iRODS-based data middleware technologies through the development, release, and support of an open-source, mission-critical, production-level distribution of iRODS.
Current Members:
Hard Problems, Today
Data Management
Multiple pieces
Multiple meanings
Multiple goals
People with Keys + Notes/Reports
Passwords + Folders + Scripts (Maybe)
Credentials + Metadata + Automation
Policy Enforcement - Through the Years
Data Management
Fraught with People
Four Verticals → Four Case Studies
Health Care & Life Science
Genomics Use Case - Data begins as a series of images from a sequencer, which are converted to bases (ATCG), then fragmented, aligned, annotated for variants, filtered, and analyzed.
Health Care & Life Science
Priorities:
Oil & Gas
Ingest Use Case - As existing storage fills up, two complementary strategies apply: 1) migrate data from active storage to a slower, cheaper archive and 2) add more active storage. Traditional HSM has limited flexibility (rules based on access date, physical location, etc.), and additional namespaces just add more complexity.
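The first strategy, migrating idle data from active storage to archive, can be sketched as a small policy rule. This is a minimal illustration only, not the iRODS API: the `DataObject` record, the `select_for_archive` and `migrate` helpers, the tier names, and the 180-day idle threshold are all hypothetical and chosen just to show a rule keyed on access date.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DataObject:
    """Hypothetical catalog record for one stored file."""
    path: str
    size_bytes: int
    last_access: datetime
    tier: str = "active"


def select_for_archive(objects, now, max_idle=timedelta(days=180)):
    """Return active-tier objects that have been idle longer than max_idle."""
    return [
        obj for obj in objects
        if obj.tier == "active" and now - obj.last_access > max_idle
    ]


def migrate(objects, now, max_idle=timedelta(days=180)):
    """Mark every selected object as archived and return the moved objects.

    A real system would copy the data to the archive resource and verify it
    before updating the catalog; here only the tier label changes.
    """
    moved = select_for_archive(objects, now, max_idle)
    for obj in moved:
        obj.tier = "archive"
    return moved
```

Because the rule is an ordinary function over catalog records, the same shape could be keyed on metadata other than access date (project, owner, data type), which is the flexibility the slide says traditional HSM lacks.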
Oil & Gas
Priorities:
Media & Entertainment
Born Digital Use Case - New valuable creative content (movie assets, original musical tracks) requires large, robust, long-term, flexible, accessible infrastructure.
Media & Entertainment
Priorities:
Archives & Records Management
Provenance Use Case - Libraries, museums, and other cultural institutions take a 100+ year view of their digital assets. They must maintain both archival and dissemination copies, along with extensive metadata.
Archives & Records Management
Priorities:
Four Verticals → Four Case Studies
The Four Pillars
Open Source Data Management Middleware
Questions?
SC15 Booth #181
irods.org
github.com/irods
@irods
Creative Commons Images Used:
https://www.flickr.com/photos/addieplum/116062198/
https://www.flickr.com/photos/ajmexico/3281139507/
https://www.flickr.com/photos/future15/2037742362/