Data Management Design Patterns
Terrell Russell, Ph.D.
@terrellrussell
Chief Technologist, iRODS Consortium

November 14-16, 2017
Supercomputing 2017
Denver, CO



Why Data Management Matters

As data matures and reaches a broader community, data management policy must also evolve to meet the additional requirements of that community.


iRODS is
- Open Source
- Distributed
- Data Centric
- Metadata Driven
A flexible framework for the abstraction of infrastructure




iRODS as the Integration Layer


iRODS Build and Test - Today

Spring 2015 onwards
- Jenkins → Python → Ansible → zone_bundles → vSphere dynamic VMs
Changes Since 2017
- CentOS 6 and Ubuntu 12 are no longer supported
- iRODS build logic moved out of Ansible
- workflow to test all plugins
- run-script-on-vms
- run-script-on-irods-zone
History

20+ year legacy
- 10 years of federal funding for grid storage research
- 10 years of federal funding for policy engine research
- iRODS Consortium founded in 2013
Our Membership















Community Driven

Input from the Open Source Community
- Support Requests
- Community Feedback
- Working Groups
- Use Cases
- Proofs of Concept
All with the Expectation of Public Discourse and Disclosure
Discovered a common enabling practice...

(aka metadata)
Annotation with meaning

Annotation is both descriptive and prescriptive.
It is useful
- for discovery of the past and the present
- to direct the future
Metadata Everywhere



With the appropriate abstractions, everything in a system can be described with metadata, and therefore all actions within a system can be driven by that metadata.
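
As a rough illustration of actions being driven by metadata, the sketch below pairs metadata predicates with action names and selects the actions that apply to an object. It is plain Python, not the iRODS rule engine or any iRODS API; the `DataObject` class, the attribute names (`state`, `project`, `retention`), and the action names are all hypothetical, chosen only for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Metadata = Dict[str, str]


@dataclass
class DataObject:
    """A stand-in for a stored object and its attribute/value metadata."""
    path: str
    metadata: Metadata = field(default_factory=dict)


# Each rule pairs a metadata predicate with the name of an action to take.
Rule = Tuple[Callable[[Metadata], bool], str]

# Hypothetical attribute names and actions, chosen only for illustration.
RULES: List[Rule] = [
    (lambda m: m.get("state") == "new", "ingest"),
    (lambda m: m.get("project") == "genomics", "replicate"),
    (lambda m: m.get("retention") == "long_term", "archive"),
]


def actions_for(obj: DataObject, rules: List[Rule] = RULES) -> List[str]:
    """Return the actions whose metadata predicate matches this object."""
    return [action for predicate, action in rules if predicate(obj.metadata)]


if __name__ == "__main__":
    obj = DataObject("/zone/home/alice/run42.fastq",
                     {"state": "new", "project": "genomics"})
    print(actions_for(obj))  # the "ingest" and "replicate" rules match
```

Because the rules read only metadata, changing an object's annotations changes what happens to it without touching the code that carries out each action.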

Metadata Driven Patterns:
- Good Metadata (Templates)
- Landing Zone / Ingest
- Replication
- Tiering
- Archiving
- Auditing
- Data to Compute
- Compute to Data
Metadata Templates
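
One way to picture a metadata template is as a list of required attributes, some with a restricted set of allowed values, that incoming metadata is checked against. The sketch below makes that assumption; the template contents, the attribute names, and the `validate` helper are hypothetical and do not represent an iRODS feature or API.

```python
from typing import Dict, List, Set

# Hypothetical template: required attribute -> allowed values.
# An empty set means any non-empty value is accepted.
TEMPLATE: Dict[str, Set[str]] = {
    "project": set(),
    "data_type": {"raw", "processed", "published"},
    "owner": set(),
}


def validate(metadata: Dict[str, str],
             template: Dict[str, Set[str]] = TEMPLATE) -> List[str]:
    """Return a list of problems; an empty list means the metadata conforms."""
    problems = []
    for attribute, allowed in template.items():
        value = metadata.get(attribute, "")
        if not value:
            problems.append(f"missing required attribute: {attribute}")
        elif allowed and value not in allowed:
            problems.append(f"invalid value for {attribute}: {value}")
    return problems


if __name__ == "__main__":
    # data_type is outside the allowed set and owner is absent.
    for problem in validate({"project": "climate", "data_type": "tmp"}):
        print(problem)
```

Checking metadata against a template at ingest time is what lets the later patterns (replication, tiering, archiving) rely on the attributes being present and well formed.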


iRODS Capabilities


From Prototype to Production


Provenance and Reporting



Data to Compute Pattern


Compute to Data Pattern



An open community-driven process
- is hard
- is slow
But, it also
- sets clear expectations
- generates a shared language
- produces a strong culture
- produces a better 'product'
- is worth it
Discovering Design Patterns

Thank you!
iRODS Consortium @iRODS
RENCI @RENCI
Booth #437
Terrell Russell, Ph.D.
@terrellrussell
Chief Technologist, iRODS Consortium