Data Management Design Patterns
Terrell Russell, Ph.D.
@terrellrussell
Chief Technologist, iRODS Consortium
November 14-16, 2017
Supercomputing 2017
Denver, CO
Why Data Management Matters
As data matures and reaches a broader community, data management policy must also evolve to meet the additional requirements that community brings.
iRODS is
- Open Source
- Distributed
- Data Centric
- Metadata Driven
A flexible framework for the abstraction of infrastructure
iRODS as the Integration Layer
iRODS Build and Test - Today
Spring 2015 - onwards
- Jenkins → Python → Ansible → zone_bundles → vSphere dynamic VMs
Changes Since 2017
- CentOS 6 and Ubuntu 12 no longer supported
- iRODS build logic moved out of Ansible
- workflow to test all plugins
- run-script-on-vms
- run-script-on-irods-zone
History
20+ year legacy
- 10 years of federal funding for grid storage research
- 10 years of federal funding for policy engine research
- iRODS Consortium founded in 2013
Our Membership
Community Driven
Input from the Open Source Community
- Support Requests
- Community Feedback
- Working Groups
- Use Cases
- Proofs of Concept
All with the Expectation of Public Discourse and Disclosure
Discovered a common enabling practice...
(aka metadata)
Annotation with meaning
Annotation is both descriptive and prescriptive.
It is useful
- for discovery of the past and the present
- to direct the future
Metadata Everywhere
With the appropriate abstractions, everything in a system can be described with metadata, and therefore all actions within a system can be driven by that metadata.
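As a minimal sketch of this idea (not iRODS's actual rule engine or API), the Python below models each item as a path plus attribute-value metadata and selects actions purely from that metadata. The names `DataObject`, `POLICIES`, and `apply_policies`, along with the `state` and `retention` attributes, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class DataObject:
    """A hypothetical stand-in for any item in the system."""
    path: str
    metadata: Dict[str, str] = field(default_factory=dict)


# Each policy pairs a metadata predicate with the name of an action.
# Both the attributes and the action names are assumptions for this sketch.
Policy = Tuple[Callable[[Dict[str, str]], bool], str]

POLICIES: List[Policy] = [
    (lambda m: m.get("state") == "ingested", "replicate"),
    (lambda m: m.get("retention") == "permanent", "archive"),
]


def apply_policies(obj: DataObject) -> List[str]:
    """Return the actions whose predicates match the object's metadata."""
    return [action for matches, action in POLICIES if matches(obj.metadata)]


obj = DataObject(
    "/zone/home/alice/run42.dat",
    {"state": "ingested", "retention": "permanent"},
)
print(apply_policies(obj))
```

The point of the design is that adding a new behavior means adding a new predicate over metadata, not changing the objects themselves.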
Metadata Driven Patterns:
- Good Metadata (Templates)
- Landing Zone / Ingest
- Replication
- Tiering
- Archiving
- Auditing
- Data to Compute
- Compute to Data
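To make one of these patterns concrete, here is a rough sketch of tiering, assuming it means moving data that has not been accessed within a time window to a slower storage tier. The catalog layout, the `tier` and `access_time` attributes, and the function `objects_to_migrate` are hypothetical and do not reflect the iRODS implementation.

```python
# Hypothetical tier settings: seconds an object may sit idle in a tier
# before it should move down. None marks the final tier.
TIER_MAX_IDLE = {"fast": 3600, "slow": None}
NEXT_TIER = {"fast": "slow"}


def objects_to_migrate(catalog, now):
    """Return (path, target_tier) pairs for objects idle too long in their tier."""
    moves = []
    for path, meta in sorted(catalog.items()):
        limit = TIER_MAX_IDLE[meta["tier"]]
        if limit is not None and now - meta["access_time"] > limit:
            moves.append((path, NEXT_TIER[meta["tier"]]))
    return moves


# Illustrative catalog: path -> metadata (access_time in seconds).
catalog = {
    "/zone/a.dat": {"tier": "fast", "access_time": 1000},
    "/zone/b.dat": {"tier": "fast", "access_time": 9000},
    "/zone/c.dat": {"tier": "slow", "access_time": 10},
}
print(objects_to_migrate(catalog, now=10000))
```

Only the metadata is consulted to decide what moves; the data itself is never inspected, which is what makes the pattern metadata driven.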
Metadata Templates
iRODS Capabilities
From Prototype to Production
Provenance and Reporting
Data to Compute Pattern
Compute to Data Pattern
An open community-driven process
- is hard
- is slow
But, it also
- sets clear expectations
- generates a shared language
- produces a strong culture
- produces a better 'product'
- is worth it
Discovering Design Patterns
Thank you!
iRODS Consortium @iRODS
RENCI @RENCI
Booth #437
Terrell Russell, Ph.D.
@terrellrussell
Chief Technologist, iRODS Consortium