Data Management Design Patterns

Terrell Russell, Ph.D.

@terrellrussell

Chief Technologist, iRODS Consortium


November 14-16, 2017

Supercomputing 2017

Denver, CO

Why Data Management Matters

As data matures and reaches a broader community, data management policy must evolve to meet the additional requirements that come with that wider use.

iRODS is

  • Open Source
  • Distributed
  • Data Centric
  • Metadata Driven

 

A flexible framework for the abstraction of infrastructure

iRODS as the Integration Layer

iRODS Build and Test - Today

Spring 2015 onwards

  • Jenkins → Python → Ansible zone_bundles → vSphere dynamic VMs

 

Changes Since 2017

  • CentOS 6 and Ubuntu 12 are no longer supported
  • iRODS build logic moved out of Ansible

  • workflow to test all plugins

  • run-script-on-vms

  • run-script-on-irods-zone

History

20+ year legacy

  • 10 years of federal funding for grid storage research
  • 10 years of federal funding for policy engine research
  • iRODS Consortium founded in 2013

Our Membership

Community Driven

Input from the Open Source Community

  • Support Requests
  • Community Feedback
  • Working Groups
  • Use Cases
  • Proofs of Concept

 

All with the Expectation of Public Discourse and Disclosure

Discovered a common enabling practice...

(aka metadata)

Annotation with meaning

Annotation is both descriptive and prescriptive.

 

It is useful

  • for discovery of the past and the present
  • to direct the future

 

Metadata Everywhere

With the appropriate abstractions, everything in a system can be described with metadata, and therefore all actions within a system can be driven by that metadata.
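A minimal sketch of that idea, in plain Python rather than the iRODS API: the same metadata is read once to discover objects (descriptive) and once to decide what should happen to them (prescriptive). The `DataObject` class, the `find` and `plan_actions` helpers, the attribute names, and the paths are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    path: str
    # Attribute -> value pairs; an illustrative stand-in for catalog metadata.
    metadata: dict = field(default_factory=dict)


def find(objects, attribute, value):
    """Descriptive use: discover objects by what their metadata says."""
    return [o for o in objects if o.metadata.get(attribute) == value]


def plan_actions(obj, rules):
    """Prescriptive use: return every action whose condition matches the object."""
    return [
        action
        for (attribute, value), action in rules.items()
        if obj.metadata.get(attribute) == value
    ]


# Hypothetical objects and rules, for illustration only.
objects = [
    DataObject("/zone/home/alice/run1.dat", {"project": "genomics", "state": "raw"}),
    DataObject("/zone/home/alice/run2.dat", {"project": "genomics", "state": "published"}),
]

rules = {
    ("state", "raw"): "replicate",
    ("state", "published"): "archive",
}

for obj in find(objects, "project", "genomics"):
    print(obj.path, plan_actions(obj, rules))
```

Changing an object's `state` value changes what the system would do with it, without touching the rules themselves.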

 

Metadata Driven Patterns:

  • Good Metadata (Templates)
  • Landing Zone / Ingest
  • Replication
  • Tiering
  • Archiving
  • Auditing
  • Data to Compute
  • Compute to Data
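As one example from the list above, the Tiering pattern might be sketched as follows. This is an assumption-laden illustration in plain Python, not the iRODS storage tiering implementation: the catalog is a dictionary, and the `tier` and `last_access` attribute names, the function names, and the idle threshold are hypothetical.

```python
import time


def objects_to_tier(catalog, max_idle_seconds, now=None):
    """Select objects on the fast tier that have been idle longer than the threshold."""
    now = time.time() if now is None else now
    return sorted(
        path
        for path, meta in catalog.items()
        if meta.get("tier") == "fast" and now - meta["last_access"] > max_idle_seconds
    )


def tier_down(catalog, paths):
    """Record the move by updating metadata; a real system would also move the data."""
    for path in paths:
        catalog[path]["tier"] = "slow"


# Hypothetical catalog: path -> metadata, with last_access in seconds.
catalog = {
    "/zone/a.dat": {"tier": "fast", "last_access": 100},
    "/zone/b.dat": {"tier": "fast", "last_access": 990},
    "/zone/c.dat": {"tier": "slow", "last_access": 0},
}

stale = objects_to_tier(catalog, max_idle_seconds=500, now=1000)
tier_down(catalog, stale)
```

The selection depends only on metadata, so the same shape could serve the Replication or Archiving patterns by swapping the condition and the action.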

 

Metadata Templates

iRODS Capabilities

From Prototype to Production

Provenance and Reporting

Data to Compute Pattern

Compute to Data Pattern

An open community-driven process

  • is hard
  • is slow

 

But, it also

  • sets clear expectations
  • generates a shared language
  • produces a strong culture
  • produces a better 'product'
  • is worth it

Discovering Design Patterns

Thank you!

 

 

iRODS Consortium  @iRODS

RENCI  @RENCI

Booth #437

 

 
