January 27-29, 2020

CS3 2020

Copenhagen, Denmark

Terrell Russell, Ph.D.

@terrellrussell

Chief Technologist, iRODS Consortium

Beyond Discoverability:

Metadata to drive your data management

Our Membership

Consortium

Member

Discoverability

As commercial, governmental, and research organizations continue to move from manual pipelines to automated processing of their vast and growing datasets, they are struggling to find meaning in their repositories.

 

Many products and approaches now provide data discoverability through indexing and aggregate counts, but few also provide the level of confidence needed for making strong assertions about data provenance.

 

For that, a system needs enforced policy: a model for data governance that provides an understanding of what is in the system and how it came to be.

Metadata

Creation and curation can be either:

  • manual - by humans
    • richness in meaning
    • slow
    • inconsistent
    • error prone
  • automatic - by machines
    • derived - from the system
    • extracted - from within
    • harvested - from elsewhere
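
The three automatic sources above can be sketched in a few lines of Python. This is only an illustration, not how iRODS gathers metadata: the function names, the CSV-header example, and the `catalog` lookup are all hypothetical, and the (attribute, value, unit) triple shape is an assumption chosen for the sketch.

```python
import os
from datetime import datetime, timezone


def derived_metadata(path):
    """Derived: facts the system already knows about the file."""
    st = os.stat(path)
    modified = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat()
    return [("size", str(st.st_size), "bytes"), ("modified", modified, "")]


def extracted_metadata(path):
    """Extracted: read from within the file (here, a CSV header row)."""
    with open(path, encoding="utf-8") as f:
        header = f.readline().strip()
    columns = header.split(",") if header else []
    return [("column", name, "") for name in columns]


def harvested_metadata(path, catalog):
    """Harvested: looked up elsewhere (a hypothetical catalog keyed by file name)."""
    record = catalog.get(os.path.basename(path), {})
    return [(key, value, "") for key, value in sorted(record.items())]


def collect(path, catalog):
    """Combine all three automatic sources into one list of triples."""
    return (
        derived_metadata(path)
        + extracted_metadata(path)
        + harvested_metadata(path, catalog)
    )
```

Calling `collect(path, catalog)` at ingest time would attach all three kinds of metadata without a human typing anything, which is the point of the automatic branch.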

Metadata

Can be one of three types:

  • descriptive - about the content, author, etc.
  • structural - about the format, layout, implementation details
  • administrative - about the management, processing of the data

 

We're primarily interested today in administrative metadata.

Metadata

  • Structural helps to capture Descriptive

 

  • Administrative drives the policy

 

  • Leads to understanding and confidence

 

  • Leads to meaning and science

Metadata Everywhere

Metadata Driven

With an open, policy-based platform, metadata can be elevated beyond assisting in just search and discoverability. Metadata can associate datasets, help build cohorts for analysis, coordinate data movement and scheduling, and drive the very policy that provides the data governance.

 

Data management should be data-centric and metadata driven.
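
A minimal sketch of what "metadata driven" can look like, assuming a hypothetical in-memory catalog that maps logical paths to attribute dictionaries. The attribute names (`study`, `status`, `sensitivity`) and the tier names are invented for illustration and are not part of iRODS.

```python
def build_cohort(catalog, **criteria):
    """Return the sorted paths whose metadata matches every criterion."""
    return sorted(
        path
        for path, meta in catalog.items()
        if all(meta.get(key) == value for key, value in criteria.items())
    )


def target_tier(meta):
    """Decide where data should live from its metadata alone."""
    if meta.get("status") == "archived":
        return "tape"
    if meta.get("sensitivity") == "restricted":
        return "secure"
    return "disk"


# Hypothetical catalog: logical path -> metadata attributes.
CATALOG = {
    "/zone/a.dat": {"study": "alpha", "status": "active"},
    "/zone/b.dat": {"study": "alpha", "status": "archived"},
    "/zone/c.dat": {"study": "beta", "status": "active", "sensitivity": "restricted"},
}
```

Here `build_cohort(CATALOG, study="alpha")` selects the two alpha objects for analysis, and `target_tier` routes each object by looking only at its metadata, never at its path or content.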

Automation

ONLY with the automation of policy can your system provide the kinds of guarantees you actually care about:

  • integrity
  • provenance
  • quality metadata enforcement
  • reproducibility

 

Leaving the humans in charge of policy enforcement is a mistake.

 

  • People should craft the policy together.
  • Machines should enforce the defined policy.
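
The split between people and machines can be sketched as follows: the policy is plain data that a group can review and agree on, and a small function enforces it on every object. The attribute names, the `POLICY` layout, and the `PolicyViolation` exception are assumptions made for this example, not an iRODS interface.

```python
class PolicyViolation(Exception):
    """Raised when an object's metadata does not satisfy the policy."""


# Crafted by people: a declarative description of what is required.
POLICY = {
    "required": ["project", "owner", "retention_days"],
    "allowed_values": {"sensitivity": {"public", "internal", "restricted"}},
}


def violations(metadata, policy=POLICY):
    """Return a list of human-readable problems (empty when compliant)."""
    problems = []
    for attr in policy["required"]:
        if not metadata.get(attr):
            problems.append(f"missing required attribute: {attr}")
    for attr, allowed in policy["allowed_values"].items():
        if attr in metadata and metadata[attr] not in allowed:
            problems.append(f"invalid value for {attr}: {metadata[attr]}")
    return problems


def enforce(metadata, policy=POLICY):
    """Enforced by the machine: reject anything that breaks the policy."""
    problems = violations(metadata, policy)
    if problems:
        raise PolicyViolation("; ".join(problems))
```

Because the check runs on every object, every time, the guarantee does not depend on anyone remembering to apply it.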

iRODS provides the building blocks and the policy automation...

iRODS Core Competencies

( Unified Namespace )

( Metadata ) 

( Rule Engine ) 

( Federation )

iRODS Policies

  • Replication
  • Versioning
  • Fixity Checking
  • Retention
  • Verification
  • Logical Quotas
  • Hard Links
  • Metadata Access Control
  • Metadata Template Enforcement
  • ...

 

  • Packaged and supported solutions
  • Require configuration not code
  • Derived from the majority of use cases observed in the user community
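
To make one of the listed policies concrete, here is a generic sketch of fixity checking: record a checksum when data arrives, then recompute and compare it later. It uses only the Python standard library and is not the packaged iRODS policy, which is configured rather than coded; the function names are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_checksum):
    """Return True when the file still matches the checksum recorded earlier."""
    return sha256_of(path) == recorded_checksum
```

In this sketch the recorded checksum would itself be stored as administrative metadata on the object, so a scheduled job can call `verify_fixity` and flag any object whose content has silently changed.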

iRODS Capabilities

iRODS Patterns

  • Data to Compute
  • Compute to Data
  • Synchronization
  • Data Transfer Nodes

 

Data management should be data-centric and metadata driven.

 

 

 

Future-proof automated data management requires open formats and open source.

Thank you. Join us in June! https://irods.org/ugm

CS3 2020 - iRODS: Beyond Discoverability

By iRODS Consortium
