January 27-29, 2020
CS3 2020
Copenhagen, Denmark
Terrell Russell, Ph.D.
@terrellrussell
Chief Technologist, iRODS Consortium
Beyond Discoverability:
Metadata to drive your
data management
Our Membership
Consortium
Member
Discoverability
As commercial, governmental, and research organizations continue to move from manual pipelines to automated processing of their vast and growing datasets, they are struggling to find meaning in their repositories.
Many products and approaches now provide data discoverability through indexing and aggregate counts, but few also provide the level of confidence needed for making strong assertions about data provenance.
For that, a system needs policy to be enforced: a model for data governance that provides understanding about what is in the system and how it came to be.
Metadata
Creation and curation can be either:
- manual - by humans
  - richness in meaning
  - slow
  - inconsistent
  - error prone
- automatic - by machines
  - derived - from the system
  - extracted - from within
  - harvested - from elsewhere
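The three automatic sources can be sketched in a few lines. This is a minimal illustration in plain Python, not iRODS code: the function names, the CSV file, and the `registry` dictionary standing in for an external system are all hypothetical.

```python
import os
import tempfile


def derive_metadata(path):
    """Derived: facts the system already knows about the file."""
    return {
        "size_bytes": os.stat(path).st_size,
        "extension": os.path.splitext(path)[1],
    }


def extract_metadata(path):
    """Extracted: facts read from within the file (here, a CSV header row)."""
    with open(path, newline="") as f:
        header = f.readline().strip()
    return {"columns": header.split(",")}


def harvest_metadata(path, registry):
    """Harvested: facts looked up elsewhere (here, a dict standing in for an external system)."""
    return dict(registry.get(os.path.basename(path), {}))


def collect(path, registry):
    """Merge all three automatic sources into one metadata record."""
    metadata = {}
    metadata.update(derive_metadata(path))
    metadata.update(extract_metadata(path))
    metadata.update(harvest_metadata(path, registry))
    return metadata


CONTENT = "sample_id,temperature\nS1,21.5\n"


def demo():
    """Write a small CSV to a temporary directory and collect its metadata."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "samples.csv")
        with open(path, "w", newline="") as f:
            f.write(CONTENT)
        # Hypothetical external registry keyed by file name.
        registry = {"samples.csv": {"instrument": "hypothetical-sensor-7"}}
        return collect(path, registry)
```

No human typed any of the resulting values, which is why automatic metadata avoids the slowness and inconsistency of manual curation.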
Metadata
Can be one of three types:
- descriptive - about the content, author, etc.
- structural - about the format, layout, implementation details
- administrative - about the management, processing of the data
We're primarily interested today in administrative metadata.
Metadata
- Structural helps to capture Descriptive
- Administrative drives the policy
- Leads to understanding and confidence
- Leads to meaning and science
Metadata Everywhere
Metadata Driven
With an open, policy-based platform, metadata can be elevated beyond assisting in just search and discoverability. Metadata can associate datasets, help build cohorts for analysis, coordinate data movement and scheduling, and drive the very policy that provides the data governance.
Data management should be data-centric and metadata driven.
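As a rough sketch of what "metadata driven" can mean in practice, the plain-Python example below builds a cohort from metadata and turns administrative metadata into planned actions. It is an assumption-laden illustration, not the iRODS API: `DataObject`, the attribute names `tier` and `replicas_required`, and the action tuples are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """A data object plus its metadata (hypothetical stand-in for a catalog entry)."""
    path: str
    metadata: dict = field(default_factory=dict)


def build_cohort(objects, **required):
    """Return the objects whose metadata matches every required key/value pair."""
    return [
        o for o in objects
        if all(o.metadata.get(k) == v for k, v in required.items())
    ]


def plan_actions(obj):
    """Translate administrative metadata into data-management actions."""
    actions = []
    if obj.metadata.get("tier") == "archive":
        actions.append(("move", obj.path, "archive_storage"))
    replicas = obj.metadata.get("replicas_required", 1)
    if replicas > 1:
        actions.append(("replicate", obj.path, replicas))
    return actions


catalog = [
    DataObject("/lab/run1.dat", {"project": "alpha", "tier": "archive"}),
    DataObject("/lab/run2.dat", {"project": "alpha", "replicas_required": 3}),
    DataObject("/lab/run3.dat", {"project": "beta"}),
]

alpha = build_cohort(catalog, project="alpha")
planned = [action for obj in alpha for action in plan_actions(obj)]
```

The data objects themselves never change here; only their metadata decides what happens to them.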
Automation
ONLY with the automation of policy can your system provide the types of guarantees that you are actually interested in:
- integrity
- provenance
- quality metadata enforcement
- reproducibility
Leaving the humans in charge of policy enforcement is a mistake.
- People should craft the policy together.
- Machines should enforce the defined policy.
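To make the split concrete, here is a minimal sketch of a machine-enforced integrity policy. People decide that every object must carry a checksum, and code checks it on every run. The in-memory `catalog` and `storage` dictionaries and the function names are hypothetical, and this is not how any particular product implements it.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Compute the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def register(catalog, name, data):
    """On ingest, record the checksum as administrative metadata."""
    catalog[name] = {"sha256": sha256_of(data)}


def verify(catalog, name, data):
    """Recompute the checksum and compare it with the recorded one."""
    return catalog[name]["sha256"] == sha256_of(data)


def enforce(catalog, storage):
    """Check every stored object and return the names that fail, sorted."""
    return sorted(
        name for name, data in storage.items()
        if not verify(catalog, name, data)
    )


# Hypothetical in-memory storage and catalog.
storage = {"a.dat": b"first object", "b.dat": b"second object"}
catalog = {}
for name, data in storage.items():
    register(catalog, name, data)

clean_run = enforce(catalog, storage)

# Simulate silent corruption of one object, then re-run the same policy.
storage["b.dat"] = b"second objecT"
corrupt_run = enforce(catalog, storage)
```

The policy runs identically every time, which is the guarantee a human reviewer cannot give.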
iRODS provides the building blocks and the policy automation...
iRODS Core Competencies
( Unified Namespace )
( Metadata )
( Rule Engine )
( Federation )
iRODS Policies
- Replication
- Versioning
- Fixity Checking
- Retention
- Verification
- Logical Quotas
- Hard Links
- Metadata Access Control
- Metadata Template Enforcement
- ...
- Packaged and supported solutions
- Require configuration not code
- Derived from the majority of use cases observed in the user community
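"Configuration not code" can be illustrated with a metadata template check in which the policy lives entirely in a data structure. This is a generic Python sketch under stated assumptions: the `TEMPLATE` layout, the attribute names, and `validate` are hypothetical and do not reflect the configuration format of any packaged iRODS policy.

```python
# The policy is data: changing it means editing this structure, not the code below.
TEMPLATE = {
    "required": ["project", "owner", "retention_days"],
    "types": {"retention_days": int},
}


def validate(metadata, template):
    """Return a list of violations; an empty list means the metadata complies."""
    problems = []
    for key in template["required"]:
        if key not in metadata:
            problems.append(f"missing required attribute: {key}")
    for key, expected in template["types"].items():
        if key in metadata and not isinstance(metadata[key], expected):
            problems.append(f"{key} must be {expected.__name__}")
    return problems


good = validate(
    {"project": "alpha", "owner": "alice", "retention_days": 365}, TEMPLATE
)
bad = validate({"project": "alpha", "retention_days": "365"}, TEMPLATE)
```

A site that needs a different template edits `TEMPLATE` and leaves `validate` untouched.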
iRODS Capabilities
iRODS Patterns
- Data to Compute
- Compute to Data
- Synchronization
- Data Transfer Nodes
Data management should be data-centric and metadata driven.
Future-proof automated data management requires open formats and open source.
Thank you. Join us in June! https://irods.org/ugm
CS3 2020 - iRODS: Beyond Discoverability
By iRODS Consortium