January 30, 2019
Cloud Synchronization and Sharing Services
Rome, Italy
Jason Coposky
@jason_coposky
Executive Director, iRODS Consortium
Managing Data
from the Edge to HPC
Data Management
"The development, execution and supervision of plans, policies, programs and practices that control, protect, deliver and enhance the value of data and information assets."
Most organizations are still managing their assets with a collection of small scripts, tribal knowledge, vigilance, and hope.
Instead, organizations need a future-proof solution for managing data and its surrounding infrastructure.
Why Data Management Matters
As data matures and reaches a broader community, data management policy must also evolve to meet these additional requirements.
Typical Data Flow
Devices / Sensors On Premise / Cloud
Incoming source data from satellites, sequencers, microscopes, ... sheep
The Problem
Data is coming in with greater...
Human-throttled ingestion and cleaning is no longer sufficient.
Where is the Edge?
Devices / Sensors On Premise / Cloud
Where does the data come under management?
Where can it be vouched for?
Where can it be trusted?
A Modest Proposal
iRODS is open source data management software
Provides insurance against your changing infrastructure:
iRODS Core Competencies
The underlying technology categorized into four areas
iRODS Policy Examples
iRODS Capabilities
Deployment Patterns
Data to Compute
Compute to Data
Filesystem Synchronization
The Data Management Model
Where is the Edge?
Devices / Sensors On Premise / Cloud
Create a logical namespace
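As a rough illustration of what a logical namespace does, the sketch below maps one logical path to any number of physical copies on different storage systems. It is not how iRODS implements its catalog; the class name, the resource names, and the paths are all hypothetical and chosen only for this example.

```python
class LogicalNamespace:
    """Toy catalog: one logical path, many physical replicas (hypothetical)."""

    def __init__(self):
        # logical path -> list of (resource name, physical path) replicas
        self._catalog = {}

    def register(self, logical_path, resource, physical_path):
        """Record that a copy of logical_path lives at physical_path on resource."""
        self._catalog.setdefault(logical_path, []).append((resource, physical_path))

    def resolve(self, logical_path):
        """Return all known replicas for logical_path (empty list if unknown)."""
        return list(self._catalog.get(logical_path, []))


# Hypothetical names: the same data object has a copy at the edge and in the cloud.
ns = LogicalNamespace()
ns.register("/zone/home/lab/run42.dat", "edge-disk", "/mnt/instrument/run42.dat")
ns.register("/zone/home/lab/run42.dat", "cloud-bucket", "archive/run42.dat")
replicas = ns.resolve("/zone/home/lab/run42.dat")
```

Users and applications refer only to the logical path; where the bytes physically live can change without the name changing.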
Where is the Edge?
Devices / Sensors Edge On Premise / Cloud
Move the point of ingestion closer to the source. Ingest on site. Ingest at the point of data creation.
UNIFIED NAMESPACE
The Data Lifecycle begins at Data Generation
By bringing data management to the point of data generation
(and extending the programmatic surface out to the instruments),
a system with this architecture can address other hard problems:
Automated Ingest - Landing Zone
Automated Ingest - Filesystem Scanning
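The filesystem-scanning pattern can be sketched with the Python standard library alone: walk a directory tree, compare each file's size and modification time with what was seen on the previous pass, and report anything new or changed. This is a minimal sketch under stated assumptions, not the iRODS automated ingest tool; the function name `scan_for_changes` and the in-memory `seen` dictionary are hypothetical, and a real deployment would register or upload each reported file instead of just returning the paths.

```python
import os


def scan_for_changes(root, seen):
    """Return files under root that are new or changed since the last scan.

    `seen` maps path -> (size, mtime_ns) and is updated in place, so calling
    this function repeatedly reports each change once.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            signature = (st.st_size, st.st_mtime_ns)
            if seen.get(path) != signature:
                seen[path] = signature
                changed.append(path)
    return changed
```

Running the scan on a schedule, with `seen` persisted between runs, removes the human from the ingestion loop: files are picked up as they appear at the source.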
Data to Compute
Compute to Data
Resources
iRODS Overview and Diagrams
https://irods.org/documentation
Official Documentation
https://docs.irods.org
iRODS Training Materials and Presentations
https://slides.com/irods
iRODS User Group
https://irods.org/ugm2019
Questions?