November 12-15, 2018
Supercomputing 2018
Dallas, TX
Terrell Russell, Ph.D.
@terrellrussell
Chief Technologist, iRODS Consortium
Managing Data
from the Edge to HPC
Data Management
"The development, execution and supervision of plans, policies, programs and practices that control, protect, deliver and enhance the value of data and information assets."
Most organizations are still managing their assets with a collection of small scripts, tribal knowledge, vigilance, and hope.
Organizations instead need a future-proof solution for managing data and its surrounding infrastructure.
Why Data Management Matters
As data matures and reaches a broader community, data management policy must evolve to meet the additional requirements that the wider audience brings.
Typical Data Flow
Devices / Sensors On Premise / Cloud
Incoming source data from satellites, sequencers, microscopes, ... sheep
The Problem
Data is coming in with greater...
- Volume
- Velocity
- Variety
Human-throttled ingestion and cleaning are no longer sufficient.
- Should be handled with policy and procedure
- Should be handled with code
- Should be handled closer to point of creation
Where is the Edge?
Devices / Sensors On Premise / Cloud
Where does the data come under management?
Where can it be vouched for?
Where can it be trusted?
A Modest Proposal
iRODS is open source data management software
Provides insurance against your changing infrastructure:
- edge devices
- storage
- compute
- networking
- authentication
Where is the Edge?
Devices / Sensors On Premise / Cloud
Create a logical namespace
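To make the idea concrete, here is a minimal sketch of a logical namespace: one stable logical path that can point at any number of physical copies. The class name, method names, paths, and locations are all hypothetical and for illustration only; this is not the iRODS API.

```python
class LogicalNamespace:
    """Hypothetical catalog mapping logical paths to physical replicas."""

    def __init__(self):
        self._replicas = {}

    def register(self, logical_path, physical_location):
        # Record one more physical copy under the same logical name.
        self._replicas.setdefault(logical_path, []).append(physical_location)

    def resolve(self, logical_path):
        # Return every known physical location for a logical name.
        return list(self._replicas.get(logical_path, []))


ns = LogicalNamespace()
# Illustrative paths only: the same object lives on an edge device and in the cloud.
ns.register("/lab/run42/image001.tif", "edge-device-7:/data/image001.tif")
ns.register("/lab/run42/image001.tif", "s3://archive-bucket/run42/image001.tif")
locations = ns.resolve("/lab/run42/image001.tif")
```

Users and applications refer only to the logical path, so storage can be added, moved, or retired without changing what they ask for.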
Where is the Edge?
Devices / Sensors Edge On Premise / Cloud
Move the point of ingestion closer to the source. Ingest on site. Ingest at the point of data creation.
UNIFIED NAMESPACE
The Data Lifecycle begins at Data Generation
By bringing data management to the point of data generation
(and extending the programmatic surface out to the instruments),
a system with this architecture can address other hard problems:
- Data Harmonization
- Data Movement
- Data Integrity
- Geographic Distribution
- Network Capacity
- Network Reliability
- Variety of Data Sources
- Variety of Data Formats
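As one illustration of the Data Integrity and Data Movement items, the sketch below records a checksum at the point of creation and checks it again after the file has moved. It uses only the Python standard library; the function names, record fields, and instrument label are assumptions made for this example, not part of iRODS.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    # Stream the file in 1 MiB chunks so large files do not need to fit in memory.
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest_at_edge(path, instrument):
    # Hypothetical catalog record, created where the data is generated.
    path = Path(path)
    return {
        "name": path.name,
        "instrument": instrument,
        "size": path.stat().st_size,
        "sha256": sha256_of(path),
    }


def verify_after_move(record, moved_path):
    # The copy is trusted only if it still matches the checksum taken at the edge.
    return sha256_of(moved_path) == record["sha256"]


with tempfile.TemporaryDirectory() as tmp:
    edge = Path(tmp) / "edge"
    hpc = Path(tmp) / "hpc"
    edge.mkdir()
    hpc.mkdir()

    source = edge / "reading.dat"
    source.write_bytes(b"sensor reading 0001\n")
    record = ingest_at_edge(source, instrument="hypothetical-sensor")

    moved = Path(shutil.copy2(source, hpc / source.name))
    intact = verify_after_move(record, moved)

    # Simulate damage in transit or at rest; the stored checksum exposes it.
    moved.write_bytes(b"corrupted")
    corruption_detected = not verify_after_move(record, moved)
```

Because the checksum is taken before the data leaves the instrument, every later copy can be vouched for against the original.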
Thank you!
iRODS Consortium @iRODS
RENCI @RENCI
Booth #2238
SC18 - Managing Data from the Edge to HPC
By iRODS Consortium