November 12-15, 2018

Supercomputing 2018

Dallas, TX

Terrell Russell, Ph.D.

@terrellrussell

Chief Technologist, iRODS Consortium

Managing Data

from the Edge to HPC

Data Management

"The development, execution and supervision of plans, policies, programs and practices that control, protect, deliver and enhance the value of data and information assets."

 

 

Most organizations are still managing their assets with a collection of small scripts, tribal knowledge, vigilance, and hope.

 


Organizations need, instead, a future-proof solution for managing data and its surrounding infrastructure.

Why Data Management Matters

As data matures and reaches a broader community, data management policy must evolve to meet the additional requirements of that community.

Typical Data Flow

Devices / Sensors → On Premise / Cloud

Incoming source data from satellites, sequencers, microscopes, ... sheep

The Problem

Data is coming in with greater...

  • Volume
  • Velocity
  • Variety

 

Human-throttled ingestion and cleaning are no longer sufficient.

  • Should be handled with policy and procedure
  • Should be handled with code 
  • Should be handled closer to point of creation

 

 

Where is the Edge?

Devices / Sensors → On Premise / Cloud

Where does the data come under management?

Where can it be vouched for?

Where can it be trusted?

A Modest Proposal

iRODS is open-source data management software

 

 

 

Provides insurance against your changing infrastructure:

  • edge devices
  • storage
  • compute
  • networking
  • authentication

Where is the Edge?

Devices / Sensors → On Premise / Cloud

Create a logical namespace

Where is the Edge?

Devices / Sensors → Edge → On Premise / Cloud

Move the point of ingestion closer to the source.  Ingest on site.  Ingest at the point of data creation.

UNIFIED NAMESPACE
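
The idea of a unified namespace with ingestion at the edge can be illustrated with a small sketch. This is not the iRODS API; `LogicalNamespace`, `Replica`, the site labels, and the paths are all hypothetical names chosen for illustration. It assumes one logical path can map to replicas at several sites, and that a checksum recorded at ingestion is what later lets the data be vouched for.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Replica:
    site: str           # hypothetical site label, e.g. "edge" or "hpc"
    physical_path: str  # where the bytes actually live at that site
    sha256: str         # checksum recorded at the moment of ingestion


class LogicalNamespace:
    """Toy catalog mapping one logical path to replicas at many sites."""

    def __init__(self):
        self._catalog = {}

    def ingest(self, logical_path, site, physical_path, data):
        # Checksum at the point of ingestion so the data can be vouched for later.
        digest = hashlib.sha256(data).hexdigest()
        replica = Replica(site, physical_path, digest)
        self._catalog.setdefault(logical_path, []).append(replica)
        return replica

    def replicas(self, logical_path):
        # Return a copy so callers cannot mutate the catalog by accident.
        return list(self._catalog.get(logical_path, []))

    def verify(self, logical_path, site, data):
        # True only if a replica at this site has a matching checksum.
        digest = hashlib.sha256(data).hexdigest()
        return any(
            r.site == site and r.sha256 == digest
            for r in self.replicas(logical_path)
        )
```

In this sketch, users address the data by its logical path only; which site holds the bytes is a catalog detail that can change as the infrastructure changes.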

The Data Lifecycle begins at Data Generation

By bringing data management to the point of data generation

(and extending the programmatic surface out to the instruments),

a system with this architecture can address other hard problems:

  • Data Harmonization
  • Data Movement
  • Data Integrity
  • Geographic Distribution
  • Network Capacity
  • Network Reliability
  • Variety of Data Sources
  • Variety of Data Formats

Thank you!

 

 

iRODS Consortium: @iRODS

RENCI: @RENCI

Booth #2238

 

 

Terrell Russell, Ph.D.

@terrellrussell

Chief Technologist, iRODS Consortium
