Executive Overview and Demonstration

Jason Coposky

@jason_coposky

Executive Director, iRODS Consortium

October 3, 2019

USDA

Remote Presentation

What is iRODS

Distributed - runs on a laptop, a cluster, on premises or geographically distributed

Open Source - BSD-3 Licensed, install it today and try before you buy

Metadata Driven & Data Centric - Insulate both your users and your data from your infrastructure

iRODS as the Integration Layer

iRODS Core Competencies

The underlying technology categorized into four areas

Data Virtualization

Combine various distributed storage technologies into a Unified Namespace

  • Existing file systems
  • Cloud storage
  • On premises object storage
  • Archival storage systems

iRODS provides a logical view into the complex physical representation of your data, distributed geographically, and at scale.

Projection of the Physical into the Logical

Logical Path

Physical Path(s)
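The projection of one logical path onto several physical paths can be illustrated with a toy in-memory catalog. This is a minimal sketch only, not the iRODS API: the zone name, resource names, and paths are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Replica:
    resource: str       # hypothetical storage resource name
    physical_path: str  # where the bytes actually live


@dataclass
class Catalog:
    """Toy catalog: one logical path maps to one or more physical replicas."""
    entries: dict[str, list[Replica]] = field(default_factory=dict)

    def register(self, logical_path: str, replica: Replica) -> None:
        self.entries.setdefault(logical_path, []).append(replica)

    def resolve(self, logical_path: str) -> list[Replica]:
        return self.entries.get(logical_path, [])


catalog = Catalog()
logical = "/exampleZone/home/alice/run42.dat"  # hypothetical logical path
catalog.register(logical, Replica("posix_fs", "/mnt/disk1/vault/alice/run42.dat"))
catalog.register(logical, Replica("object_store", "s3://example-bucket/alice/run42.dat"))

# Users only ever see the logical path; the catalog knows the replicas.
for replica in catalog.resolve(logical):
    print(f"{logical} -> [{replica.resource}] {replica.physical_path}")
```

The user-facing name stays stable while replicas can be added, moved, or retired behind it.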

Data Discovery

Attach metadata to any first class entity within the iRODS Zone

  • Data Objects
  • Collections
  • Users
  • Storage Resources
  • The Namespace

iRODS provides automated and user-provided metadata which makes your data and infrastructure more discoverable, operational and valuable.
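Metadata in iRODS is expressed as attribute, value, unit (AVU) triples. The sketch below models attaching such triples to several kinds of entity with plain Python data structures. It is an illustration under assumed names (the entities, attributes, and helper functions are hypothetical), not a call into an iRODS client library.

```python
from collections import defaultdict
from typing import NamedTuple


class AVU(NamedTuple):
    attribute: str
    value: str
    unit: str = ""


# Toy metadata store keyed by (entity type, entity name).
metadata = defaultdict(set)


def add_avu(entity_type, name, avu):
    """Attach one AVU to an entity of any type."""
    metadata[(entity_type, name)].add(avu)


def list_avus(entity_type, name):
    """Return the AVUs attached to an entity, sorted for stable output."""
    return sorted(metadata.get((entity_type, name), set()))


# Hypothetical entities and attributes, one per entity type.
add_avu("data_object", "/exampleZone/home/alice/run42.dat", AVU("read_length", "150", "bp"))
add_avu("collection", "/exampleZone/home/alice", AVU("project", "demo"))
add_avu("user", "alice", AVU("department", "genomics"))
add_avu("resource", "lts_resc", AVU("tier", "archive"))

for (entity_type, name) in sorted(metadata):
    print(entity_type, name, list_avus(entity_type, name))
```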

Metadata Everywhere

Workflow Automation

Integrated scripting language which is triggered by any operation within the framework

  • Authentication
  • Storage Access
  • Database Interaction
  • Network Activity
  • Extensible RPC API 

The iRODS rule engine provides the ability to capture real world policy as computer actionable rules which may allow, deny, or add context to operations within the system.
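The idea of rules that allow, deny, or add context to an operation can be sketched with a toy hook dispatcher. This is not the iRODS rule engine or its rule language: the hook names ("put_pre", "put_post"), the rule functions, and the resource names are assumptions made for illustration.

```python
class PolicyDenied(Exception):
    """Raised by a rule to deny the operation that triggered it."""


class RuleEngine:
    """Toy dispatcher: rules register against named hooks and run in order."""

    def __init__(self):
        self.rules = {}

    def on(self, hook):
        def decorator(fn):
            self.rules.setdefault(hook, []).append(fn)
            return fn
        return decorator

    def fire(self, hook, ctx):
        for rule in self.rules.get(hook, []):
            rule(ctx)


engine = RuleEngine()
audit_log = []
notifications = []
CURATORS = {"alice"}  # hypothetical set of users allowed to write curated data


@engine.on("put_pre")
def restrict_access(ctx):
    if ctx["path"].startswith("/exampleZone/curated/") and ctx["user"] not in CURATORS:
        raise PolicyDenied(f"{ctx['user']} may not write to {ctx['path']}")


@engine.on("put_pre")
def add_context(ctx):
    # Add context: choose a target resource based on the file type.
    ctx["resource"] = "lts_resc" if ctx["path"].endswith(".bam") else "default_resc"


@engine.on("put_post")
def log_for_audit(ctx):
    audit_log.append((ctx["user"], "put", ctx["path"], ctx["resource"]))


@engine.on("put_post")
def send_notification(ctx):
    notifications.append(f"new object {ctx['path']}")


def put(user, path):
    """One operation: pre-rules may deny or add context, post-rules react."""
    ctx = {"user": user, "path": path}
    engine.fire("put_pre", ctx)
    # The actual write to storage would happen here.
    engine.fire("put_post", ctx)
    return ctx


print(put("alice", "/exampleZone/curated/sample.bam"))
try:
    put("bob", "/exampleZone/curated/sample.bam")
except PolicyDenied as err:
    print("denied:", err)
```

Because the denial happens in a pre-hook, the denied operation never reaches the audit log or the notification list in this sketch.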

Dynamic Policy Enforcement

The iRODS rule may:

  • restrict access
  • log for audit and reporting
  • provide additional context
  • send a notification

Dynamic Policy Enforcement

A single API call expands to many plugin operations, all of which may invoke policy enforcement

Plugin Interfaces:

  • Authentication
  • Database
  • Storage
  • Network
  • Rule Engine
  • Microservice
  • RPC API

Secure Collaboration

iRODS allows for collaboration across administrative boundaries after deployment

  • No need for common infrastructure
  • No need for shared funding
  • Affords temporary collaborations

iRODS provides the ability to federate namespaces across organizations without pre-coordinated funding or effort.

iRODS as a Service Interface

Federation - Shared Data and Services

Ingest to Institutional repository

As data matures and reaches a broader community, data management policy must also evolve to meet these additional requirements.

iRODS Capabilities

Automated Ingest - Landing Zone

Automated Ingest - Filesystem Scanning

Storage Tiering

Deployment Patterns

Data to Compute

Compute to Data

Filesystem Synchronization

The Data Management Model

Use Cases

iRODS

The Wellcome Sanger Institute

Sanger - Replication

  • Data preferentially placed on resource servers in the green data center (fallback to red)
  • Data replicated to the other room.
  • Checksums applied
  • Green and red centers both used for read access.
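The placement pattern in the bullets above (prefer green, fall back to red, replicate to the other room, verify checksums) can be sketched with local directories standing in for the two rooms. This is a toy model only, not Sanger's actual policy: the function names, the file name, and the use of None to mark an unavailable room are assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def md5sum(path):
    return hashlib.md5(path.read_bytes()).hexdigest()


def place(name, data, rooms, preferred="green"):
    """Write to the preferred room if available, otherwise fall back."""
    order = [preferred] + [room for room in rooms if room != preferred]
    for room in order:
        directory = rooms[room]
        if directory is not None:  # None models an unavailable room
            (directory / name).write_bytes(data)
            return room
    raise RuntimeError("no storage room is available")


def replicate(name, rooms, source):
    """Copy the object to every other available room and verify checksums."""
    original = rooms[source] / name
    checksum = md5sum(original)
    for room, directory in rooms.items():
        if room == source or directory is None:
            continue
        copy = directory / name
        copy.write_bytes(original.read_bytes())
        if md5sum(copy) != checksum:
            raise IOError(f"checksum mismatch for replica in {room}")
    return checksum


def demo(green_available=True):
    with tempfile.TemporaryDirectory() as tmp:
        green, red = Path(tmp, "green"), Path(tmp, "red")
        green.mkdir()
        red.mkdir()
        rooms = {"green": green if green_available else None, "red": red}
        placed = place("lane1.cram", b"example reads", rooms)
        checksum = replicate("lane1.cram", rooms, placed)
        copies = sorted(
            room for room, d in rooms.items()
            if d is not None and (d / "lane1.cram").exists()
        )
        return {"placed_in": placed, "md5": checksum, "copies": copies}


print(demo())
print(demo(green_available=False))
```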

Sanger - Metadata

attribute: library

attribute: total_reads

attribute: type

attribute: lane

attribute: is_paired_read

attribute: study_accession_number

attribute: library_id

attribute: sample_accession_number

attribute: sample_public_name

attribute: manual_qc

attribute: tag

attribute: sample_common_name

attribute: md5

attribute: tag_index

attribute: study_title

attribute: study_id

attribute: reference

attribute: sample

attribute: target

attribute: sample_id

attribute: id_run

attribute: study

attribute: alignment

  • Example metadata attributes
  • Users query and access data from local compute clusters
  • Users access iRODS locally via the command line interface

Sanger - Federation

Maastricht DataHub

SURF Scale Out Pilot

[Diagram labels: University Zone (Catalog) ×2; Server Hosting Environment; Resource Server ×2; Tape Archive; Disk Storage; Object Storage]

SURF EUDAT CDI

[Diagram labels: External Community Zones (Catalog); Zone (Catalog); Local Storage; CXFS; Tape Library; EUDAT University Zone (Catalog) ×2; B2SAFE iRODS Federation; EUDAT Centers; iRODS Federation; ARCHIVE; GridFTP Data Movement]

Overview

iRODS Demo

The Infrastructure

Implemented with docker-compose

8 Containers:

  • Automated Ingest
  • NFSRods
  • Metalnx
  • Audit ElasticStack
  • iRODS Client
  • Metalnx Database
  • DAVRods
  • iRODS Catalog Service Provider

The Infrastructure

[Diagram labels: Catalog Service Provider; Automated Ingest Service; NFSRods; DAVRods; Metalnx; ElasticStack; connections labelled iRODS Protocol and AMQP iRODS Event Stream]

The Content

  • Ingest policy to extract metadata then move data to a Long Term Storage resource
  • Apply metadata to the object in the catalog
    • metadata headers available in the files
    • contextual metadata : LZ directory, instrument, etc.
  • Implement basic encryption for the data on the LTS
  • Demonstrate
    • ingest
    • discovery
    • encryption at rest
    • data egress
    • graphical presentation
    • file system presentation : NFS and WebDAV

Automated Ingest

Two directories were created:

  • /tmp/landing_zone
  • /tmp/ingested

For any data that arrives in /tmp/landing_zone:

  • The data is automatically moved to a storage resource
  • The data is registered into the catalog at a configured location
  • Metadata is extracted and applied to the object in the catalog
  • The remaining file is moved to /tmp/ingested
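The landing zone steps above can be sketched as a small script. It uses temporary directories in place of /tmp/landing_zone and /tmp/ingested, a plain dict in place of the catalog, and an assumed "# key: value" header format for in-file metadata. None of this is the API of the iRODS Automated Ingest tool; every name here is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def extract_metadata(path):
    """Read leading '# key: value' header lines (an assumed file format)."""
    metadata = {}
    with path.open() as handle:
        for line in handle:
            if not line.startswith("#"):
                break
            key, sep, value = line[1:].partition(":")
            if sep:
                metadata[key.strip()] = value.strip()
    return metadata


def ingest(landing_zone, storage, ingested, catalog, collection):
    for path in sorted(landing_zone.iterdir()):
        if not path.is_file():
            continue
        # 1. Place the data on the storage resource.
        stored = storage / path.name
        shutil.copy2(path, stored)
        # 2. Extract header metadata and add contextual metadata.
        metadata = extract_metadata(path)
        metadata["lz_directory"] = landing_zone.name
        # 3. Register the object and its metadata in the catalog.
        catalog[f"{collection}/{path.name}"] = {
            "physical_path": str(stored),
            "metadata": metadata,
        }
        # 4. Move the remaining file out of the landing zone.
        shutil.move(str(path), str(ingested / path.name))
    return catalog


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        dirs = {name: root / name for name in ("landing_zone", "storage", "ingested")}
        for directory in dirs.values():
            directory.mkdir()
        (dirs["landing_zone"] / "run42.dat").write_text(
            "# instrument: sequencer_7\n# operator: alice\nACGT\n"
        )
        catalog = ingest(
            dirs["landing_zone"], dirs["storage"], dirs["ingested"],
            {}, "/exampleZone/home/ingest",
        )
        return {
            "catalog": catalog,
            "left_in_landing_zone": [p.name for p in dirs["landing_zone"].iterdir()],
            "ingested": [p.name for p in dirs["ingested"].iterdir()],
        }


print(demo())
```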

Users can view and access data and metadata from any client

Encryption at Rest

Policy has been configured to encrypt and decrypt data in-place when accessed by the object interface

Using the iRODS command line, data can be ingested and then inspected at rest

The data may then be retrieved using the command line

Note: data accessed via the POSIX interface (DAVRods, NFSRods, Metalnx) will remain encrypted

Data Discovery with Metalnx

Automated Ingest has provided metadata for data discovery

The metadata can be directly inspected in Metalnx

The query builder can be used to identify data sets of interest via Attribute, Value, Unit matches

Queries to the system metadata may also be performed, searching on values such as file name, collection path, user, etc.
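Matching on Attribute, Value, and Unit can be pictured as a set of conditions that must all hold for a data object. The sketch below is a toy in-memory search, not Metalnx or the iRODS query interface: the catalog contents and the Condition type are hypothetical.

```python
from typing import NamedTuple, Optional


class AVU(NamedTuple):
    attribute: str
    value: str
    unit: str = ""


class Condition(NamedTuple):
    """Match on attribute, and optionally on value and unit."""
    attribute: str
    value: Optional[str] = None
    unit: Optional[str] = None

    def matches(self, avu):
        return (
            avu.attribute == self.attribute
            and (self.value is None or avu.value == self.value)
            and (self.unit is None or avu.unit == self.unit)
        )


# Hypothetical catalog: logical path -> set of AVUs.
CATALOG = {
    "/exampleZone/home/ingest/a.dat": {AVU("instrument", "sequencer_7"), AVU("read_length", "150", "bp")},
    "/exampleZone/home/ingest/b.dat": {AVU("instrument", "sequencer_7"), AVU("read_length", "75", "bp")},
    "/exampleZone/home/ingest/c.dat": {AVU("instrument", "microscope_2")},
}


def search(catalog, conditions):
    """Return paths whose metadata satisfies every condition (logical AND)."""
    return sorted(
        path for path, avus in catalog.items()
        if all(any(cond.matches(avu) for avu in avus) for cond in conditions)
    )


print(search(CATALOG, [Condition("instrument", "sequencer_7")]))
print(search(CATALOG, [Condition("instrument", "sequencer_7"),
                       Condition("read_length", "150", "bp")]))
```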

File System Presentation: DAVRods

DAVRods provides both a simple web-based interface and the ability to mount a folder on the desktop

DAVRods is an Apache Module implemented in C using the native iRODS POSIX API

DAVRods can be used to edit data in-place, or to copy data to/from a user's collections

File System Presentation: NFSRods

NFSRods is an NFSv4 implementation based on NFS4J using the iRODS Java API

NFSRods also leverages the iRODS POSIX API

NFSRods provides full command line capabilities to the shell for working with data in place at the console

Questions?
