iRODS Overview

 

June 2016

Agenda

• Introduction

• What is iRODS?

• Case Studies

• Discussion


Introduction

Justin James

Application Engineer, iRODS Consortium

jjames@renci.org

+1-919-445-9652

 

The iRODS Consortium and Sustainability

iRODS is free, open source software owned by a foundation called the iRODS Consortium.

  • The goal is to sustain iRODS as free, open source software by:

    ▹ Building good software.
    ▹ Growing the iRODS community.
    ▹ Demonstrating value.

  • The Consortium funds a team of 10+ developers, application engineers, and documentation and support staff.

Contract Customers

and more ...

What is iRODS?

          Policy-Based Data Management

The Integrated Rule-Oriented Data System:

• Open Source

• Manages data for indefinite lifespan

• Metadata and event-triggered rules

 

Designed to work with big, important, and/or complex data

 

Example applications:

   • Combining data from multiple sources

   • Annotation and search

   • Tiered storage

   • Preservation

   • Sharing in place

 

 

 

The Four Pillars

  • Data Virtualization ← all your storage in a single namespace

  • Data Discovery ← system and user-generated metadata

  • Workflow Automation ← event-driven and scheduled (cron) policies

  • Secure Collaboration ← federation

Data Virtualization

  • All data objects (files) are accessed through a common namespace.
     
  • Physical data location is (mostly) transparent to the iRODS user.
     
  • Data is stored on "resources".
     
  • Complex resource hierarchies can be built to automatically replicate data among multiple resources, send large files to different resources than small files, etc.
     
  • iRODS supports multiple storage types including UNIX file systems, DDN WOS, and Amazon S3.

Data Virtualization Example

The following configuration uses a replication resource to replicate all files to Resource1 and Resource2.

$ ilsresc
BaseResource:replication
├── Resource1
└── Resource2

After putting a file into iRODS the file exists on both resources.

$ ils
/tempZone/home/rods:
  temp.txt
$ ils -L temp.txt
  rods              0 BaseResource;Resource2           20 2016-06-15.10:31 & temp.txt
        generic    /var/lib/irods/Vault/home/rods/temp.txt
  rods              1 BaseResource;Resource1           20 2016-06-15.10:31 & temp.txt
        generic    /var/lib/irods/Vault/home/rods/temp.txt
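
The fan-out performed by the replication resource can be pictured with a small toy model. This is only a sketch and not iRODS code: the `Resource` class, its `place` method, and the "pick the first child" behaviour for non-replication nodes are assumptions made for illustration. Only the resource names and the `BaseResource;Resource1` hierarchy notation come from the example above, and the replica numbering in the sketch is arbitrary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """A node in a toy resource tree (illustrative only, not the iRODS implementation)."""
    name: str
    kind: str = "unixfilesystem"      # leaf storage by default
    children: List["Resource"] = field(default_factory=list)

    def place(self, prefix: str = "") -> List[str]:
        """Return the hierarchy strings of the leaves that receive a replica."""
        path = f"{prefix};{self.name}" if prefix else self.name
        if not self.children:
            return [path]             # a storage leaf holds the replica
        if self.kind == "replication":
            targets = self.children   # a replication node fans out to every child
        else:
            targets = self.children[:1]  # assumption: other nodes pick the first child
        leaves: List[str] = []
        for child in targets:
            leaves.extend(child.place(path))
        return leaves


# Same shape as the ilsresc output above.
base = Resource("BaseResource", "replication",
                [Resource("Resource1"), Resource("Resource2")])
replicas = base.place()
for number, hierarchy in enumerate(replicas):
    print(number, hierarchy)
```

Modelling the hierarchy as a tree keeps placement policy in the coordinating nodes, so the caller only ever names the root resource.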

Data Discovery

  • Metadata may be attached to data objects, users, groups, collections, and resources.
     

  • Users may search for objects that contain specific metadata.
     

  • The following example uses the command line tool "imeta" to find all files that belong to the study "WoundVac":

$ imeta qu -d belongsToStudy = WoundVac
collection: /sc2i_zone/SC2i/WoundVac
dataObj: SC2i CDR UAT_Test File 3.xlsx
----
collection: /sc2i_zone/SC2i/WoundVac/Analysis
dataObj: 20160325_studies.csv
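
Conceptually, the query above is a lookup over attribute/value pairs attached to logical paths. The sketch below is a toy in-memory model, not the iRODS catalog or its API: the `Catalog` type, the `query` function, and the third entry (`/sc2i_zone/SC2i/Other/notes.txt`) are hypothetical, and iRODS metadata units are left out for brevity. The first two paths are reconstructed from the imeta output above.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# logical path -> set of (attribute, value) pairs; units are omitted in this sketch
Catalog = Dict[str, Set[Tuple[str, str]]]


def query(catalog: Catalog, attribute: str, value: str) -> Dict[str, List[str]]:
    """Return matching data object names grouped by their parent collection."""
    hits: Dict[str, List[str]] = defaultdict(list)
    for path, avus in catalog.items():
        if (attribute, value) in avus:
            collection, _, name = path.rpartition("/")
            hits[collection].append(name)
    return dict(hits)


catalog: Catalog = {
    "/sc2i_zone/SC2i/WoundVac/SC2i CDR UAT_Test File 3.xlsx": {("belongsToStudy", "WoundVac")},
    "/sc2i_zone/SC2i/WoundVac/Analysis/20160325_studies.csv": {("belongsToStudy", "WoundVac")},
    "/sc2i_zone/SC2i/Other/notes.txt": {("belongsToStudy", "Other")},  # hypothetical entry
}

matches = query(catalog, "belongsToStudy", "WoundVac")
for collection, names in matches.items():
    for name in names:
        print(f"collection: {collection}")
        print(f"dataObj: {name}")
```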

Workflow Automation

  • The iRODS rule engine allows customization of iRODS processing.

  • The rule engine is triggered at defined points in iRODS processing, called policy enforcement points. Example policy enforcement points:

    • A data object is placed into iRODS.

    • Metadata is added to a data object.

    • A user is created.

    • Etc.

  • Administrators can write rules to do almost anything, from automatically populating metadata based on file contents to controlling access to specific objects.
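
The trigger-and-dispatch idea can be sketched in a few lines. This is a toy dispatcher written for illustration, not the iRODS rule engine or its rule language: the `RuleEngine` class, the event names `data_object_put` and `user_created`, and the event dictionary layout are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Rule = Callable[[dict], None]


class RuleEngine:
    """Toy dispatcher: maps enforcement-point names to registered rules."""

    def __init__(self) -> None:
        self._rules: Dict[str, List[Rule]] = defaultdict(list)

    def on(self, pep: str) -> Callable[[Rule], Rule]:
        """Decorator that registers a rule for one enforcement point."""
        def register(rule: Rule) -> Rule:
            self._rules[pep].append(rule)
            return rule
        return register

    def trigger(self, pep: str, event: dict) -> int:
        """Run every rule registered for `pep` and return how many ran."""
        rules = self._rules.get(pep, [])
        for rule in rules:
            rule(event)
        return len(rules)


engine = RuleEngine()
audit_log: List[str] = []


@engine.on("data_object_put")  # hypothetical enforcement-point name
def log_put(event: dict) -> None:
    """Example rule: record who stored which object."""
    audit_log.append(f"put {event['path']} by {event['user']}")


ran = engine.trigger("data_object_put",
                     {"path": "/tempZone/home/rods/temp.txt", "user": "rods"})
skipped = engine.trigger("user_created", {"user": "alice"})  # no rule registered
print(audit_log, ran, skipped)
```

Keeping rules as plain callables registered per enforcement point means new policy can be added without touching the code that raises the events.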

Workflow Automation in SC2i

In the SC2i project we have rules to perform the following tasks:
 

  • Write audit log entries when specific events are encountered.
     
  • Automatically populate the belongsToStudy and hasDataType metadata for all files ingested into iRODS.
     
  • Automatically store backups of all files when new versions of the files are ingested into iRODS.
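
As an illustration of the metadata-population task, the sketch below derives both attributes from a logical path. It is not the SC2i rule set: it assumes, hypothetically, that the study name is the third path component (as the paths in the imeta example suggest) and that the data type follows from the file extension. The `DATA_TYPES` table and the `derive_metadata` function are invented for this example.

```python
import posixpath
from typing import Dict

# Hypothetical mapping from file extension to a hasDataType value.
DATA_TYPES = {".csv": "csv", ".xlsx": "spreadsheet"}


def derive_metadata(logical_path: str) -> Dict[str, str]:
    """Guess belongsToStudy and hasDataType from a path like /zone/project/study/.../file."""
    parts = logical_path.strip("/").split("/")
    metadata: Dict[str, str] = {}
    if len(parts) >= 4:
        # Assumption: zone, project, study, then at least a file name.
        metadata["belongsToStudy"] = parts[2]
    extension = posixpath.splitext(logical_path)[1].lower()
    metadata["hasDataType"] = DATA_TYPES.get(extension, "unknown")
    return metadata


derived = derive_metadata("/sc2i_zone/SC2i/WoundVac/Analysis/20160325_studies.csv")
print(derived)
```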

Secure Collaboration

  • Instances of iRODS are called iRODS zones.
     
  • iRODS provides the ability to federate with other iRODS zones.
     
  • With federation enabled, users may log in to their home zone and access data from remote zones.
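
Because the first component of an iRODS logical path is the zone name (for example `/tempZone/home/rods`), deciding whether a path is local or federated comes down to comparing that component with the user's home zone. The helpers below are a minimal sketch of that check; `zone_of` and `is_remote` are hypothetical names and not part of any iRODS client library.

```python
def zone_of(logical_path: str) -> str:
    """Return the zone name, i.e. the first component of an absolute logical path."""
    if not logical_path.startswith("/"):
        raise ValueError("expected an absolute logical path")
    zone = logical_path.split("/")[1]
    if not zone:
        raise ValueError("path does not name a zone")
    return zone


def is_remote(logical_path: str, home_zone: str) -> bool:
    """True when the path lives in a zone other than the user's home zone."""
    return zone_of(logical_path) != home_zone


print(zone_of("/tempZone/home/rods/temp.txt"))
print(is_remote("/sc2i_zone/SC2i/WoundVac", home_zone="tempZone"))
```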

iRODS Clients

• Web-based and Standalone GUIs

  - iRODS Cloud Browser, MetaLnx, Kanki, Cyberduck

 

• Portals, External Systems

  - iPlant Discovery Environment, Islandora, Fedora Commons

 

• WebDAV for drag-and-drop access built in to the OS

• APIs: Python, REST, Qt, Java, C, R

• Command Line Interface

iRODS Cloud Browser

Case Studies

User Profile: NASA Atmospheric Science Data Center

• 2 PB of archived satellite data

• Publicly available, subsetting on demand

• In-house ingest and archiving software: ANGe (Archive Next Generation)

Virtual Collections

 

ls -l

/CER_100100.2012053100
/CER_100100.2012053100.met
/CER_100100.2012053101
/CER_100100.2012053101.met
/CER_100100.2012053102
/CER_100100.2012053102.met

Visibility determined by "visibility attribute"

Logical collection of files spread across physical storage resources.

Single Interface to Multiple Clients

 

WebDAV, FUSE, Web UI, Cyberduck

REST, Python, R, Java, C++

(And more!)

Federation

User Profile: National Institute of Environmental Health Sciences

• Viral Vector Core creates designer viruses:

    request ⟶ transfection and amplification ⟶ sample delivery ⟶ reports

 

• Uses iRODS to combine, organize, and analyze sets of requests and instrument results

   • Produces packaged results in response to researcher requests

   • Quarterly cost reports for chargeback and trend analysis for quality control

User Profile: University College London

• Repository for research data that spans social science, physics, and genomics

• UK-sponsored research requirements: data must be kept until the last date of access request plus 10 years

• iRODS spans storage technologies and enables federated access from other centres

User Profile: Wellcome Trust Sanger Institute

• Key genomics research centre

• 7 PB of storage managed by iRODS

Rich Metadata

 

attribute: library

attribute: total_reads

attribute: type

attribute: lane

attribute: is_paired_read

attribute: study_accession_number

attribute: library_id

attribute: sample_accession_number

attribute: sample_public_name

attribute: manual_qc

attribute: tag

attribute: sample_common_name

attribute: md5

 

Replication and Federation

User Profile: CyVerse

User Profile: Antelope RTS Sensor Integration

Discussion

Thank You!

Dan Bedard

Director, iRODS Consortium

danb@renci.org

+1-919-445-0632

 
