iRODS Overview
June 2016
Agenda
• Introduction
• What is iRODS?
• Case Studies
• Discussion
Introduction
iRODS is free, open source software owned by a foundation called the iRODS Consortium.
The Consortium's goal is to sustain iRODS as free open source software by:
▹ Building good software
▹ Growing the iRODS community
▹ Demonstrating value
It funds a team of 10+ developers, application engineers, and documentation and support staff.
The iRODS Consortium and Sustainability
Contract Customers
and more ...
What is iRODS?
Policy-Based Data Management
The Integrated Rule-Oriented Data System:
• Open Source
• Manages data for indefinite lifespan
• Metadata and event-triggered rules
Designed to work with big, important, and/or complex data
Example applications:
• Combining data from multiple sources
• Annotation and search
• Tiered storage
• Preservation
• Sharing in place
The Four Pillars
• Data Virtualization: all your storage in a single namespace
• Data Discovery: system and user-generated metadata
• Workflow Automation: event-driven and scheduled (cron) policies
• Secure Collaboration: federation
Data Virtualization
Data Virtualization Example
The following configuration uses a replication resource to replicate all files to Resource1 and Resource2.
$ ilsresc
BaseResource:replication
├── Resource1
└── Resource2
After a file is put into iRODS, it exists on both resources.
$ ils
/tempZone/home/rods:
  temp.txt
$ ils -L temp.txt
  rods 0 BaseResource;Resource2 20 2016-06-15.10:31 & temp.txt
    generic /var/lib/irods/Vault/home/rods/temp.txt
  rods 1 BaseResource;Resource1 20 2016-06-15.10:31 & temp.txt
    generic /var/lib/irods/Vault/home/rods/temp.txt
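To check replication from a script, the listing above can be turned into structured records. The following is a minimal sketch, assuming the output contains only the two-line-per-replica records shown on this slide. The names Replica and parse_ils_long are hypothetical helpers, not part of iRODS.

```python
from typing import List, NamedTuple


class Replica(NamedTuple):
    owner: str
    number: int
    resource_hierarchy: str
    size: int
    physical_path: str


def parse_ils_long(output: str) -> List[Replica]:
    """Parse replica records in the layout shown on the slide."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    replicas = []
    # Assumption: each replica is a summary line followed by a
    # "<type> <physical path>" line, with no header lines in between.
    for summary, detail in zip(lines[0::2], lines[1::2]):
        owner, number, hierarchy, size = summary.split()[:4]
        physical_path = detail.split(maxsplit=1)[1]
        replicas.append(
            Replica(owner, int(number), hierarchy, int(size), physical_path)
        )
    return replicas


# Sample text copied from the slide.
SAMPLE = """
  rods 0 BaseResource;Resource2 20 2016-06-15.10:31 & temp.txt
    generic /var/lib/irods/Vault/home/rods/temp.txt
  rods 1 BaseResource;Resource1 20 2016-06-15.10:31 & temp.txt
    generic /var/lib/irods/Vault/home/rods/temp.txt
"""

replicas = parse_ils_long(SAMPLE)
# The leaf of each resource hierarchy is the resource holding the replica.
leaf_resources = sorted(r.resource_hierarchy.split(";")[-1] for r in replicas)
print(leaf_resources)
```

Comparing leaf_resources with the children of the replication resource is one simple way to confirm that every child holds a copy.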
Data Discovery
Metadata may be attached to data objects, users, groups, collections, and resources.
Users may search for objects that carry specific metadata.
The following example uses the command line tool "imeta" to find all files that belong to the study "WoundVac":
$ imeta qu -d belongsToStudy = WoundVac
collection: /sc2i_zone/SC2i/WoundVac
dataObj: SC2i CDR UAT_Test File 3.xlsx
----
collection: /sc2i_zone/SC2i/WoundVac/Analysis
dataObj: 20160325_studies.csv
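Query results in this form are easy to post-process. The sketch below joins each "collection:" and "dataObj:" pair into a full logical path. It assumes the output layout is exactly as shown on this slide, and parse_imeta_query is a hypothetical helper name, not an iRODS function.

```python
from typing import List


def parse_imeta_query(output: str) -> List[str]:
    """Join each collection/dataObj pair into one logical path."""
    paths = []
    collection = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("collection:"):
            collection = line[len("collection:"):].strip()
        elif line.startswith("dataObj:") and collection is not None:
            name = line[len("dataObj:"):].strip()
            paths.append(f"{collection}/{name}")
        # Any other line (such as the "----" separator) is ignored.
    return paths


# Sample text copied from the slide.
SAMPLE = """\
collection: /sc2i_zone/SC2i/WoundVac
dataObj: SC2i CDR UAT_Test File 3.xlsx
----
collection: /sc2i_zone/SC2i/WoundVac/Analysis
dataObj: 20160325_studies.csv
"""

matches = parse_imeta_query(SAMPLE)
for path in matches:
    print(path)
```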
Workflow Automation
The iRODS rule engine allows customization of iRODS processing.
The rule engine is triggered when certain actions occur. These trigger points are called policy enforcement points. Example policy enforcement points:
• A data object is placed into iRODS
• Metadata is added to a data object
• A user is created
• Etc.
Administrators can write rules to do just about anything, from automatically populating metadata based on file contents to controlling access to certain objects.
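The idea of rules firing at policy enforcement points can be illustrated with a toy event dispatcher. This is a conceptual sketch only, not the iRODS rule language or API. ToyRuleEngine, the event name "data_object_put", and the fields of the event dictionary are all hypothetical.

```python
from collections import defaultdict


class ToyRuleEngine:
    """Toy model: rules are functions registered against a named event."""

    def __init__(self):
        self._rules = defaultdict(list)

    def rule(self, event_name):
        """Decorator that registers a function to run at event_name."""
        def register(func):
            self._rules[event_name].append(func)
            return func
        return register

    def fire(self, event_name, context):
        """Run every rule registered for event_name, in registration order."""
        for func in self._rules[event_name]:
            func(context)


engine = ToyRuleEngine()


@engine.rule("data_object_put")  # hypothetical event name
def tag_file_type(context):
    # Populate metadata automatically (here from the file name only).
    name = context["path"].rsplit("/", 1)[-1]
    extension = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    context["metadata"]["file_type"] = extension


@engine.rule("data_object_put")
def restrict_raw_data(context):
    # Control access: objects under a "raw" collection stay owner-only.
    if "/raw/" in context["path"]:
        context["acl"] = {context["owner"]: "own"}


event = {
    "path": "/tempZone/home/rods/raw/temp.txt",
    "owner": "rods",
    "metadata": {},
    "acl": {"public": "read"},
}
engine.fire("data_object_put", event)
print(event["metadata"], event["acl"])
```

Both rules run on the same event, which mirrors how one action can trigger several independent policies.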
Workflow Automation in SC2i
In the SC2i project we have rules to perform the following tasks:
Secure Collaboration
iRODS Clients
• Web-based and Standalone GUIs
- iRODS Cloud Browser, MetaLnx, Kanki, Cyberduck
• Portals, External Systems
- iPlant Discovery Environment, Islandora, Fedora Commons
• WebDAV for drag-and-drop access built in to the OS
• APIs: Python, REST, Qt, Java, C, R
• Command Line Interface
iRODS Cloud Browser
Case Studies
User Profile: NASA Atmospheric Science Data Center
• 2 PB of archived satellite data
• Publicly available, subsetting on demand
• In-house ingest and archiving software: ANGe (Archive Next Generation)
Virtual Collections
ls -l
/CER_100100.2012053100
/CER_100100.2012053100.met
/CER_100100.2012053101
/CER_100100.2012053101.met
/CER_100100.2012053102
/CER_100100.2012053102.met
Visibility determined by "visibility attribute"
Logical collection of files spread across physical storage resources.
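A virtual collection can be pictured as a filtered view over a catalog that maps logical names to physical storage. The sketch below is a toy model under stated assumptions: the resource names (disk_a, tape_b) and the visibility values ("public", "hidden") are invented for illustration, and only the file names come from the slide.

```python
# Toy catalog: each entry maps a logical name to the resource that stores it.
# Resource names and visibility values are hypothetical.
catalog = [
    {"logical_path": "/CER_100100.2012053100", "resource": "disk_a", "visibility": "public"},
    {"logical_path": "/CER_100100.2012053100.met", "resource": "tape_b", "visibility": "public"},
    {"logical_path": "/CER_100100.2012053101", "resource": "tape_b", "visibility": "hidden"},
    {"logical_path": "/CER_100100.2012053101.met", "resource": "disk_a", "visibility": "hidden"},
]


def list_collection(entries, allowed_visibility):
    """Return the logical paths a client may see, wherever they are stored."""
    return sorted(
        entry["logical_path"]
        for entry in entries
        if entry["visibility"] in allowed_visibility
    )


public_view = list_collection(catalog, {"public"})
full_view = list_collection(catalog, {"public", "hidden"})
print(public_view)
```

The client sees one flat listing in either view, even though the files sit on different resources.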
Single Interface to Multiple Clients
WebDAV, FUSE, Web UI, Cyberduck
REST, Python, R, Java, C++
(And more!)
Federation
User Profile: National Institute of Environmental Health Sciences
• Viral Vector Core creates designer viruses:
request⟶transfection and amplification⟶sample delivery⟶reports
• Uses iRODS to combine, organize, and analyze sets of requests and instrument results
• Produces packaged results in response to researcher requests
• Quarterly cost reports for chargeback and trend analysis for quality control
User Profile: University College London
• Repository for research data that spans social science, physics, and genomics
• UK-sponsored research retention requirement: keep data for 10 years after the date of the last access request
• iRODS spans storage technologies and enables federated access from other centres
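The retention requirement above (last access request plus 10 years) reduces to a small date calculation. This is a minimal sketch, not UCL's actual policy code. The function names are hypothetical, and mapping 29 February to 28 February in a non-leap target year is an assumption.

```python
from datetime import date

RETENTION_YEARS = 10  # from the slide: last access request plus 10 years


def retention_deadline(last_access_request: date, years: int = RETENTION_YEARS) -> date:
    """Return the date until which the data must be kept."""
    target_year = last_access_request.year + years
    try:
        return last_access_request.replace(year=target_year)
    except ValueError:
        # 29 February with a non-leap target year: assume 28 February.
        return last_access_request.replace(year=target_year, day=28)


def retention_expired(last_access_request: date, today: date) -> bool:
    """True once the retention period has fully elapsed."""
    return today > retention_deadline(last_access_request)


deadline = retention_deadline(date(2016, 6, 15))
print(deadline.isoformat())
```

Each new access request moves the deadline forward, so the calculation has to be re-run whenever the last access date changes.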
User Profile: Wellcome Trust Sanger Institute
• Key genomics research centre
• 7 PB of storage managed by iRODS
Rich Metadata
attribute: library
attribute: total_reads
attribute: type
attribute: lane
attribute: is_paired_read
attribute: study_accession_number
attribute: library_id
attribute: sample_accession_number
attribute: sample_public_name
attribute: manual_qc
attribute: tag
attribute: sample_common_name
attribute: md5
Replication and Federation
User Profile: CyVerse
User Profile: Antelope RTS Sensor Integration
Discussion
Thank You!