iRODS Meet and Greet (and Eat)
RENCI
January 20, 2016
First of All...
Why are we here?
Why should I care?
• Curiosity
• Collegiality
• Collaboration
Agenda
• iRODS Consortium Overview
• Meet the Team
• Where We're Going
• Discussion
Overview
Policy-Based Data Management
The Integrated Rule-Oriented Data System:
• Developed for working with massive collections of files
• Finding, securing, organizing, analyzing, preserving, and sharing data
• Example applications:
- Virtual collections: alternate presentations of stored data sets
- Federated access to data stored in remote systems
- Rich metadata for discoverability, access control, and integrity checking
- Programmable distribution of data to file systems and object stores
- Combining data from multiple processes
Data Virtualization
Data Discovery
Workflow Automation
Secure Collaboration
iRODS Clients
• Web-based and Standalone GUIs
- iRODS Cloud Browser, MetaLnx, Kanki, Cyberduck
• Portals, External Systems
- iPlant Discovery Environment, Islandora, Fedora Commons
• WebDAV for drag-and-drop access built in to the OS
• APIs: Python, REST, Qt, Java, C++
• Command Line Interface
iRODS is free, open source software owned by a foundation called the iRODS Consortium.
Goal is to sustain iRODS as free open source software by:
▹ Building good software.
▹ Growing the iRODS community.
▹ Demonstrating value.
Funds a team of 10+ developers, application engineers, and documentation and support staff
The iRODS Consortium and Sustainability
Contract Customers
and more ...
Initial Trial
Proof of Concept to Pilot
Production
Building Community and Demonstrating Value
https://irods.org/documentation/
Getting Plugged In with iRODS...
Meet the Team
Where We're Going
Other Goals
• Membership: retention, growth, verticals
• User Group Meeting 2016
• User-generated reference designs
• iRODS Partners
• iRODS Hub
• Certification
Discussion
Thank You!
Use Cases
User Profile: NASA Atmospheric Science Data Center
• 2 PB of archived satellite data
• Publicly available, subsetting on demand
• In-house ingest and archiving software: ANGe (Archive Next Generation)
Federation
Virtual Collections
ls -l
/CER_100100.2012053100
/CER_100100.2012053100.met
/CER_100100.2012053101
/CER_100100.2012053101.met
/CER_100100.2012053102
/CER_100100.2012053102.met
Visibility determined by "visibility attribute"
Logical collection of files spread across physical storage resources.
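The virtual collection idea above can be sketched in a few lines of plain Python. This is only an illustration, not iRODS code: the `DataObject` class, the `visibility` attribute name, its `public`/`hidden` values, and the resource names are all assumptions made up for the example. It shows one logical listing assembled from files that live on different physical storage resources, filtered by a metadata attribute.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """Hypothetical stand-in for a catalogued file (not an iRODS class)."""
    logical_path: str   # the path a client sees
    resource: str       # assumed name of the physical storage resource
    metadata: dict = field(default_factory=dict)


def list_virtual_collection(objects, attribute="visibility", value="public"):
    """Return the sorted logical paths whose metadata attribute equals value."""
    return sorted(
        o.logical_path for o in objects if o.metadata.get(attribute) == value
    )


# Files on three different (made-up) resources appear as one collection.
catalog = [
    DataObject("/CER_100100.2012053100", "disk_a", {"visibility": "public"}),
    DataObject("/CER_100100.2012053100.met", "tape_b", {"visibility": "public"}),
    DataObject("/CER_100100.2012053101", "disk_a", {"visibility": "hidden"}),
    DataObject("/CER_100100.2012053102", "object_store_c", {"visibility": "public"}),
]

listing = list_virtual_collection(catalog)
for path in listing:
    print(path)
```

The client never sees the resource names; only the attribute decides what appears in the listing.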
Single Interface to Multiple Clients
WebDAV, FUSE, Web UI, Cyberduck
REST, Python, R, Java, C++
(And more!)
User Profile: Wellcome Trust Sanger Institute
• Key genomics research centre
• 7 PB of storage managed by iRODS
Rich Metadata
attribute: library
attribute: total_reads
attribute: type
attribute: lane
attribute: is_paired_read
attribute: study_accession_number
attribute: library_id
attribute: sample_accession_number
attribute: sample_public_name
attribute: manual_qc
attribute: tag
attribute: sample_common_name
attribute: md5
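A minimal sketch of how attributes like these might be used, in plain Python rather than through any iRODS API. The attribute values, the payload bytes, and the helper names `matches` and `verify_md5` are assumptions invented for illustration. It shows two uses named earlier in the deck: discovery by attribute and integrity checking against a stored `md5` value.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def matches(attributes: dict, **criteria) -> bool:
    """True if every requested attribute has exactly the requested value."""
    return all(attributes.get(name) == value for name, value in criteria.items())


def verify_md5(data: bytes, attributes: dict) -> bool:
    """True if the stored md5 attribute equals the digest of the data."""
    return attributes.get("md5") == md5_hex(data)


# Placeholder content and attribute values (assumptions, not real Sanger data).
payload = b"example read data"
record = {
    "type": "bam",
    "lane": "1",
    "is_paired_read": "1",
    "manual_qc": "1",
    "md5": md5_hex(payload),
}

print(matches(record, lane="1", manual_qc="1"))   # discovery by attribute
print(verify_md5(payload, record))                # integrity check
```

MD5 is used here only because it is the attribute listed on the slide; it detects accidental corruption and is not a defense against deliberate tampering.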
Replication and Federation
User Profile: University College London
• Repository for research data that spans social science, physics, and genomics
• UK-sponsored research requirements: retain data until 10 years after the last access request
• iRODS spans storage technologies and enables federated access from other centres
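The retention requirement above lends itself to a small worked example. The sketch below assumes the rule means "keep data until 10 years after the most recent access request"; the function names and the 28 February fallback for leap days are choices made for this illustration, not part of any stated UCL or iRODS policy.

```python
from datetime import date

RETENTION_YEARS = 10  # from the slide: last access request plus 10 years


def retention_expiry(last_access_request: date, years: int = RETENTION_YEARS) -> date:
    """Date on which the retention period ends."""
    target_year = last_access_request.year + years
    try:
        return last_access_request.replace(year=target_year)
    except ValueError:
        # 29 February has no counterpart in a non-leap target year;
        # this sketch falls back to 28 February (an assumption).
        return last_access_request.replace(year=target_year, day=28)


def retention_elapsed(last_access_request: date, today: date) -> bool:
    """True once the retention period has fully passed."""
    return today >= retention_expiry(last_access_request)


print(retention_expiry(date(2016, 1, 20)))
print(retention_elapsed(date(2016, 1, 20), date(2020, 6, 1)))
```

Each new access request moves the expiry date forward, so a policy built on this would recompute it whenever the last-access attribute changes.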
User Profile: National Institute of Environmental Health Sciences
• Viral Vector Core creates designer viruses:
request⟶transfection and amplification⟶sample delivery⟶reports
• Uses iRODS to combine, organize, and analyze sets of requests and instrument results
• Produces packaged results in response to researcher requests
• Quarterly cost reports for chargeback and trend analysis for quality control