Terrell Russell, Ph.D.
@terrellrussell
Executive Director, iRODS Consortium
iRODS and eHealth
June 5-9, 2023
TNC23
Tirana, Albania
Our Membership
Consortium
Member
Today's Talk
What is iRODS
Open Source
Distributed
Data Centric & Metadata Driven
Philosophical Drivers
100-year view
Plugin Architecture
core is generic: protocol, API, bookkeeping
plugins are specific
policy composition
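The split between a generic core and specific plugins, with policies composed on top, can be sketched in a few lines. This is a minimal illustration and not the iRODS plugin API: the `Core` class, the `post_put` event name, both policy functions, and the example path are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical model: a policy takes an event context and returns
# a list of human-readable actions it would perform.
Policy = Callable[[dict], List[str]]


class Core:
    """Generic core: only registers and dispatches; it knows nothing about policy content."""

    def __init__(self) -> None:
        self._policies: Dict[str, List[Policy]] = {}

    def register(self, event: str, policy: Policy) -> None:
        # Several policies may be attached to the same event (composition).
        self._policies.setdefault(event, []).append(policy)

    def fire(self, event: str, context: dict) -> List[str]:
        # Run every policy registered for the event, in registration order.
        actions: List[str] = []
        for policy in self._policies.get(event, []):
            actions.extend(policy(context))
        return actions


def checksum_policy(context: dict) -> List[str]:
    # Specific plugin: records that a checksum should be computed.
    return [f"checksum {context['path']}"]


def replicate_policy(context: dict) -> List[str]:
    # Specific plugin: records that the object should be replicated.
    return [f"replicate {context['path']} to {context['target']}"]


core = Core()
core.register("post_put", checksum_policy)
core.register("post_put", replicate_policy)
actions = core.fire(
    "post_put", {"path": "/zone/home/alice/scan.dat", "target": "archive"}
)
```

Because the core only dispatches, a policy can be added, removed, or reordered without changing the core itself, which is the property the plugin architecture is after.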
Modern core libraries
standardized interfaces
refactored iRODS internals
ease of (re)use
fewer bugs
Why use iRODS?
People need a solution for:
The larger the organization, the more they need software like iRODS.
iRODS as the Integration Layer
iRODS Core Competencies
iRODS Capabilities
The Data Management Model
Ingest to Institutional Repository
As data matures and reaches a broader community, data management policy must also evolve to meet these additional requirements.
The Wellcome Sanger Institute
Sanger - Replication
Sanger - Metadata
attribute: library
attribute: total_reads
attribute: type
attribute: lane
attribute: is_paired_read
attribute: study_accession_number
attribute: library_id
attribute: sample_accession_number
attribute: sample_public_name
attribute: manual_qc
attribute: tag
attribute: sample_common_name
attribute: md5
attribute: tag_index
attribute: study_title
attribute: study_id
attribute: reference
attribute: sample
attribute: target
attribute: sample_id
attribute: id_run
attribute: study
attribute: alignment
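Attributes like those listed above are attached to data objects as attribute-value-unit (AVU) triples and can then be used to find data. The sketch below models that idea in plain Python without calling any iRODS client library. The attribute names come from the list above, while the paths, the values, and the `find` helper are hypothetical.

```python
from typing import Dict, List, NamedTuple


class AVU(NamedTuple):
    """One attribute-value-unit triple; the unit is often left empty."""
    attribute: str
    value: str
    unit: str = ""


# Hypothetical in-memory stand-in for a metadata catalog: path -> list of AVUs.
catalog: Dict[str, List[AVU]] = {
    "/seq/1234/1234_1.cram": [
        AVU("study_id", "1001"), AVU("lane", "1"), AVU("manual_qc", "1"),
    ],
    "/seq/1234/1234_2.cram": [
        AVU("study_id", "1001"), AVU("lane", "2"), AVU("manual_qc", "0"),
    ],
    "/seq/5678/5678_1.cram": [
        AVU("study_id", "2002"), AVU("lane", "1"), AVU("manual_qc", "1"),
    ],
}


def find(catalog: Dict[str, List[AVU]], **criteria: str) -> List[str]:
    """Return sorted paths whose metadata matches every attribute=value criterion."""
    matches = []
    for path, avus in catalog.items():
        pairs = {(avu.attribute, avu.value) for avu in avus}
        if all((attr, value) in pairs for attr, value in criteria.items()):
            matches.append(path)
    return sorted(matches)


passed_qc = find(catalog, study_id="1001", manual_qc="1")
```

Here `passed_qc` holds only the object from study 1001 that passed manual QC, found by metadata alone rather than by knowing its path.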
Sanger - Federation
Maastricht DataHub
Berlin Institute of Health (BIH)
GA4GH Integration
GA4GH Data Repository Service (DRS) for iRODS
https://github.com/michael-conway/irods-ga4gh-dos
The GA4GH Data Repository Service (DRS) standard is part of a family of standards for distributed, federated data analysis. Using standard workflow languages such as WDL, CWL, and Nextflow, these standards allow workflows to dispatch containerized tasks to run at appropriate locations, including across cloud providers and on-premises compute environments. The DRS standard provides an abstraction over distributed data sources, allowing these workflow tasks to authorize access to, and then read, the underlying data sets.
A DRS implementation over iRODS allows the iRODS data grid to expose data to this federated analysis ecosystem. The Federated Analysis System Project (FASP) components represent a formalization of the iRODS 'compute to data' pattern for the important Genomics and Health community.
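As a small illustration of the abstraction DRS provides, a workflow task typically starts from a hostname-based `drs://` URI and resolves it to the HTTPS endpoint that returns the object's metadata and access methods. The sketch below performs only that mapping and makes no network call. The function name `drs_uri_to_url` and the host `drs.example.org` are hypothetical, and compact-identifier DRS URIs are not handled.

```python
from urllib.parse import quote, urlparse


def drs_uri_to_url(drs_uri: str) -> str:
    """Map a hostname-based DRS URI to its DRS v1 object endpoint.

    drs://<hostname>/<id>  ->  https://<hostname>/ga4gh/drs/v1/objects/<id>
    """
    parsed = urlparse(drs_uri)
    if parsed.scheme != "drs" or not parsed.netloc:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")
    object_id = parsed.path.lstrip("/")
    if not object_id:
        raise ValueError(f"DRS URI has no object id: {drs_uri!r}")
    # Percent-encode the id so it is safe as a single path segment.
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"


# Hypothetical host and object id, for illustration only.
url = drs_uri_to_url("drs://drs.example.org/abc123")
```

A client would then issue a GET request against `url`, with whatever authorization the server requires, to learn where and how the bytes can be fetched.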
Engineering Tradeoffs
Building these systems is always a series of decisions made in an environment with multiple constraints.
A flexible solution is necessary.
Protocol Plumbing
Imaging Working Group
Goal: To provide a standardized suite of imaging policies and practices for integration with existing tools and pipelines.
Big Picture
iRODS is a flexible platform for building eHealth solutions.
Proper data management requires policy enforcement.
These policies will change over time.
Open source is the best practice for a 100-year view.
Join Us Next Week - iRODS UGM2023