iRODS Proof of Concept

Jason Coposky

@jason_coposky

Executive Director, iRODS Consortium

March 15, 2018

What is iRODS

iRODS is

  • Distributed
  • Open source
  • Metadata Driven
  • Data Centric

 

A flexible framework for the abstraction of infrastructure

iRODS as the Integration Layer

Data Virtualization

Combine various distributed storage technologies into a Unified Namespace

  • Existing file systems
  • Cloud storage
  • On premises object storage
  • Archival storage systems

iRODS provides a logical view into the complex physical representation of your data, distributed geographically, and at scale.
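The idea of a unified namespace can be sketched as a mapping from one logical path to one or more physical replicas. This is a toy in-memory model, not the iRODS catalog: the `Replica` and `Namespace` classes, the zone name, the resource names, and the physical paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    resource: str       # hypothetical storage resource name
    physical_path: str  # where the bytes actually live


@dataclass
class Namespace:
    # logical path -> list of physical replicas
    entries: dict = field(default_factory=dict)

    def register(self, logical_path, replica):
        """Record one more physical location for a logical path."""
        self.entries.setdefault(logical_path, []).append(replica)

    def resolve(self, logical_path):
        """Return every known physical replica for a logical path."""
        return list(self.entries.get(logical_path, []))


ns = Namespace()
logical = "/exampleZone/home/alice/reads.cram"
ns.register(logical, Replica("nfs_vol1", "/mnt/nfs1/reads.cram"))
ns.register(logical, Replica("s3_bucket1", "/bucket1/reads.cram"))

for replica in ns.resolve(logical):
    print(logical, "->", replica.resource, replica.physical_path)
```

The user only ever sees the logical path; which storage technology holds each replica is a detail behind the mapping.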

Data Discovery

Attach metadata to any first class entity within the iRODS Zone

  • Data Objects
  • Collections
  • Users
  • Storage Resources
  • The Namespace

iRODS provides automated and user-provided metadata which makes your data and infrastructure more discoverable, operational and valuable.
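iRODS metadata takes the form of attribute-value-unit (AVU) triples. A minimal sketch of attaching such triples to entities and then discovering entities by metadata might look like the following. The `Catalog` class is a hypothetical in-memory stand-in for illustration, not the iRODS catalog or any client API, and the paths and attribute values are made up.

```python
from collections import defaultdict


class Catalog:
    """Toy metadata store: entity path -> set of (attribute, value, unit)."""

    def __init__(self):
        self._avus = defaultdict(set)

    def add_avu(self, entity, attribute, value, unit=""):
        self._avus[entity].add((attribute, value, unit))

    def find(self, attribute, value):
        """Return the entities carrying the given attribute/value pair."""
        return sorted(
            entity
            for entity, avus in self._avus.items()
            if any(a == attribute and v == value for a, v, _ in avus)
        )


catalog = Catalog()
# Metadata can sit on data objects and on collections alike.
catalog.add_avu("/exampleZone/home/alice/a.cram", "type", "cram")
catalog.add_avu("/exampleZone/home/alice/b.bam", "type", "bam")
catalog.add_avu("/exampleZone/home/alice", "project", "demo")

print(catalog.find("type", "cram"))
```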

Workflow Automation

Plugin framework supporting many languages, triggered by any operation within the system

  • Authentication
  • Storage Access
  • Database Interaction
  • Network Activity
  • Extensible RPC API 

iRODS rule engines provide the ability to capture real world policy as computer actionable rules which may allow, deny, or add context to operations within the system.
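The "allow, deny, or add context" behavior can be illustrated with a small event-driven sketch. This is not the iRODS rule engine plugin interface: `RuleEngine`, `PolicyDenied`, the operation name `storage_write`, and the 10 GiB limit are all hypothetical choices made for the example.

```python
class PolicyDenied(Exception):
    """Raised by a rule to deny the operation."""


class RuleEngine:
    def __init__(self):
        self._rules = {}  # operation name -> list of rule callables

    def on(self, operation):
        """Decorator that registers a rule for an operation."""
        def decorator(fn):
            self._rules.setdefault(operation, []).append(fn)
            return fn
        return decorator

    def fire(self, operation, context):
        """Run every rule for the operation; rules may raise or mutate context."""
        for rule in self._rules.get(operation, []):
            rule(context)
        return context


engine = RuleEngine()


@engine.on("storage_write")
def deny_large_files(ctx):
    # Deny: refuse writes above a (hypothetical) 10 GiB limit.
    if ctx["size"] > 10 * 1024**3:
        raise PolicyDenied("file exceeds 10 GiB limit")


@engine.on("storage_write")
def tag_ingest_source(ctx):
    # Add context: record who performed the write.
    ctx.setdefault("metadata", {})["ingested_by"] = ctx["user"]


result = engine.fire("storage_write", {"user": "alice", "size": 100})
print(result["metadata"])
```

An operation with no registered rules passes through unchanged, which corresponds to the "allow" case.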

Secure Collaboration

iRODS allows for collaboration across administrative boundaries after deployment

  • No need for common infrastructure
  • No need for shared funding
  • Affords temporary collaborations

iRODS provides the ability to federate namespaces across organizations without pre-coordinated funding or effort.

Institutional repositories

As data matures and reaches a broader community, data management policy must also evolve to meet these additional requirements.

Example

Use Case

iRODS

The Wellcome Trust Sanger Institute

Sanger - Replication

  • Data preferentially placed on resource servers in the green data center (fallback to red)
  • Data replicated to the other room.
  • Checksums applied
  • Green and red centers both used for read access.
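The placement policy above can be sketched as a small function: prefer green, fall back to red, replicate to the other room, and record a checksum. This is an illustrative model only, not Sanger's actual rule base. The function name, the site labels as strings, and the choice of MD5 via `hashlib` are assumptions.

```python
import hashlib


def place_and_replicate(data, available, preferred="green", fallback="red"):
    """Return replica records for `data` given the set of reachable sites."""
    if preferred in available:
        primary = preferred
    elif fallback in available:
        primary = fallback
    else:
        raise RuntimeError("no data center available")

    other = fallback if primary == preferred else preferred
    checksum = hashlib.md5(data).hexdigest()  # checksum applied at ingest

    replicas = [{"site": primary, "checksum": checksum}]
    if other in available:
        # Replicate to the other room when it is reachable.
        replicas.append({"site": other, "checksum": checksum})
    return replicas


print(place_and_replicate(b"example bytes", {"green", "red"}))
print(place_and_replicate(b"example bytes", {"red"}))
```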

Sanger - Metadata

attribute: library

attribute: total_reads

attribute: type

attribute: lane

attribute: is_paired_read

attribute: study_accession_number

attribute: library_id

attribute: sample_accession_number

attribute: sample_public_name

attribute: manual_qc

attribute: tag

attribute: sample_common_name

attribute: md5

attribute: tag_index

attribute: study_title

attribute: study_id

attribute: reference

attribute: sample

attribute: target

attribute: sample_id

attribute: id_run

attribute: study

attribute: alignment

  • Example metadata attributes
  • Users query and access data from local compute clusters
  • Users access iRODS locally via the command line interface

Sanger - Federation

Current Deployment

Current

Proof of Concept

On Premises to Any Cloud Infrastructure

Current Infrastructure

Single 4-core VM hosting:

  • PostgreSQL Database
  • iRODS Catalog Provider
  • File system Scanner

iRODS is presenting multiple NFS volumes and S3 buckets

File system Scanning

Cloud Synchronization

  • Data in S3 buckets was registered in place
  • Data in NFS is scanned and registered in place
  • Data in NFS is considered the authoritative replica; the S3 replica is marked stale
  • Capture file system metadata
  • Capture file type metadata
  • If size or checksum mismatches
    • Log the out-of-date replica
    • Future work: automatically synchronize to S3
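The mismatch check in the list above can be sketched as follows. This is a minimal illustration, not the scanner or rule base used in the proof of concept: the function names, the shape of the `s3_record` dictionary, the status strings, and the use of MD5 are assumptions.

```python
import hashlib
import logging
import os

log = logging.getLogger("sync_check")


def file_checksum(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_replica(nfs_path, s3_record):
    """Compare the authoritative NFS file with the recorded S3 replica.

    `s3_record` is a hypothetical dict with "size" and "checksum" keys.
    Returns "good" or "stale".
    """
    size = os.path.getsize(nfs_path)
    # Compare size first; the checksum is only computed when sizes agree.
    if size != s3_record["size"] or file_checksum(nfs_path) != s3_record["checksum"]:
        log.warning("out-of-date S3 replica for %s", nfs_path)
        return "stale"
    return "good"
```

Checking size before checksum keeps the common mismatch case cheap, since a size difference is detected without reading the file.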

iRODS rule base - registration policy

Ingest Pipeline

iRODS Clients

Example

Interfaces

 

Cloud Browser - Home Collection

Cloud Browser - Search

Cloud Browser - Results

Cloud Browser - Metadata

Command Line

Unix-like utilities which interact with the server

  • iput - ingest data
  • iget - extract data
  • ils - list logical collections
  • iquest - query catalog with SQL-like language
  • imeta - set and query metadata
  • Many more specialized commands...
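As a rough illustration of how these utilities fit together, the sketch below builds argument lists for a few of them without running anything. The helper function names and example paths are hypothetical, and the query builder does no escaping of quotes in its inputs.

```python
def iput(local_path, logical_path):
    """Ingest a local file into the given logical path."""
    return ["iput", local_path, logical_path]


def iget(logical_path, local_path):
    """Extract a data object to a local file."""
    return ["iget", logical_path, local_path]


def imeta_add_data_object(logical_path, attribute, value):
    """Attach an attribute/value pair to a data object (-d)."""
    return ["imeta", "add", "-d", logical_path, attribute, value]


def iquest_by_metadata(attribute, value):
    """Find data objects carrying the given attribute/value pair."""
    query = (
        "SELECT COLL_NAME, DATA_NAME "
        f"WHERE META_DATA_ATTR_NAME = '{attribute}' "
        f"AND META_DATA_ATTR_VALUE = '{value}'"
    )
    return ["iquest", query]


for argv in (
    iput("reads.cram", "/exampleZone/home/alice/reads.cram"),
    imeta_add_data_object("/exampleZone/home/alice/reads.cram", "study", "demo"),
    iquest_by_metadata("study", "demo"),
):
    print(argv)
```

On a host with the iRODS command line clients installed and configured, each list could be passed to `subprocess.run(argv, check=True)`.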
