Virtualizing Storage with Ceph and iRODS

Jason M. Coposky


Executive Director, iRODS Consortium

July 2, 2019

Ceph Day

Utrecht, NL

What is iRODS

Distributed - runs on a laptop, a cluster, on premises or geographically distributed

Open Source - BSD-3 Licensed, install it today and try before you buy

Metadata Driven & Data Centric - Insulate both your users and your data from your infrastructure

iRODS as the Integration Layer

The Data Management Stack

Core Competencies




Starting at the bottom :: Core Competencies

The underlying iRODS technology categorized into four areas

Data Virtualization

Combine various distributed storage technologies into a Unified Namespace

  • Existing file systems
  • Cloud storage
  • On premises object storage
  • Archival storage systems

iRODS provides a logical view into the complex physical representation of your data, distributed geographically, and at scale.

Projection of the Physical into the Logical

Logical Path

Physical Path(s)
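As a rough illustration of projecting the physical into the logical, the sketch below keeps one logical path in a toy catalog and resolves it to several physical replicas. The `Catalog` and `Replica` classes, the zone name, the resource names, and the paths are all hypothetical; this is not the iRODS catalog or its API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    resource: str       # hypothetical storage resource name
    physical_path: str  # where the bytes live on that resource


@dataclass
class Catalog:
    """Toy catalog: one logical path maps to one or more physical replicas."""
    entries: Dict[str, List[Replica]] = field(default_factory=dict)

    def register(self, logical_path: str, replica: Replica) -> None:
        self.entries.setdefault(logical_path, []).append(replica)

    def resolve(self, logical_path: str) -> List[Replica]:
        # Return a copy so callers cannot mutate the catalog by accident.
        return list(self.entries.get(logical_path, []))


catalog = Catalog()
logical = "/exampleZone/home/alice/sample.dat"  # assumed logical path
catalog.register(logical, Replica("posix_resc", "/mnt/vault/home/alice/sample.dat"))
catalog.register(logical, Replica("ceph_resc", "bucket-a/home/alice/sample.dat"))

for replica in catalog.resolve(logical):
    print(replica.resource, replica.physical_path)
```

Users only ever see the logical path; which resource serves the bytes is a detail the catalog hides.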

Data Discovery

Attach metadata to any first class entity within the iRODS Zone

  • Data Objects
  • Collections
  • Users
  • Storage Resources
  • The Namespace

iRODS provides automated and user-provided metadata which makes your data and infrastructure more discoverable, operational and valuable.
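A minimal sketch of metadata-driven discovery, assuming metadata is stored as attribute-value-unit triples attached to any kind of entity. The `MetadataStore` class, the entity keys, and the attribute names are illustrative only and are not the iRODS client API.

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class AVU(NamedTuple):
    attribute: str
    value: str
    unit: str = ""


Entity = Tuple[str, str]  # (entity kind, name), e.g. ("data_object", path)


class MetadataStore:
    """Toy store that attaches AVUs to data objects, collections, resources, etc."""

    def __init__(self) -> None:
        self._by_entity = defaultdict(set)

    def attach(self, entity: Entity, avu: AVU) -> None:
        self._by_entity[entity].add(avu)

    def find(self, attribute: str, value: str):
        # Discovery: every entity carrying a matching attribute/value pair.
        return sorted(
            entity
            for entity, avus in self._by_entity.items()
            if any(a.attribute == attribute and a.value == value for a in avus)
        )


store = MetadataStore()
store.attach(("data_object", "/exampleZone/home/alice/sample.dat"), AVU("project", "genomics"))
store.attach(("collection", "/exampleZone/home/alice"), AVU("project", "genomics"))
store.attach(("resource", "ceph_resc"), AVU("tier", "fast"))

print(store.find("project", "genomics"))
```

The same query mechanism works for data and for infrastructure, which is the point of attaching metadata to every first class entity.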

Metadata Everywhere

Workflow Automation

Integrated scripting language which is triggered by any operation within the framework

  • Authentication
  • Storage Access
  • Database Interaction
  • Network Activity
  • Extensible RPC API 

The iRODS rule engine provides the ability to capture real world policy as computer actionable rules which may allow, deny, or add context to operations within the system.
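The idea of rules that allow, deny, or add context to an operation can be sketched as a small hook dispatcher. This is a toy model in Python, not the iRODS rule language; `RuleEngine`, `PolicyDenied`, the operation name `storage_access`, and the example paths are assumptions made for illustration.

```python
from typing import Callable, Dict, List


class PolicyDenied(Exception):
    """Raised by a rule to reject the operation."""


Rule = Callable[[dict], None]


class RuleEngine:
    def __init__(self) -> None:
        self._rules: Dict[str, List[Rule]] = {}

    def on(self, operation: str):
        # Decorator that registers a rule for a named operation.
        def decorator(rule: Rule) -> Rule:
            self._rules.setdefault(operation, []).append(rule)
            return rule
        return decorator

    def fire(self, operation: str, context: dict) -> dict:
        # Rules run in registration order; any rule may raise to deny.
        for rule in self._rules.get(operation, []):
            rule(context)
        return context


engine = RuleEngine()
audit_log = []


@engine.on("storage_access")
def restrict_to_home(ctx: dict) -> None:
    home = f"/exampleZone/home/{ctx['user']}/"
    if not ctx["path"].startswith(home):
        raise PolicyDenied(f"{ctx['user']} may not access {ctx['path']}")


@engine.on("storage_access")
def log_for_audit(ctx: dict) -> None:
    audit_log.append((ctx["user"], ctx["path"]))


@engine.on("storage_access")
def add_context(ctx: dict) -> None:
    ctx.setdefault("project", "unassigned")


result = engine.fire(
    "storage_access",
    {"user": "alice", "path": "/exampleZone/home/alice/sample.dat"},
)
print(result, audit_log)
```

Because the denying rule runs first, a rejected request never reaches the audit or context rules; a real deployment would choose that ordering deliberately.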

Dynamic Policy Enforcement

The iRODS rule may:

  • restrict access
  • log for audit and reporting
  • provide additional context
  • send a notification

Dynamic Policy Enforcement

A single API call expands to many plugin operations, all of which may invoke policy enforcement

Plugin Interfaces:

  • Authentication
  • Database
  • Storage
  • Network
  • Rule Engine
  • Microservice
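The fan-out from one API call to many plugin operations, each wrapped by policy hooks, might look like the sketch below. The operation list, its order, and the hook naming are hypothetical and do not reflect how iRODS actually sequences or names its plugin operations.

```python
from typing import Callable, List, Tuple

# Hypothetical expansion of a single "put" API call into plugin operations.
PUT_OPERATIONS: List[Tuple[str, str]] = [
    ("authentication", "check_credentials"),
    ("database", "register_object"),
    ("storage", "write"),
    ("network", "send_reply"),
]


def run_api_call(
    operations: List[Tuple[str, str]],
    policy_hook: Callable[[str, str], None],
) -> List[str]:
    """Run each plugin operation, invoking policy before and after it."""
    trace = []
    for plugin, operation in operations:
        name = f"{plugin}.{operation}"
        policy_hook("pre", name)   # policy may inspect or reject here
        trace.append(name)         # stand-in for the real plugin work
        policy_hook("post", name)  # policy may log or react here
    return trace


fired: List[str] = []
trace = run_api_call(PUT_OPERATIONS, lambda stage, name: fired.append(f"{stage}:{name}"))
print(len(trace), "plugin operations,", len(fired), "policy hooks")
```

Every plugin operation contributes two enforcement points, so a single client request can trigger policy many times.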

Secure Collaboration

iRODS allows for collaboration across administrative boundaries after deployment

  • No need for common infrastructure
  • No need for shared funding
  • Affords temporary collaborations

iRODS provides the ability to federate namespaces across organizations without pre-coordinated funding or effort.

iRODS as a Service Interface

Federation - Shared Data and Services

Possible Policies

  • Data Movement
  • Data Verification
  • Data Retention
  • Data Replication
  • Data Placement
  • Checksum Computation
  • Metadata Extraction
  • Metadata Application
  • Metadata Conformance
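Two of the policies listed above, checksum computation and data verification, can be sketched with the Python standard library. The function names, the choice of SHA-256, and the temporary file are assumptions for illustration; in practice the policy would run inside the data management layer rather than as a standalone script.

```python
import hashlib
import os
import tempfile


def compute_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large objects do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_replica(path: str, registered_checksum: str) -> bool:
    """Data verification: recompute and compare against the registered value."""
    return compute_checksum(path) == registered_checksum


with tempfile.TemporaryDirectory() as tmp:
    replica_path = os.path.join(tmp, "replica.dat")
    with open(replica_path, "wb") as fh:
        fh.write(b"example payload")

    registered = compute_checksum(replica_path)   # computed at ingest
    print(verify_replica(replica_path, registered))
```

Storing the checksum as metadata at ingest is what makes later verification possible without trusting the storage back end.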

iRODS Capabilities

Deployment Patterns

Data to Compute

Compute to Data

Filesystem Synchronization

Proposed Ceph Use Case - Sanger





[Diagram: network links labeled 10G Ethernet and bonded 50 or 100G Ethernet]


Proposed Ceph Use Case - Sanger

  • Ceph provides a single, scalable, performant storage back end
  • Leverage erasure coding
  • iRODS scales elastically in the OpenStack environment
  • Both provide managed data access for HPC applications
  • Ceph is virtualized through an iRODS storage plugin using the C API
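To make the appeal of erasure coding concrete, the arithmetic below compares raw capacity consumed per usable byte for an erasure-coded layout with k data chunks and m coding chunks against plain replication. The values k=4, m=2 and three-way replication are assumed for illustration; the slides do not state which profile is proposed.

```python
def erasure_raw_per_usable(k: int, m: int) -> float:
    """Raw bytes stored per usable byte with k data and m coding chunks."""
    if k <= 0 or m < 0:
        raise ValueError("k must be positive and m non-negative")
    return (k + m) / k


def replication_raw_per_usable(copies: int) -> float:
    """Raw bytes stored per usable byte when every object is copied in full."""
    if copies <= 0:
        raise ValueError("copies must be positive")
    return float(copies)


k, m = 4, 2   # assumed profile: survives the loss of any 2 chunks
copies = 3    # assumed replication factor: survives the loss of any 2 copies

print("erasure coding:", erasure_raw_per_usable(k, m))
print("replication:   ", replication_raw_per_usable(copies))
```

Under these assumptions both layouts tolerate two failures, but the erasure-coded one needs half the raw capacity, at the cost of more computation on write and recovery.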

Ceph Use Case - Maastricht




Managed replication for geographically distributed Ceph storage

Presented via S3 interface

Ceph Use Case - Maastricht




Archival Storage


On Premises


Leveraging Storage Tiering across virtualized storage resources
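A storage tiering policy of the kind described here can be sketched as a rule that moves idle objects from the on-premises tier to the archival tier. The `DataObject` class, the tier names, and the 30-day idle threshold are hypothetical; the sketch only changes a label, whereas a real policy would trigger replication to the archive followed by trimming of the on-premises replica.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class DataObject:
    logical_path: str
    tier: str              # "on_premises" or "archive" (assumed tier names)
    last_access: datetime


def apply_tiering(
    objects: List[DataObject],
    now: datetime,
    max_idle: timedelta = timedelta(days=30),  # assumed threshold
) -> List[str]:
    """Move objects idle longer than max_idle to the archive tier."""
    moved = []
    for obj in objects:
        if obj.tier == "on_premises" and now - obj.last_access > max_idle:
            obj.tier = "archive"
            moved.append(obj.logical_path)
    return moved


now = datetime(2019, 7, 2)
objects = [
    DataObject("/exampleZone/home/alice/old.dat", "on_premises", now - timedelta(days=90)),
    DataObject("/exampleZone/home/alice/new.dat", "on_premises", now - timedelta(days=2)),
]
print(apply_tiering(objects, now))
```

The logical path never changes when an object is tiered, so users and applications are unaffected by where the bytes currently live.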

Ceph Use Case - Maastricht

  • Open Hardware
  • Ability to start small and grow
  • Flexible license model
  • S3 object presentation
  • Manage replicas at distance across clusters
  • Policy driven automated data movement

Our Business Model

Consortium Membership

  • Participate in roadmap development
  • Participate in consortium governance
  • Direct support from the team
  • Tier 3 support agreements
  • Discount for support agreements

