Virtualizing Storage with Ceph and iRODS
Jason M. Coposky
@jason_coposky
Executive Director, iRODS Consortium
July 2, 2019
Ceph Day
Utrecht, NL
What is iRODS
Distributed - runs on a laptop, a cluster, on premises or geographically distributed
Open Source - BSD-3 Licensed, install it today and try before you buy
Metadata Driven & Data Centric - Insulate both your users and your data from your infrastructure
iRODS as the Integration Layer
The Data Management Stack
Core Competencies
Policy
Capabilities
Patterns
Starting at the bottom :: Core Competencies
The underlying iRODS technology categorized into four areas
Data Virtualization
Combine various distributed storage technologies into a Unified Namespace
iRODS provides a logical view into the complex physical representation of your data, distributed geographically, and at scale.
Projection of the Physical into the Logical
Logical Path
Physical Path(s)
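The projection of one logical path onto one or more physical paths can be pictured with a small sketch. This is a toy in-memory model, not the iRODS catalog or its API; the zone name, resource names, and paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    resource: str        # name of the storage resource holding this copy
    physical_path: str   # where the bytes live on that resource


@dataclass
class Catalog:
    """Toy stand-in for a catalog that maps logical paths to replicas."""
    entries: dict = field(default_factory=dict)

    def register(self, logical_path, resource, physical_path):
        # One logical path may accumulate several physical replicas.
        self.entries.setdefault(logical_path, []).append(
            Replica(resource, physical_path))

    def resolve(self, logical_path):
        # Return every known physical location for the logical path.
        return list(self.entries.get(logical_path, []))


catalog = Catalog()
logical = "/exampleZone/home/alice/run42.bam"  # hypothetical logical path
catalog.register(logical, "cephS3Resc", "s3://example-bucket/alice/run42.bam")
catalog.register(logical, "nasResc", "/mnt/nas/vault/alice/run42.bam")

replicas = catalog.resolve(logical)
for replica in replicas:
    print(replica.resource, "->", replica.physical_path)
```

Users address only the logical path; which resource serves the bytes is a detail the namespace hides.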
Data Discovery
Attach metadata to any first class entity within the iRODS Zone
iRODS provides automated and user-provided metadata which makes your data and infrastructure more discoverable, operational and valuable.
Metadata Everywhere
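As a minimal sketch of metadata-driven discovery, the block below attaches attribute, value, unit triples to arbitrary entities and then searches by attribute and value. The store, entity names, and attributes are assumptions made for illustration, not the iRODS catalog schema or client API.

```python
from collections import defaultdict


class MetadataStore:
    """Toy store of (attribute, value, unit) triples keyed by entity name."""

    def __init__(self):
        self._triples = defaultdict(set)

    def add(self, entity, attribute, value, unit=""):
        # Any entity (data object, collection, resource, user) can be annotated.
        self._triples[entity].add((attribute, value, unit))

    def find(self, attribute, value):
        # Return the entities carrying a matching attribute/value pair.
        return sorted(
            entity
            for entity, triples in self._triples.items()
            if any(a == attribute and v == value for a, v, _ in triples)
        )


store = MetadataStore()
store.add("/exampleZone/home/alice/run42.bam", "project", "genomics")
store.add("/exampleZone/home/alice/run42.bam", "read_length", "150", "bp")
store.add("/exampleZone/home/bob", "project", "genomics")
store.add("cephS3Resc", "site", "utrecht")

matches = store.find("project", "genomics")
print(matches)
```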
Workflow Automation
Integrated scripting language which is triggered by any operation within the framework
The iRODS rule engine provides the ability to capture real world policy as computer actionable rules which may allow, deny, or add context to operations within the system.
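The idea of rules that allow, deny, or add context to an operation can be sketched in a few lines. This is plain Python, not the iRODS rule language, and the operation name, rule functions, and context fields are hypothetical.

```python
class PolicyDenied(Exception):
    """Raised by a rule to deny the operation it was triggered by."""


def run_operation(operation, context, rules):
    # Every rule registered for the operation runs before it is allowed.
    for rule in rules.get(operation, []):
        rule(context)
    return context


def deny_temporary_files(context):
    # Deny: refuse uploads that match an organizational restriction.
    if context["logical_path"].endswith(".tmp"):
        raise PolicyDenied("temporary files may not be stored")


def record_ingest_user(context):
    # Add context: annotate the operation with who performed it.
    context.setdefault("metadata", {})["ingested_by"] = context["user"]


rules = {"put": [deny_temporary_files, record_ingest_user]}

allowed = run_operation(
    "put", {"user": "alice", "logical_path": "/exampleZone/home/alice/a.bam"}, rules)
print(allowed["metadata"])

try:
    run_operation(
        "put", {"user": "alice", "logical_path": "/exampleZone/home/alice/a.tmp"}, rules)
    denied = False
except PolicyDenied:
    denied = True
print("denied:", denied)
```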
Dynamic Policy Enforcement
The iRODS rule may:
Dynamic Policy Enforcement
A single API call expands to many plugin operations all of which may invoke policy enforcement
Plugin Interfaces:
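As a rough illustration of how a single API call can fan out into several plugin operations, each bracketed by policy enforcement points, consider the sketch below. The operation names and the pre/post naming are assumptions chosen for the example; they are not a list of actual iRODS plugin operations.

```python
def with_enforcement_points(name, operation, policies, trace):
    """Wrap an operation so policy may run before and after it."""
    def wrapped(*args):
        trace.append(f"{name}_pre")
        policies.get(f"{name}_pre", lambda *a: None)(*args)
        result = operation(*args)
        trace.append(f"{name}_post")
        policies.get(f"{name}_post", lambda *a: None)(*args)
        return result
    return wrapped


trace = []      # records the order in which enforcement points fire
policies = {}   # hypothetical: maps enforcement point name -> callable
storage = {}
registered = []

# Hypothetical plugin operations that one upload depends on.
resource_create = with_enforcement_points(
    "resource_create", lambda path: storage.setdefault(path, b""), policies, trace)
resource_write = with_enforcement_points(
    "resource_write", lambda path, data: storage.update({path: data}), policies, trace)
database_register = with_enforcement_points(
    "database_register", lambda path: registered.append(path), policies, trace)


def _put(path, data):
    resource_create(path)
    resource_write(path, data)
    database_register(path)


api_put = with_enforcement_points("api_put", _put, policies, trace)

api_put("/exampleZone/home/alice/a.bam", b"payload")
print("\n".join(trace))
```

One call to `api_put` fires eight enforcement points here, and a policy attached to any of them could observe or veto that step.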
Secure Collaboration
iRODS allows for collaboration across administrative boundaries after deployment
iRODS provides the ability to federate namespaces across organizations without pre-coordinated funding or effort.
iRODS as a Service Interface
Federation - Shared Data and Services
Possible Policies
iRODS Capabilities
Deployment Patterns
Data to Compute
Compute to Data
Filesystem Synchronization
Proposed Ceph Use Case - Sanger
Diagram labels: HPC Cluster; 10G Ethernet; Bonded 50 or 100G Ethernet
Ceph Use Case - Maastricht
Diagram label: Unified Namespace
Managed replication for geographically distributed Ceph storage
Presented via S3 interface
Diagram labels: Unified Namespace; Remote Archival Storage at SURF; On-Premises NAS
Leveraging Storage Tiering across virtualized storage resources
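Storage tiering of this kind can be sketched as an age-based migration between resources behind one namespace. The tier names, the 30-day threshold, and the data model below are assumptions for illustration only, not the configuration used at Maastricht or the iRODS tiering implementation.

```python
from dataclasses import dataclass


@dataclass
class DataObject:
    logical_path: str   # unchanged by tiering; users keep the same path
    tier: str           # which storage tier currently holds the bytes
    last_access: float  # seconds since the epoch


def apply_tiering(objects, now, max_age_seconds, from_tier, to_tier):
    """Move objects not accessed within max_age_seconds to the next tier."""
    moved = []
    for obj in objects:
        if obj.tier == from_tier and now - obj.last_access > max_age_seconds:
            obj.tier = to_tier
            moved.append(obj.logical_path)
    return moved


DAY = 24 * 60 * 60
now = 100 * DAY
objects = [
    DataObject("/exampleZone/projects/p1/old.dat", "ceph", now - 45 * DAY),
    DataObject("/exampleZone/projects/p1/new.dat", "ceph", now - 2 * DAY),
    DataObject("/exampleZone/projects/p1/cold.dat", "archive", now - 90 * DAY),
]

moved = apply_tiering(objects, now, 30 * DAY, from_tier="ceph", to_tier="archive")
print(moved)
```

Only the physical location changes; the logical path stays the same, which is what keeps tiering invisible to users.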
Our Business Model
Consortium Membership
Questions?