Virtualizing Storage with Ceph and iRODS
Jason M. Coposky
@jason_coposky
Executive Director, iRODS Consortium
July 2, 2019
Ceph Day
Utrecht, NL
What is iRODS
Distributed - runs on a laptop, a cluster, on premises or geographically distributed
Open Source - BSD-3 Licensed, install it today and try before you buy
Metadata Driven & Data Centric - Insulate both your users and your data from your infrastructure
iRODS as the Integration Layer
The Data Management Stack
Core Competencies
Policy
Capabilities
Patterns
Starting at the bottom :: Core Competencies
The underlying iRODS technology categorized into four areas
Data Virtualization
Combine various distributed storage technologies into a Unified Namespace
- Existing file systems
- Cloud storage
- On premises object storage
- Archival storage systems
iRODS provides a logical view into the complex physical representation of your data, distributed geographically, and at scale.
Projection of the Physical into the Logical
Logical Path
Physical Path(s)
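The projection of one logical path onto one or more physical paths can be pictured as a catalog lookup. The following is a minimal sketch in plain Python, not the iRODS catalog or API; the `Catalog` and `Replica` classes, the zone name, the resource names, and the paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Replica:
    resource: str       # hypothetical storage resource name
    physical_path: str  # where the bytes live on that resource


@dataclass
class Catalog:
    """Toy catalog: one logical path maps to one or more replicas."""
    entries: dict = field(default_factory=dict)

    def register(self, logical_path: str, resource: str, physical_path: str) -> None:
        self.entries.setdefault(logical_path, []).append(Replica(resource, physical_path))

    def resolve(self, logical_path: str) -> list:
        # Return a copy so callers cannot mutate the catalog by accident.
        return list(self.entries.get(logical_path, []))


catalog = Catalog()
logical = "/exampleZone/home/alice/sample.dat"
catalog.register(logical, "cephResc", "s3://example-bucket/alice/sample.dat")
catalog.register(logical, "nasResc", "/mnt/nas/vault/alice/sample.dat")

for replica in catalog.resolve(logical):
    print(f"{logical} -> [{replica.resource}] {replica.physical_path}")
```

The user only ever names the logical path; which resource serves the bytes is a lookup detail that can change without the name changing.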
Data Discovery
Attach metadata to any first-class entity within the iRODS Zone
- Data Objects
- Collections
- Users
- Storage Resources
- The Namespace
iRODS provides automated and user-provided metadata which makes your data and infrastructure more discoverable, operational and valuable.
Metadata Everywhere
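As a rough illustration of metadata attached to different kinds of entities and then used for discovery, the sketch below models metadata as (attribute, value, unit) triples in plain Python. The `MetadataStore` class, the entity type labels, and all names and values are hypothetical; this is not the iRODS catalog or client API.

```python
from collections import defaultdict


class MetadataStore:
    """Toy store of (attribute, value, unit) triples keyed by entity."""

    def __init__(self):
        self._triples = defaultdict(set)

    def attach(self, entity_type, name, attribute, value, unit=""):
        self._triples[(entity_type, name)].add((attribute, value, unit))

    def find(self, attribute, value):
        # Return every entity, of any type, carrying the given attribute/value pair.
        return sorted(
            entity
            for entity, triples in self._triples.items()
            if any(a == attribute and v == value for a, v, _ in triples)
        )


store = MetadataStore()
store.attach("data_object", "/exampleZone/home/alice/sample.dat", "project", "genome-42")
store.attach("collection", "/exampleZone/home/alice", "project", "genome-42")
store.attach("resource", "cephResc", "tier", "fast")

matches = store.find("project", "genome-42")
print(matches)
```

Because the same triple shape applies to data objects, collections, and resources alike, one query can span all of them.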
Workflow Automation
Integrated scripting language which is triggered by any operation within the framework
- Authentication
- Storage Access
- Database Interaction
- Network Activity
- Extensible RPC API
The iRODS rule engine provides the ability to capture real-world policy as computer-actionable rules which may allow, deny, or add context to operations within the system.
Dynamic Policy Enforcement
The iRODS rule may:
- restrict access
- log for audit and reporting
- provide additional context
- send a notification
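The four outcomes listed on this slide can be sketched as a single rule function. This is a plain Python stand-in, not the iRODS rule language; the `Operation` type, the `protect_restricted` rule, the `curator` user, and the paths are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    user: str
    action: str          # e.g. "read" or "write"
    logical_path: str
    context: dict = field(default_factory=dict)


audit_log = []       # stand-in for an audit trail
notifications = []   # stand-in for an outbound notification channel


def protect_restricted(op: Operation) -> bool:
    """Hypothetical rule: return True to allow the operation, False to deny it."""
    audit_log.append((op.user, op.action, op.logical_path))      # log for audit and reporting
    op.context["checked_by"] = "protect_restricted"               # provide additional context
    restricted = op.logical_path.startswith("/exampleZone/restricted/")
    if restricted and op.user != "curator":
        notifications.append(f"denied {op.action} by {op.user}")  # send a notification
        return False                                              # restrict access
    return True


allowed = protect_restricted(Operation("alice", "read", "/exampleZone/home/alice/sample.dat"))
denied = protect_restricted(Operation("alice", "write", "/exampleZone/restricted/raw.dat"))
print(allowed, denied, notifications)
```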
Dynamic Policy Enforcement
A single API call expands to many plugin operations, all of which may invoke policy enforcement
Plugin Interfaces:
- Authentication
- Database
- Storage
- Network
- Rule Engine
- Microservice
- RPC API
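The expansion of one API call into several plugin operations, each a point where policy can run, might be pictured as follows. This is a toy dispatcher in plain Python; the step list for a "put" call, the operation names, and the `tracing_policy` function are assumptions made for illustration, since the slides do not give the actual sequence.

```python
from typing import Callable, List, Tuple

Policy = Callable[[str, str], None]  # (plugin_interface, operation) -> None


def run_api_call(steps: List[Tuple[str, str]], policy: Policy) -> None:
    """Run each plugin operation behind an API call, consulting policy before each one."""
    for interface, operation in steps:
        policy(interface, operation)  # policy may raise to stop the call
        # A real plugin would do its work here; the toy version does nothing.


# Hypothetical expansion of a single "put" call into plugin operations.
PUT_STEPS = [
    ("Authentication", "check_credentials"),
    ("Database", "register_object"),
    ("Storage", "write_bytes"),
    ("Network", "send_reply"),
]

trace = []


def tracing_policy(interface: str, operation: str) -> None:
    trace.append(f"{interface}.{operation}")
    if interface == "Storage" and operation == "delete_bytes":
        raise PermissionError("deletes are not permitted by policy")


run_api_call(PUT_STEPS, tracing_policy)
print(trace)
```

Here the policy is consulted once per plugin operation, so a single client request yields four enforcement opportunities rather than one.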
Secure Collaboration
iRODS allows for collaboration across administrative boundaries after deployment
- No need for common infrastructure
- No need for shared funding
- Affords temporary collaborations
iRODS provides the ability to federate namespaces across organizations without pre-coordinated funding or effort.
iRODS as a Service Interface
Federation - Shared Data and Services
Possible Policies
- Data Movement
- Data Verification
- Data Retention
- Data Replication
- Data Placement
- Checksum Computation
- Metadata Extraction
- Metadata Application
- Metadata Conformance
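Two of the policies listed above, checksum computation and data verification, lend themselves to a short sketch. The version below uses Python's standard `hashlib`; the choice of SHA-256, the `sha256:` prefix, and the function names are assumptions for illustration, not a statement of how iRODS records checksums.

```python
import hashlib


def compute_checksum(data: bytes) -> str:
    """Checksum computation: derive a digest when the data is ingested."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify(data: bytes, recorded: str) -> bool:
    """Data verification: recompute the digest and compare with the recorded one."""
    return compute_checksum(data) == recorded


original = b"example payload"
recorded = compute_checksum(original)  # computed once and stored alongside the object

print(verify(original, recorded))            # bytes unchanged since ingest
print(verify(b"example payl0ad", recorded))  # one byte altered
```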
iRODS Capabilities
Deployment Patterns
Data to Compute
Compute to Data
Filesystem Synchronization
Proposed Ceph Use Case - Sanger
[Diagram: HPC Cluster; links labeled 10G Ethernet and Bonded 50 or 100G Ethernet]
Proposed Ceph Use Case - Sanger
- Ceph provides a single, scalable, performant storage back end
- Leverage erasure coding
- iRODS scales elastically in the OpenStack environment
- Both provide managed data access for HPC applications
- Ceph is virtualized through an iRODS storage plugin using the C API
Ceph Use Case - Maastricht
[Diagram label: UNIFIED NAMESPACE]
Managed replication for geographically distributed Ceph storage
Presented via S3 interface
Ceph Use Case - Maastricht
[Diagram labels: UNIFIED NAMESPACE; Remote Archival Storage at SURF; On Premises NAS]
Leveraging Storage Tiering across virtualized storage resources
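Storage tiering across virtualized resources can be sketched as an age-based placement decision. The sketch below is plain Python, not the iRODS storage tiering capability itself; the resource names, the tier order, and the 30-day and 180-day thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataObject:
    logical_path: str
    resource: str
    age_days: int  # days since last access


# Hypothetical tiers, fastest first; None means "no upper age limit".
TIERS = [("cephResc", 30), ("nasResc", 180), ("archiveResc", None)]


def target_tier(age_days: int) -> str:
    for resource, max_age in TIERS:
        if max_age is None or age_days <= max_age:
            return resource
    return TIERS[-1][0]


def apply_tiering(objects):
    """Move each object to the tier its age calls for; return the moves made."""
    moves = []
    for obj in objects:
        wanted = target_tier(obj.age_days)
        if wanted != obj.resource:
            moves.append((obj.logical_path, obj.resource, wanted))
            obj.resource = wanted  # the logical path does not change
    return moves


objects = [
    DataObject("/exampleZone/home/alice/a.dat", "cephResc", 5),
    DataObject("/exampleZone/home/alice/b.dat", "cephResc", 90),
    DataObject("/exampleZone/home/alice/c.dat", "nasResc", 400),
]
moves = apply_tiering(objects)
print(moves)
```

Only the `resource` field changes when an object is moved, which mirrors the point of the unified namespace: data migrates between tiers while its logical path stays put.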
Ceph Use Case - Maastricht
- Open Hardware
- Ability to start small and grow
- Flexible license model
- S3 object presentation
- Manage replicas at distance across clusters
- Policy-driven automated data movement
Our Business Model
Consortium Membership
- Participate in roadmap development
- Participate in consortium governance
- Direct support from the team
- Tier 3 support agreements
- Discount for support agreements
Questions?