Executive Overview
and
Demonstration
Jason Coposky
@jason_coposky
Executive Director, iRODS Consortium
October 3, 2019
USDA
Remote Presentation
What is iRODS
Distributed - runs on a laptop, a cluster, on premises or geographically distributed
Open Source - BSD-3 Licensed, install it today and try before you buy
Metadata Driven & Data Centric - Insulate both your users and your data from your infrastructure
iRODS as the Integration Layer
iRODS Core Competencies
The underlying technology categorized into four areas
Data Virtualization
Combine various distributed storage technologies into a Unified Namespace
- Existing file systems
- Cloud storage
- On premises object storage
- Archival storage systems
iRODS provides a logical view into the complex physical representation of your data, distributed geographically, and at scale.
Projection of the Physical into the Logical
Logical Path
Physical Path(s)
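The projection of one logical path onto one or more physical paths can be pictured with a small data model. This is only an illustrative sketch, not the iRODS catalog schema: the class names, zone name, resource names, and paths below are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    resource: str       # hypothetical storage resource name
    physical_path: str  # where the bytes actually live


@dataclass
class DataObject:
    logical_path: str   # the single name users see
    replicas: List[Replica] = field(default_factory=list)


def physical_paths(data_object: DataObject) -> List[str]:
    """Return every physical location behind a single logical path."""
    return [replica.physical_path for replica in data_object.replicas]


# One logical path backed by two copies on different storage technologies.
# All names here are made up for illustration.
obj = DataObject(
    logical_path="/exampleZone/home/alice/sample.dat",
    replicas=[
        Replica("local_disk", "/var/lib/example/vault/home/alice/sample.dat"),
        Replica("cloud_bucket", "s3://example-bucket/home/alice/sample.dat"),
    ],
)
```

Users address only `obj.logical_path`; the list returned by `physical_paths(obj)` can change as storage is added or retired without the logical name changing.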
Data Discovery
Attach metadata to any first class entity within the iRODS Zone
- Data Objects
- Collections
- Users
- Storage Resources
- The Namespace
iRODS provides automated and user-provided metadata which makes your data and infrastructure more discoverable, operational and valuable.
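As a rough illustration of attaching metadata to different kinds of entities, the sketch below models metadata as attribute, value, unit triples keyed by entity type and name. It is a toy in-memory stand-in, not the iRODS API; the `Catalog` class, the entity-type strings, and the sample names are assumptions made for this example, and it covers only the four concrete entity kinds from the list above.

```python
from collections import defaultdict
from typing import List, NamedTuple


class AVU(NamedTuple):
    attribute: str
    value: str
    unit: str = ""


class Catalog:
    """Toy in-memory catalog (hypothetical, for illustration only)."""

    ENTITY_TYPES = {"data_object", "collection", "user", "resource"}

    def __init__(self) -> None:
        self._metadata = defaultdict(set)

    def add_avu(self, entity_type: str, name: str, avu: AVU) -> None:
        """Attach one AVU to an entity, rejecting unknown entity types."""
        if entity_type not in self.ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {entity_type}")
        self._metadata[(entity_type, name)].add(avu)

    def avus(self, entity_type: str, name: str) -> List[AVU]:
        """Return the AVUs attached to an entity, sorted for stable output."""
        return sorted(self._metadata.get((entity_type, name), set()))


catalog = Catalog()
catalog.add_avu("data_object", "/exampleZone/home/alice/sample.dat",
                AVU("instrument", "scope1"))
catalog.add_avu("resource", "archive_tape", AVU("tier", "cold"))
```

The same `add_avu` call works for a file, a collection, a user, or a storage resource, which is the point of treating each as a first class entity.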
Metadata Everywhere
Workflow Automation
Integrated scripting language which is triggered by any operation within the framework
- Authentication
- Storage Access
- Database Interaction
- Network Activity
- Extensible RPC API
The iRODS rule engine provides the ability to capture real world policy as computer actionable rules which may allow, deny, or add context to operations within the system.
Dynamic Policy Enforcement
The iRODS rule may:
- restrict access
- log for audit and reporting
- provide additional context
- send a notification
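The four things a rule may do can be sketched as a single policy function. This is a plain-Python illustration of the idea, not iRODS rule language or the rule engine's actual interface; the `Operation` type, the `embargo_rule` name, and the embargoed path prefix are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Operation:
    user: str
    action: str                       # e.g. "read" or "write"
    logical_path: str
    context: Dict[str, str] = field(default_factory=dict)


audit_log: List[str] = []        # stands in for an audit/reporting sink
notifications: List[str] = []    # stands in for email, chat, etc.

EMBARGO_PREFIX = "/exampleZone/embargoed/"  # hypothetical protected area


def embargo_rule(op: Operation) -> bool:
    """Return True to allow the operation, False to deny it."""
    # Log for audit and reporting.
    audit_log.append(f"{op.user} {op.action} {op.logical_path}")
    # Provide additional context for later steps.
    op.context["evaluated_by"] = "embargo_rule"
    # Restrict access: writes under the embargoed prefix are denied.
    if op.action == "write" and op.logical_path.startswith(EMBARGO_PREFIX):
        # Send a notification about the denial.
        notifications.append(f"denied write by {op.user} on {op.logical_path}")
        return False
    return True
```

A read anywhere, or a write outside the embargoed prefix, is allowed and logged; a write under the prefix is logged, denied, and reported.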
Dynamic Policy Enforcement
A single API call expands to many plugin operations, all of which may invoke policy enforcement
Plugin Interfaces:
- Authentication
- Database
- Storage
- Network
- Rule Engine
- Microservice
- RPC API
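To make the fan-out concrete, the sketch below expands one call into a sequence of plugin operations and gives policy a chance to run before and after each one. The operation list, the hook registry, and the `pep_<interface>_<operation>_<stage>` naming used here are illustrative assumptions for this example, not a listing of the enforcement points a real server fires.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical expansion of one "put" call into plugin operations.
PUT_OPERATIONS: List[Tuple[str, str]] = [
    ("auth", "request"),
    ("network", "read"),
    ("database", "register"),
    ("resource", "write"),
]

hooks: Dict[str, Callable[[str], None]] = {}  # policy registered by name
trace: List[str] = []                         # every enforcement point reached


def _fire(name: str) -> None:
    """Record the enforcement point and run its policy hook, if any."""
    trace.append(name)
    hook = hooks.get(name)
    if hook is not None:
        hook(name)


def run_api_call(operations: List[Tuple[str, str]]) -> None:
    """Walk the plugin operations, firing policy before and after each."""
    for interface, operation in operations:
        _fire(f"pep_{interface}_{operation}_pre")
        # The plugin operation itself would execute here.
        _fire(f"pep_{interface}_{operation}_post")
```

With four operations the single call reaches eight enforcement points, and a hook registered under any one of those names runs only at that point.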
Secure Collaboration
iRODS allows for collaboration across administrative boundaries after deployment
- No need for common infrastructure
- No need for shared funding
- Affords temporary collaborations
iRODS provides the ability to federate namespaces across organizations without pre-coordinated funding or effort.
iRODS as a Service Interface
Federation - Shared Data and Services
Ingest to Institutional repository
As data matures and reaches a broader community, data management policy must also evolve to meet these additional requirements.
iRODS Capabilities
Automated Ingest - Landing Zone
Automated Ingest - Filesystem Scanning
Storage Tiering
Deployment Patterns
Data to Compute
Compute to Data
Filesystem Synchronization
The Data Management Model
Use Cases
iRODS
The Wellcome Sanger Institute
Sanger - Replication
- Data preferentially placed on resource servers in the green data center (fallback to red)
- Data replicated to the other room.
- Checksums applied
- Green and red centers both used for read access.
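The placement, replication, and checksum behaviour described above might be sketched as follows. This is a simplified model under stated assumptions: the room names come from the slide, but the function, its arguments, and the use of MD5 as the checksum are choices made for this example rather than details of the Sanger deployment.

```python
import hashlib
from typing import Dict


def place_and_replicate(data: bytes, available: Dict[str, bool]) -> Dict[str, object]:
    """Prefer the green room, fall back to red, then replicate to the other room.

    `available` maps a room name ("green" or "red") to whether it is reachable.
    """
    if available.get("green"):
        primary, secondary = "green", "red"
    elif available.get("red"):
        primary, secondary = "red", "green"
    else:
        raise RuntimeError("no data center available")

    # Checksum computed once and recorded for every replica.
    checksum = hashlib.md5(data).hexdigest()
    replicas = {primary: checksum}
    if available.get(secondary):
        replicas[secondary] = checksum

    # Either room holding a replica may serve reads.
    return {"primary": primary, "replicas": replicas, "readable_from": sorted(replicas)}
```

When both rooms are up, the result has the green room as primary and both rooms listed under `readable_from`; when green is down, the data lands in red and no second replica is recorded until green returns.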
Sanger - Metadata
attribute: library
attribute: total_reads
attribute: type
attribute: lane
attribute: is_paired_read
attribute: study_accession_number
attribute: library_id
attribute: sample_accession_number
attribute: sample_public_name
attribute: manual_qc
attribute: tag
attribute: sample_common_name
attribute: md5
attribute: tag_index
attribute: study_title
attribute: study_id
attribute: reference
attribute: sample
attribute: target
attribute: sample_id
attribute: id_run
attribute: study
attribute: alignment
- Example metadata attributes
- Users query and access data from local compute clusters
- Users access iRODS locally via the command line interface
Sanger - Federation
Maastricht DataHub
SURF Scale Out Pilot
[Diagram labels: University Zone with Catalog (x2); Server Hosting Environment; Resource Server (x2); Tape Archive; Disk Storage; Object Storage]
SURF EUDAT CDI
[Diagram labels: External Community Zones; Zone; Catalog; Local Storage; CXFS; Tape Library; EUDAT University Zone with Catalog (x2); B2SAFE iRODS Federation; EUDAT Centers; iRODS Federation; ARCHIVE; GridFTP Data Movement]
Overview
iRODS Demo
The Infrastructure
Implemented with docker-compose
8 Containers:
- Automated Ingest
- NFSRods
- Metalnx
- Audit ElasticStack
- iRODS Client
- Metalnx Database
- DAVRods
- iRODS Catalog Service Provider
The Infrastructure
[Diagram labels: Catalog Service Provider; Automated Ingest Service; NFSRods; DAVRods; Metalnx; ElasticStack; connections labeled iRODS Protocol and AMQP iRODS Event Stream]
The Content
- Ingest policy to extract metadata then move data to a Long Term Storage resource
- Apply metadata to the object in the catalog
  - metadata headers available in the files
  - contextual metadata: LZ directory, instrument, etc.
- Implement basic encryption for the data on the LTS
- Demonstrate
  - ingest
  - discovery
  - encryption at rest
  - data egress
  - graphical presentation
  - file system presentation: NFS and WebDAV
Automated Ingest
Two directories were created:
- /tmp/landing_zone
- /tmp/ingested
For any data that arrives in /tmp/landing_zone:
- The data is automatically moved to a storage resource
- It is registered into the catalog at a configured location
- Metadata is extracted and applied to the object in the catalog
- The remaining file is moved to /tmp/ingested
Users can view and access data and metadata from any client
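The landing zone flow can be approximated with a short standard-library sketch. It is not the iRODS Automated Ingest tool: the catalog is a plain dictionary, the storage resource is just a directory, and the `key: value` header format used for metadata extraction is a hypothetical stand-in for whatever headers the demo files carry.

```python
import shutil
from pathlib import Path
from typing import Dict


def extract_header_metadata(path: Path) -> Dict[str, str]:
    """Read leading 'key: value' lines (a hypothetical header format)."""
    metadata: Dict[str, str] = {}
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            key, sep, value = line.partition(":")
            if not sep:
                break  # first line without a colon ends the header
            metadata[key.strip()] = value.strip()
    return metadata


def ingest_landing_zone(landing_zone: Path, vault: Path, ingested: Path,
                        catalog: Dict[str, dict], collection: str) -> None:
    """Process every file currently sitting in the landing zone."""
    for src in sorted(p for p in landing_zone.iterdir() if p.is_file()):
        # 1. Place the data on the (stand-in) storage resource.
        stored = vault / src.name
        shutil.copy2(src, stored)
        # 2. Extract header metadata and add contextual metadata.
        metadata = extract_header_metadata(src)
        metadata["landing_zone"] = str(landing_zone)
        # 3. Register the object at the configured logical location.
        catalog[f"{collection}/{src.name}"] = {
            "physical_path": str(stored),
            "metadata": metadata,
        }
        # 4. Move the remaining file out of the landing zone.
        shutil.move(str(src), str(ingested / src.name))
```

Called with `Path("/tmp/landing_zone")` and `Path("/tmp/ingested")`, plus a vault directory and a collection path of your choosing, this leaves the landing zone empty and the dictionary populated with one entry per file.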
Encryption at Rest
Policy has been configured to encrypt and decrypt data in-place when accessed by the object interface
Using the iRODS command line, data can be ingested and then inspected at rest
The data may then be retrieved using the command line
Note: data accessed via the POSIX interface (DAVRods, NFSRods, Metalnx) is not decrypted by this policy and will continue to appear encrypted
Data Discovery with Metalnx
Automated Ingest has provided metadata for data discovery
The metadata can be directly inspected in Metalnx
The query builder can be used to identify data sets of interest via Attribute, Value, Unit matches
Queries to the system metadata may also be performed, searching on values such as file name, collection path, user, etc.
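The two kinds of search described above, matching on Attribute, Value, Unit and matching on system metadata such as the collection path, can be sketched against a toy catalog. This is not how Metalnx builds its queries; the function names, the wildcard matching via `fnmatch`, and every sample path and value are assumptions for illustration (only the attribute names `study` and `lane` are borrowed from the Sanger slide).

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple

AVU = Tuple[str, str, str]  # (attribute, value, unit)

# Made-up sample catalog: logical path -> attached AVUs.
sample_catalog: Dict[str, List[AVU]] = {
    "/exampleZone/home/alice/run1.dat": [("study", "example_study", ""), ("lane", "1", "")],
    "/exampleZone/home/alice/run2.dat": [("study", "other_study", ""), ("lane", "2", "")],
    "/exampleZone/home/bob/notes.txt": [("size", "12", "KB")],
}


def find_by_avu(catalog: Dict[str, List[AVU]], attribute: str = "*",
                value: str = "*", unit: str = "*") -> List[str]:
    """Return paths with at least one AVU matching all three patterns."""
    return sorted(
        path for path, avus in catalog.items()
        if any(fnmatchcase(a, attribute) and fnmatchcase(v, value)
               and fnmatchcase(u, unit) for a, v, u in avus)
    )


def find_by_path(catalog: Dict[str, List[AVU]], pattern: str) -> List[str]:
    """Return paths whose logical name matches a wildcard pattern."""
    return sorted(path for path in catalog if fnmatchcase(path, pattern))
```

For example, `find_by_avu(sample_catalog, attribute="study", value="example_*")` selects only `run1.dat`, while `find_by_path(sample_catalog, "/exampleZone/home/alice/*")` selects both of the hypothetical user's files regardless of their metadata.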
File System Presentations: DAVRods
DAVRods provides both a simple web based interface as well as the ability to mount a folder on the desktop
DAVRods is an Apache Module implemented in C using the native iRODS POSIX API
DAVRods can be used to edit data in-place, or to copy data to/from a user's collections
File System Presentation: NFSRods
NFSRods is an NFSv4 implementation based on NFS4J using the iRODS Java API
NFSRods also leverages the iRODS POSIX API
NFSRods provides full command line capabilities to the shell for working with data in place at the console
Questions?
An executive overview of iRODS, its technology, capabilities and deployment patterns as well as a demonstration of capabilities.