Policy Training
Executive Overview
Jason Coposky
@jason_coposky
Executive Director, iRODS Consortium

August 3-6, 2020
KU Leuven Training
Webinar Presentation




Our Membership























Consortium
Member



Our Business Model
Start with proof of concept
- Use Case Driven
- Hands on
- Service and Support Contract
- Master Services Agreement
Consortium Membership
- Four Levels - $10k to $150k
- 10 to 300 hours of support
- Participation in Software roadmap
- Discounted hourly rate
Tier 3 Support
- Systems Integrators
- Compute Vendors
- Storage Vendors
What is iRODS
Distributed - runs on a laptop, a cluster, on premises or geographically distributed
Open Source - BSD-3 Licensed, install it today and try before you buy
Metadata Driven & Data Centric - Insulate both your users and your data from your infrastructure

iRODS as the Integration Layer


The iRODS Data Management Model

Core Competencies
Policy
Capabilities
Patterns
iRODS Core Competencies
The underlying technology categorized into four areas






Data Virtualization

Combine various distributed storage technologies into a Unified Namespace
- Existing file systems
- Cloud storage
- On premises object storage
- Archival storage systems
iRODS provides a logical view into the complex physical representation of your data, distributed geographically, and at scale.


Projection of the Physical into the Logical

Logical Path
Physical Path(s)
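The projection of one logical path onto one or more physical paths can be illustrated with a small model. This is a minimal sketch, not the iRODS catalog: the `Namespace` and `Replica` classes, the zone name `exampleZone`, the resource names, and all paths are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Replica:
    """One physical copy of a data object (hypothetical model)."""
    resource: str       # name of the storage resource holding the copy
    physical_path: str  # where the bytes live on that resource


@dataclass
class Namespace:
    """Toy unified namespace: a logical path maps to one or more replicas."""
    catalog: Dict[str, List[Replica]] = field(default_factory=dict)

    def register(self, logical_path: str, replica: Replica) -> None:
        self.catalog.setdefault(logical_path, []).append(replica)

    def resolve(self, logical_path: str) -> List[Replica]:
        # Return a copy so callers cannot mutate the catalog by accident.
        return list(self.catalog.get(logical_path, []))


LOGICAL = "/exampleZone/home/alice/run42.dat"  # made-up logical path
ns = Namespace()
ns.register(LOGICAL, Replica("posix_fs", "/mnt/vault/home/alice/run42.dat"))
ns.register(LOGICAL, Replica("object_store", "bucket/vault/home/alice/run42.dat"))

for replica in ns.resolve(LOGICAL):
    print(f"{LOGICAL} -> {replica.resource}:{replica.physical_path}")
```

The user only ever sees `LOGICAL`; which storage technology holds each copy is a detail of the lookup.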

Data Discovery
Attach metadata to any first class entity within the iRODS Zone
- Data Objects
- Collections
- Users
- Storage Resources
- The Namespace
iRODS provides automated and user-provided metadata that makes your data and infrastructure more discoverable, operational, and valuable.
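Attaching metadata to different kinds of entities and then discovering them by attribute can be sketched as follows. This is an illustrative in-memory model, assuming metadata is stored as attribute, value, unit triples; the `MetadataCatalog` class, the entity type labels, and the sample attributes are hypothetical and are not the iRODS client API.

```python
from collections import defaultdict


class MetadataCatalog:
    """Toy store of (attribute, value, unit) triples keyed by entity."""

    def __init__(self):
        # (entity_type, entity_name) -> set of (attribute, value, unit)
        self._avus = defaultdict(set)

    def attach(self, entity_type, name, attribute, value, unit=""):
        self._avus[(entity_type, name)].add((attribute, value, unit))

    def find(self, attribute, value):
        """Return every entity carrying the given attribute/value pair."""
        return sorted(
            entity
            for entity, avus in self._avus.items()
            if any(a == attribute and v == value for a, v, _ in avus)
        )


catalog = MetadataCatalog()
catalog.attach("data_object", "/exampleZone/home/alice/run42.dat", "project", "demo")
catalog.attach("collection", "/exampleZone/home/alice", "project", "demo")
catalog.attach("resource", "posix_fs", "tier", "fast")

print(catalog.find("project", "demo"))
```

The same query returns a collection and a data object, which is the point of treating every entity as a first-class target for metadata.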



Metadata Everywhere


Workflow Automation
Integrated scripting language which is triggered by any operation within the framework
- Authentication
- Storage Access
- Database Interaction
- Network Activity
- Extensible RPC API

The iRODS rule engine captures real-world policy as computer-actionable rules that may allow, deny, or add context to operations within the system.


Dynamic Policy Enforcement

The iRODS rule may:
- restrict access
- log for audit and reporting
- provide additional context
- send a notification
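The four rule behaviors on this slide can be sketched in a few lines. This is a plain Python illustration rather than the iRODS rule language; `example_rule`, `Operation`, `PolicyDenied`, and the `/exampleZone/archive/` path are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class PolicyDenied(Exception):
    """Raised when the rule refuses an operation."""


@dataclass
class Operation:
    name: str
    user: str
    logical_path: str
    context: Dict[str, str] = field(default_factory=dict)


audit_log: List[str] = []      # stands in for an audit trail
notifications: List[str] = []  # stands in for e-mail or messaging


def example_rule(op: Operation) -> None:
    # Log for audit and reporting.
    audit_log.append(f"{op.user} {op.name} {op.logical_path}")
    # Restrict access, and send a notification when doing so.
    if op.name == "delete" and op.logical_path.startswith("/exampleZone/archive/"):
        notifications.append(f"denied: {op.user} tried to delete {op.logical_path}")
        raise PolicyDenied("objects under /exampleZone/archive/ may not be deleted")
    # Provide additional context for later steps.
    op.context["checked_by"] = "example_rule"


def run(op: Operation, action: Callable[[Operation], str]) -> str:
    example_rule(op)   # policy runs before the operation itself
    return action(op)


read_op = Operation("read", "alice", "/exampleZone/home/alice/a.dat")
read_result = run(read_op, lambda op: f"{op.name} ok")

delete_op = Operation("delete", "bob", "/exampleZone/archive/b.dat")
try:
    run(delete_op, lambda op: f"{op.name} ok")
    delete_denied = False
except PolicyDenied:
    delete_denied = True

print(read_result, read_op.context, delete_denied)
```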

Dynamic Policy Enforcement

A single API call expands to many plugin operations, all of which may invoke policy enforcement
Plugin Interfaces:
- Authentication
- Database
- Storage
- Network
- Rule Engine
- Microservice
- RPC API
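How one API call fans out into many plugin operations, each wrapped by enforcement points, can be sketched as below. The step list in `PUT_STEPS` and the `pre_`/`post_` naming are assumptions made for illustration; they are not a list of real iRODS enforcement points.

```python
from typing import Callable, Dict, List, Tuple

fired: List[str] = []                               # every enforcement point reached
handlers: Dict[str, Callable[[dict], None]] = {}    # policy attached by name

# Hypothetical plugin operations behind a single "put" call.
PUT_STEPS: List[Tuple[str, str]] = [
    ("authentication", "authenticate"),
    ("database", "register_object"),
    ("storage", "create"),
    ("storage", "write"),
    ("storage", "close"),
    ("database", "update_object"),
]


def enforce(pep: str, args: dict) -> None:
    """Record the enforcement point and run policy if any is attached."""
    fired.append(pep)
    handler = handlers.get(pep)
    if handler is not None:
        handler(args)


def plugin_operation(interface: str, operation: str, args: dict) -> None:
    enforce(f"pre_{interface}_{operation}", args)
    # The plugin's real work would run here.
    enforce(f"post_{interface}_{operation}", args)


def put(logical_path: str) -> dict:
    args = {"logical_path": logical_path}
    for interface, operation in PUT_STEPS:
        plugin_operation(interface, operation, args)
    return args


def note_after_write(args: dict) -> None:
    args["policy_note"] = "written; a checksum could be queued here"


handlers["post_storage_write"] = note_after_write

result = put("/exampleZone/home/alice/a.dat")
print(len(fired), "enforcement points fired for one call")
```

Only one handler is attached here, yet every step still passes through its enforcement points, so policy can be added later without changing the call itself.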

Secure Collaboration
iRODS allows for collaboration across administrative boundaries after deployment
- No need for common infrastructure
- No need for shared funding
- Affords temporary collaborations
iRODS provides the ability to federate namespaces across organizations without pre-coordinated funding or effort.



iRODS as a Service Interface


Federation - Shared Data and Services


Ingest to Institutional repository
As data matures and reaches a broader community, data management policy must also evolve to meet these additional requirements.


What is a Policy
A Definition of Policy
A set of ideas or a plan of what to do in particular situations that has been agreed to officially by a group of people...
So how does iRODS do this?

iRODS Policies


The reflection of real-world data management decisions in computer-actionable code.
(a plan of what to do in particular situations)
Possible Policies - The What


- Data Movement
- Data Verification
- Data Retention
- Data Replication
- Data Placement
- Checksum Validation
- Metadata Extraction
- Metadata Application
- Metadata Conformance
- Replica Verification
- Vault to Catalog Verification
- Catalog to Vault Verification
- ...
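Two of the policies above, Catalog to Vault Verification and Vault to Catalog Verification, can be sketched as a pair of checks. This is a minimal sketch under stated assumptions: the catalog is modeled as a plain dictionary of relative path to SHA-256 checksum, the vault is a temporary directory, and the function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def catalog_to_vault(catalog: dict, vault: Path) -> list:
    """Catalog entries whose file is missing or whose checksum differs."""
    problems = []
    for rel_path, expected in sorted(catalog.items()):
        target = vault / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif sha256_of(target) != expected:
            problems.append((rel_path, "checksum mismatch"))
    return problems


def vault_to_catalog(catalog: dict, vault: Path) -> list:
    """Files present in the vault that the catalog does not know about."""
    found = (p.relative_to(vault).as_posix() for p in vault.rglob("*") if p.is_file())
    return sorted(rel for rel in found if rel not in catalog)


with tempfile.TemporaryDirectory() as tmp:
    vault = Path(tmp)
    (vault / "a.dat").write_bytes(b"alpha")
    (vault / "b.dat").write_bytes(b"beta")
    (vault / "orphan.dat").write_bytes(b"unregistered")

    catalog = {
        "a.dat": sha256_of(vault / "a.dat"),  # consistent entry
        "b.dat": "0" * 64,                    # wrong checksum on purpose
        "c.dat": "0" * 64,                    # no file behind this entry
    }
    c2v = catalog_to_vault(catalog, vault)
    v2c = vault_to_catalog(catalog, vault)

print(c2v)
print(v2c)
```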
Policy Composition


Consider policy as building blocks towards capabilities
Follow proven software engineering principles:
Favor composition over monolithic implementations
Rules and Dynamic Policy Enforcement Points can be overloaded and fall through
Implement or configure several rule bases or rule engine plugins to achieve complex use cases
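The idea of several rule bases answering the same enforcement point, with fall through from one to the next, can be sketched as a dispatch loop. The `CONTINUE` and `HANDLED` sentinels, the `RuleBase` class, and the rule names are hypothetical; the real framework signals continuation in its own way.

```python
from typing import Callable, Dict, List, Optional

CONTINUE = "continue"  # hypothetical: let the next rule base also run
HANDLED = "handled"    # hypothetical: stop after this rule base

Rule = Callable[[dict], str]


class RuleBase:
    def __init__(self, name: str, rules: Dict[str, Rule]) -> None:
        self.name = name
        self.rules = rules

    def invoke(self, pep: str, args: dict) -> Optional[str]:
        rule = self.rules.get(pep)
        return None if rule is None else rule(args)


def dispatch(rule_bases: List[RuleBase], pep: str, args: dict) -> List[str]:
    """Run rule bases in order; return the names of those that ran."""
    ran = []
    for base in rule_bases:
        outcome = base.invoke(pep, args)
        if outcome is None:
            continue              # no rule defined here: fall through
        ran.append(base.name)
        if outcome != CONTINUE:
            break                 # handled: later rule bases are skipped
    return ran


def audit(args: dict) -> str:
    args["audited"] = True
    return CONTINUE


def tag_tier(args: dict) -> str:
    args["tier"] = "fast"
    return HANDLED


def never_reached(args: dict) -> str:
    args["unreached"] = True
    return HANDLED


rule_bases = [
    RuleBase("unrelated", {"post_get": never_reached}),
    RuleBase("audit", {"post_put": audit}),
    RuleBase("tiering", {"post_put": tag_tier}),
    RuleBase("legacy", {"post_put": never_reached}),
]

args: dict = {}
ran = dispatch(rule_bases, "post_put", args)
print(ran, args)
```

Each rule base stays small and single-purpose, and the combined behavior comes from the order in which they are configured.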
When - The Event Handler



Policy Composition


Consider Storage Tiering:
- Violating Object Identification
- Data Movement
- Data Replication
- Data Verification
- Data Retention
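The storage tiering list above can be read as a composition of small policies. The sketch below is one possible reading, assuming two tiers named `fast` and `archive` and an age threshold as the violation test; the function names and the mapping of each function to a bullet are assumptions, not the packaged capability.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataObject:
    path: str
    age_days: int
    replicas: Dict[str, bytes] = field(default_factory=dict)  # tier -> bytes


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def violates(obj: DataObject, max_age_days: int) -> bool:
    """Violating object identification: too old for the fast tier."""
    return "fast" in obj.replicas and obj.age_days > max_age_days


def replicate(obj: DataObject, source: str, target: str) -> None:
    """Data movement and replication: copy the bytes to the next tier."""
    obj.replicas[target] = bytes(obj.replicas[source])


def verify(obj: DataObject, source: str, target: str) -> bool:
    """Data verification: both replicas must have the same checksum."""
    return digest(obj.replicas[source]) == digest(obj.replicas[target])


def trim(obj: DataObject, tier: str) -> None:
    """Data retention: drop the replica that is no longer wanted."""
    del obj.replicas[tier]


def tier_out(objects: List[DataObject], max_age_days: int = 30) -> List[str]:
    moved = []
    for obj in objects:
        if not violates(obj, max_age_days):
            continue
        replicate(obj, "fast", "archive")
        if verify(obj, "fast", "archive"):  # only trim after a good copy
            trim(obj, "fast")
            moved.append(obj.path)
    return moved


old = DataObject("/exampleZone/home/alice/old.dat", 90, {"fast": b"old data"})
new = DataObject("/exampleZone/home/alice/new.dat", 2, {"fast": b"new data"})
moved = tier_out([old, new])
print(moved, sorted(old.replicas), sorted(new.replicas))
```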

iRODS Capabilities
- Packaged and supported solutions
- Require configuration, not code
- Derived from the majority of use cases observed in the user community

Automated Ingest - Landing Zone



Automated Ingest - Filesystem Scanning


Storage Tiering



Indexing



Publishing

Deployment Patterns



- Data to Compute
- Compute to Data
- Data Transfer Nodes
- Filesystem Synchronization

Filesystem Synchronization



Data to Compute



Compute to Data



Data Transfer Nodes



The Data Management Model



iRODS Use Cases

The Wellcome Sanger Institute


Sanger - Replication

- Data preferentially placed on resource servers in the green data center (fallback to red)
- Data replicated to the other room.
- Checksums applied
- Green and red centers both used for read access.
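The placement and replication behavior described in these bullets can be sketched as follows. This is a toy model, not Sanger's configuration: the `Room` class, the `put` and `get` helpers, the use of SHA-256, and the sample paths are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, List, Tuple


class Room:
    """Stand-in for the resource servers in one data center room."""

    def __init__(self, name: str, online: bool = True) -> None:
        self.name = name
        self.online = online
        self.files: Dict[str, bytes] = {}


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def put(path: str, data: bytes, preferred: Room, fallback: Room) -> Tuple[str, str]:
    """Place in the preferred room (or the fallback), then replicate."""
    first, second = (preferred, fallback) if preferred.online else (fallback, preferred)
    if not first.online:
        raise RuntimeError("no resource server is available")
    first.files[path] = data
    expected = checksum(data)
    if second.online:
        second.files[path] = bytes(first.files[path])
        if checksum(second.files[path]) != expected:
            raise RuntimeError("replica checksum mismatch")
    return first.name, expected


def get(path: str, rooms: List[Room]) -> bytes:
    """Read from whichever online room holds a replica."""
    for room in rooms:
        if room.online and path in room.files:
            return room.files[path]
    raise FileNotFoundError(path)


green, red = Room("green"), Room("red")
placed, recorded = put("/exampleZone/seq/run1.dat", b"reads", green, red)

green.online = False  # simulate losing the preferred room
data = get("/exampleZone/seq/run1.dat", [green, red])
placed_fallback, _ = put("/exampleZone/seq/run2.dat", b"more reads", green, red)

print(placed, placed_fallback, data)
```

With the green room offline, reads are still served from red and new data lands on red, which mirrors the fallback described on the slide.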

Sanger - Metadata
attribute: library
attribute: total_reads
attribute: type
attribute: lane
attribute: is_paired_read
attribute: study_accession_number
attribute: library_id
attribute: sample_accession_number
attribute: sample_public_name
attribute: manual_qc
attribute: tag
attribute: sample_common_name
attribute: md5
attribute: tag_index
attribute: study_title
attribute: study_id
attribute: reference
attribute: sample
attribute: target
attribute: sample_id
attribute: id_run
attribute: study
attribute: alignment
- Example metadata attributes
- Users query and access data from local compute clusters
- Users access iRODS locally via the command line interface

Sanger - Federation


Maastricht DataHub
SURF Scale Out Pilot



[Diagram labels: two University Zones, each with a Catalog; Server Hosting Environment with two Resource Servers; Tape Archive; Disk Storage; Object Storage]

SURF EUDAT CDI

[Diagram labels: External Community Zones (Catalog); Zone (Catalog); Local Storage; CXFS; Tape Library; two EUDAT University Zones, each with a Catalog; B2SAFE iRODS Federation; EUDAT Centers; iRODS Federation; ARCHIVE; GridFTP Data Movement]
Questions?
KU Leuven Policy Training - Executive Overview
By Jason Coposky
An executive overview of iRODS, its technology, capabilities and deployment patterns as well as a demonstration of capabilities.