Executive Overview and Demonstration

Jason Coposky

@jason_coposky

Executive Director, iRODS Consortium

January 21, 2020

OpenIO Meetup

Lille, France

Our Membership

Consortium Member

What is iRODS

Distributed - runs on a laptop, a cluster, on premises or geographically distributed

Open Source - BSD-3 Licensed, install it today and try before you buy

Metadata Driven & Data Centric - Insulate both your users and your data from your infrastructure

iRODS as the Integration Layer

iRODS Core Competencies

The underlying technology categorized into four areas

Data Virtualization

Combine various distributed storage technologies into a Unified Namespace

  • Existing file systems
  • Cloud storage
  • On premises object storage
  • Archival storage systems

iRODS provides a logical view into the complex physical representation of your data, distributed geographically, and at scale.

Projection of the Physical into the Logical

Logical Path

Physical Path(s)
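The projection from one logical path to one or more physical paths can be pictured with a small in-memory model. This is only a sketch: the `Catalog` class, the zone name `tempZone`, the user `alice`, and the physical paths are hypothetical illustrations, not the iRODS catalog schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    resource: str       # hypothetical storage resource name
    physical_path: str  # where the bytes actually live


@dataclass
class Catalog:
    # logical path -> list of replicas; a stand-in for the real catalog
    entries: dict = field(default_factory=dict)

    def register(self, logical_path, resource, physical_path):
        """Record one more physical copy behind a logical path."""
        self.entries.setdefault(logical_path, []).append(
            Replica(resource, physical_path)
        )

    def resolve(self, logical_path):
        """Return every (resource, physical path) pair behind a logical path."""
        return [
            (r.resource, r.physical_path)
            for r in self.entries.get(logical_path, [])
        ]


catalog = Catalog()
logical = "/tempZone/home/alice/photo.jpg"  # hypothetical logical path
catalog.register(logical, "demoResc", "/var/lib/irods/Vault/home/alice/photo.jpg")
catalog.register(logical, "archive", "/mnt/archive/home/alice/photo.jpg")

for resource, path in catalog.resolve(logical):
    print(resource, path)
```

Users only ever see the logical path; where the replicas live can change without the name changing.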

Data Discovery

Attach metadata to any first class entity within the iRODS Zone

  • Data Objects
  • Collections
  • Users
  • Storage Resources
  • The Namespace

iRODS provides automated and user-provided metadata which makes your data and infrastructure more discoverable, operational and valuable.
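As a rough illustration of metadata attached to different kinds of entities, the sketch below stores attribute, value, unit triples in a plain dictionary and searches them. The entity names, the `add_avu` and `find` helpers, and the sample attributes other than "Image Make" are assumptions made for this example, not iRODS calls.

```python
from collections import defaultdict

# (entity_type, entity_name) -> set of (attribute, value, unit) triples
metadata = defaultdict(set)


def add_avu(entity_type, name, attribute, value, unit=""):
    """Attach one attribute/value/unit triple to an entity."""
    metadata[(entity_type, name)].add((attribute, value, unit))


def find(entity_type, attribute, value):
    """Sorted names of entities of one type carrying a matching attribute/value."""
    return sorted(
        name
        for (etype, name), avus in metadata.items()
        if etype == entity_type
        and any(a == attribute and v == value for a, v, _unit in avus)
    )


# Hypothetical sample entries covering several entity types.
add_avu("data_object", "/tempZone/home/alice/photo.jpg", "Image Make", "Apple")
add_avu("collection", "/tempZone/home/alice", "project", "demo")
add_avu("resource", "archive", "media", "tape")

print(find("data_object", "Image Make", "Apple"))
```

The same lookup works for any entity type, which is the point of treating metadata uniformly across the zone.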

Metadata Everywhere

Workflow Automation

Integrated scripting language which is triggered by any operation within the framework

  • Authentication
  • Storage Access
  • Database Interaction
  • Network Activity
  • Extensible RPC API 

The iRODS rule engine captures real-world policy as computer-actionable rules, which may allow, deny, or add context to operations within the system.
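The allow, deny, and add-context behaviour can be sketched as a chain of functions that run before an operation proceeds. This is a conceptual model only: it is not the iRODS rule language, and the rule names, the `/tempZone/frozen/` path, and the `invoke` helper are hypothetical.

```python
audit_log = []


class PolicyDenied(Exception):
    """Raised by a rule to stop an operation."""


def log_for_audit(operation, context):
    # Record every attempt, including ones that are later denied.
    audit_log.append((operation, context["user"], context["path"]))


def add_context(operation, context):
    # Annotate the operation so later steps can see it was checked.
    context["checked_by"] = "policy_sketch"


def deny_writes_to_frozen(operation, context):
    # Hypothetical policy: anything under /tempZone/frozen/ is read-only.
    if operation == "write" and context["path"].startswith("/tempZone/frozen/"):
        raise PolicyDenied(f"{context['user']} may not write to {context['path']}")


RULES = [log_for_audit, add_context, deny_writes_to_frozen]


def invoke(operation, context):
    """Run every rule; the operation proceeds only if none of them raises."""
    for rule in RULES:
        rule(operation, context)
    return f"{operation} ok"


print(invoke("write", {"user": "alice", "path": "/tempZone/home/alice/a.txt"}))
try:
    invoke("write", {"user": "bob", "path": "/tempZone/frozen/x"})
except PolicyDenied as err:
    print("denied:", err)
```

Because the audit rule runs first, the denied attempt still leaves a record.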

Dynamic Policy Enforcement

The iRODS rule may:

  • restrict access
  • log for audit and reporting
  • provide additional context
  • send a notification

Secure Collaboration

iRODS allows for collaboration across administrative boundaries after deployment

  • No need for common infrastructure
  • No need for shared funding
  • Affords temporary collaborations

iRODS provides the ability to federate namespaces across organizations without pre-coordinated funding or effort.

iRODS Policies

  • Replication
  • Versioning
  • Fixity Checking
  • Retention
  • Verification
  • Logical Quotas
  • Hard Links
  • Metadata Access Control
  • Metadata Template Enforcement
  • ...

 

  • Packaged and supported solutions
  • Require configuration not code
  • Derived from the majority of use cases observed in the user community

iRODS Capabilities

Automated Ingest - Landing Zone

Storage Tiering

Initial Goals

  1. Automatically Ingest data from a 'Landing Zone'
  2. Extract salient metadata, e.g. EXIF tags
  3. Tag Data Objects and Collections to make them Actionable and Discoverable
  4. Discover and interact with data on the command line
  5. Discover and interact with data via Metalnx
  6. Share data via Metalnx
  7. Interact with data via NFS and WebDAV
  8. Watch data vertically move through the storage tiers
  9. Review iRODS Audit plugin Kibana dashboard

Automated Ingest

Any data that is discovered during a scan

  • Automatically registered to a storage resource
  • Metadata extracted and applied to the object in the catalog
  • Event possibly generated for audit trail

Users can view and access data and metadata from any client
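The scan, register, extract, and event steps can be sketched with the standard library alone. This is not the iRODS automated ingest tool: the `scan` function and its dictionary catalog are hypothetical, and file size and extension stand in for real extraction such as EXIF tags.

```python
import os
import tempfile


def scan(landing_zone, resource, ingest_catalog, events):
    """Register every file under landing_zone that has not been seen before."""
    for root, _dirs, files in os.walk(landing_zone):
        for name in sorted(files):
            path = os.path.join(root, name)
            if path in ingest_catalog:
                continue  # already registered on an earlier scan
            stat = os.stat(path)
            ingest_catalog[path] = {
                "resource": resource,
                # Stand-in for real metadata extraction such as EXIF tags.
                "metadata": {
                    "size_bytes": stat.st_size,
                    "extension": os.path.splitext(name)[1],
                },
            }
            events.append(("registered", path))  # audit-trail event


ingest_catalog, events = {}, []
with tempfile.TemporaryDirectory() as landing_zone:
    with open(os.path.join(landing_zone, "photo.jpg"), "wb") as handle:
        handle.write(b"\xff\xd8demo")
    scan(landing_zone, "demoResc", ingest_catalog, events)
    scan(landing_zone, "demoResc", ingest_catalog, events)  # nothing new to add

print(len(ingest_catalog), len(events))
```

Running the scan twice registers the file once, which is the behaviour a repeated landing-zone scan needs.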

Data Discovery with Command Line

Query using imeta, a command-line iRODS client utility:

imeta qu -d "Image Make" = Apple

Query using iquest, a command-line iRODS client utility:

iquest "%s/%s" "SELECT COLL_NAME, DATA_NAME WHERE META_DATA_ATTR_NAME = 'Image Make' AND META_DATA_ATTR_VALUE = 'Apple'"
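For scripted discovery, the iquest query string for an attribute and value pair can be assembled programmatically. The helper names below are hypothetical, and the quoting is deliberately naive: the sketch rejects single quotes rather than trying to escape them. It only builds the command; running it would require iCommands on the host.

```python
import shlex


def metadata_query(attribute, value):
    """Build a query string selecting data objects by one attribute/value pair."""
    # Naive safety check: refuse input this sketch does not know how to quote.
    for text in (attribute, value):
        if "'" in text:
            raise ValueError("single quotes are not handled in this sketch")
    return (
        "SELECT COLL_NAME, DATA_NAME "
        f"WHERE META_DATA_ATTR_NAME = '{attribute}' "
        f"AND META_DATA_ATTR_VALUE = '{value}'"
    )


def iquest_command(attribute, value):
    """Argument list that could be handed to subprocess.run."""
    return ["iquest", "%s/%s", metadata_query(attribute, value)]


# Print the shell-quoted command line for inspection.
print(shlex.join(iquest_command("Image Make", "Apple")))
```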

Data Discovery with Metalnx

Automated Ingest has provided metadata for data discovery

 

The metadata can be directly inspected in Metalnx

 

The query builder can be used to identify data sets of interest via Attribute, Value, Unit matches

 

Queries to the system metadata may also be performed, searching on values such as file name, collection path, user, etc.

File System Presentations: DAVRods

DAVRods provides both a simple web-based interface and the ability to mount a folder on the desktop

 

DAVRods is an Apache Module implemented in C using the native iRODS POSIX API

 

DAVRods can be used to edit data in-place, or to copy data to/from a user's collections

File System Presentations: NFSRODS

NFSRODS leverages the Java iRODS Client 'Jargon' and is implemented with NFS4J

 

NFSRODS acts as a Mid-Tier client to iRODS

 

NFSRODS projects iRODS ACLs into NFS extended ACLs

 

NFSRODS can also be used to edit data in-place, or to copy data to/from a user's collections

Storage Tiering

A tier group was made between demoResc, openio, and archive resources:

imeta set -R demoResc irods::storage_tiering::group openio_group 0
imeta set -R openio irods::storage_tiering::group openio_group 1
imeta set -R archive irods::storage_tiering::group openio_group 2

Tiering Times were set to 30 seconds and 60 seconds:

imeta set -R demoResc irods::storage_tiering::time 30
imeta set -R openio irods::storage_tiering::time 60

No tiering time was set for 'archive' since it is terminal
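The effect of this configuration can be modelled as a simple decision function. The sketch assumes that an object's age is measured in seconds since its last access and that an object moves once that age exceeds the tier's time; both details are assumptions for illustration, and the `next_tier` function is hypothetical rather than part of the tiering plugin.

```python
# Order matches the group indices 0, 1, 2 set with imeta.
TIER_GROUP = ["demoResc", "openio", "archive"]

# Tiering times in seconds; the terminal tier has no entry.
TIERING_TIME = {"demoResc": 30, "openio": 60}


def next_tier(resource, seconds_since_access):
    """Return the resource an object should move to, or None to stay put."""
    limit = TIERING_TIME.get(resource)
    if limit is None or seconds_since_access <= limit:
        return None  # terminal tier, or not yet old enough
    return TIER_GROUP[TIER_GROUP.index(resource) + 1]


for tier, age in [("demoResc", 10), ("demoResc", 45), ("openio", 90), ("archive", 9999)]:
    print(tier, age, "->", next_tier(tier, age))
```

An object therefore drifts from demoResc to openio to archive as it goes unaccessed, and stays in archive indefinitely.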

Kibana Dashboard

Questions?

OpenIO Meetup - iRODS Executive Overview and Demonstration

By Jason Coposky

An executive overview of iRODS, its technology, capabilities and deployment patterns as well as a demonstration of capabilities.
