An Introduction to iRODS

November 2015

Agenda

Welcome

Introductions

What is iRODS?

Demonstration and Souvenirs

Automatic Metadata Extraction

Resource Composition

Questions and Discussion

Extra Credit: Federation

Welcome

Big THANKS to...

And to all of you!

Introductions

What's This All About?

We're here to...

 

   ... help you grasp the mental model.

   ... show off.

   ... provide clear explanations.

   ... answer your questions.

What is iRODS?

Overview

iRODS is open source data management software. It:

  • makes data findable
  • maintains data integrity
  • manages backup replicas
  • makes data shareable

The Four Pillars

  • Storage Virtualization: all your storage in a single namespace
  • Data Discovery: system and user-generated metadata
  • Workflow Automation: event-driven and scheduled (cron-style) policies
  • Secure Collaboration: federation

Storage Virtualization: iRODS is Middleware

(Diagram: a User Application sits on top of the "Logical" Layer, which maps onto the Storage Environment, the "Physical" Layer, e.g. storagecluster.example.org:/managed and s3.amazonaws.com:/example/bitbucket.)

Storage Virtualization: iRODS Clients

• Web-based and Standalone GUIs

  - iRODS Cloud Browser, MetaLnx, Kanki, Cyberduck

 

• Portals, External Systems

  - iPlant Discovery Environment, Islandora, Fedora Commons

 

• WebDAV for drag-and-drop access built into the OS

• APIs: Python, REST, Qt, Java, C++

• Command Line Interface

"Logical" Layer:

    /tempZone/home/alice/training_jpgs/clown_fish.jpg

"Physical" Layer:

    storagecluster.example.org:/managed/Vault/home/alice/training_jpgs/clown_fish.jpg

    s3.amazonaws.com:/example/bitbucket/Vault/home/alice/training_jpgs/clown_fish.jpg
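One logical path, two physical replicas. Here is a minimal Python sketch of that mapping. It assumes the physical path is the resource's vault directory plus the logical path with the zone name removed. That assumption is consistent with both examples on the slide, but it is an illustration, not iRODS's own code. The function name physical_path is hypothetical.

```python
# Hypothetical sketch: map one logical iRODS path onto physical replica
# locations. Assumes physical = host:vault + (logical path minus "/<zone>"),
# which matches the slide's examples; not taken from iRODS source.

def physical_path(logical_path: str, zone: str, host: str, vault: str) -> str:
    """Return host:vault/<path below the zone> for a logical path."""
    prefix = "/" + zone
    if not logical_path.startswith(prefix + "/"):
        raise ValueError(f"{logical_path!r} is not in zone {zone!r}")
    relative = logical_path[len(prefix):]  # e.g. /home/alice/training_jpgs/...
    return f"{host}:{vault}{relative}"

logical = "/tempZone/home/alice/training_jpgs/clown_fish.jpg"
replicas = [
    ("storagecluster.example.org", "/managed/Vault"),
    ("s3.amazonaws.com", "/example/bitbucket/Vault"),
]
for host, vault in replicas:
    print(physical_path(logical, "tempZone", host, vault))
```

The client only ever sees the logical path. Which physical location serves a request is the server's concern.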

Storage Virtualization: Objects and Collections

Storage Virtualization: iRODS is Distributed

Storage Virtualization: Composable Resources

(Diagram: replResc, a replication resource with children demoResc and s3Resc; s3Resc has children archiveResc and cacheResc.)

Data Discovery

Attribute: filename     | Value: seal.jpg
Attribute: animal       | Value: seal
Attribute: photo_color  | Value: gray and brown
Attribute: file_size    | Value: 362833          | Units: bytes
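Metadata makes data findable because you can search on attribute-value-unit (AVU) triples like the ones above. The Python sketch below illustrates the idea with an in-memory catalog. The logical paths and the second object are hypothetical, and in iRODS the catalog lives in the iCAT database rather than in a dictionary.

```python
# Hypothetical in-memory model of iRODS metadata: each logical path
# carries a list of attribute-value-unit (AVU) triples.
from typing import NamedTuple, Optional

class AVU(NamedTuple):
    attribute: str
    value: str
    units: Optional[str] = None

catalog = {
    "/tempZone/home/alice/seal.jpg": [          # hypothetical path
        AVU("filename", "seal.jpg"),
        AVU("animal", "seal"),
        AVU("photo_color", "gray and brown"),
        AVU("file_size", "362833", "bytes"),
    ],
    "/tempZone/home/alice/clown_fish.jpg": [    # hypothetical object
        AVU("animal", "clown fish"),
    ],
}

def find(attribute: str, value: str) -> list[str]:
    """Return logical paths carrying an AVU with this attribute and value."""
    return sorted(
        path for path, avus in catalog.items()
        if any(a.attribute == attribute and a.value == value for a in avus)
    )

print(find("animal", "seal"))  # ['/tempZone/home/alice/seal.jpg']
```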

acPostProcForPut {
    if ($filePath like "*.jpg"  || $filePath like "*.jpeg" ||
        $filePath like "*.bmp"  || $filePath like "*.tif"  ||
        $filePath like "*.tiff" || $filePath like "*.rif"  ||
        $filePath like "*.gif"  || $filePath like "*.png"  ||
        $filePath like "*.svg"  || $filePath like "*.xpm") {
        msiget_image_meta($filePath, *meta);
        msiString2KeyValPair(*meta, *meta_kvp);
        msiAssociateKeyValuePairsToObj(*meta_kvp, $objPath, "-d");
    } # if
} # acPostProcForPut

Policy Enforcement Points = Triggers

Microservices = Actions
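The trigger/action split can be illustrated outside iRODS. In this Python sketch, actions register against a policy enforcement point (PEP) name, and a pretend "put" fires that PEP afterwards. The dispatch mechanics, the decorator, and the file paths are all hypothetical. Only the PEP name and the session variable names come from the rule above.

```python
# Illustrative sketch of "PEPs = triggers, microservices = actions".
# Not iRODS internals: a registry maps PEP names to Python callables.
from collections import defaultdict
from typing import Callable

Session = dict[str, str]   # stand-in for session variables such as $objPath
rules: dict[str, list[Callable[[Session], None]]] = defaultdict(list)
log: list[str] = []

def rule(pep: str):
    """Register the decorated function as an action for a PEP."""
    def register(action):
        rules[pep].append(action)
        return action
    return register

@rule("acPostProcForPut")
def harvest_image_metadata(session: Session) -> None:
    if session["filePath"].endswith((".jpg", ".png")):
        log.append(f"harvest metadata for {session['objPath']}")

def put(obj_path: str, file_path: str) -> None:
    """Pretend to store a file, then fire the post-put PEP."""
    session = {"objPath": obj_path, "filePath": file_path}
    for action in rules["acPostProcForPut"]:
        action(session)

put("/tempZone/home/alice/seal.jpg", "/var/lib/irods/Vault/home/alice/seal.jpg")
put("/tempZone/home/alice/notes.txt", "/var/lib/irods/Vault/home/alice/notes.txt")
print(log)  # ['harvest metadata for /tempZone/home/alice/seal.jpg']
```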

Workflow Automation: Rules

Secure Collaboration: Federation

Some History

iRODS is free, open source software owned by a foundation called the iRODS Consortium.

  • Its goal is to sustain iRODS as free, open source software by:

    ▹ Building good software.
    ▹ Growing the iRODS community.
    ▹ Demonstrating value.

  • It funds a team of 10+ developers, application engineers, and documentation and support staff.

The iRODS Consortium and Sustainability

Contract Customers

and more ...

Building Good Software: Plugins

• Microservices

• Storage Resources

• Authentication

• Network

• API

• Rule Engine (iRODS 4.2)

• Transport (iRODS 4.3)

Modular, maintainable code

Static Analysis and Continuous Integration

https://jenkins.irods.org

Initial Trial

  • Google Group
  • Blog posts, social media
  • Cloud images
  • Documentation
  • Training workshops
  • iRODS Hub: The iRODS App Store

Proof of Concept to Pilot

  • Occasional 1-on-1 Support
  • iRODS Consortium Members
  • iRODS Partners
  • iRODS Consortium Service Contracts

Production

  • iRODS Consortium Membership

Building Community and Demonstrating Value

https://irods.org/documentation/

Getting Plugged In with iRODS...

Demo

iRODS Cloud Browser

• Web-based GUI

• Developed for the DataNet Federation Consortium (DFC), an NSF-funded nationwide data grid project (datafed.org)

Demo

• iRODS Cloud Browser

• Automatic metadata extraction from image files

• Resource composition to replicate to Amazon S3

Souvenirs!

Explore

• Upload/download

• Add/delete metadata

• Create and navigate collections

Automatic Metadata Extraction

Install This iRODS Rule

you@laptop:~$ ssh admin@<ip address>

# install the rule/microservice plugin
admin@ec2:~$ sudo dpkg -i ./training/training-example-1.0.deb

# edit /etc/irods/server_config.json
admin@ec2:~$ sudo nano /etc/irods/server_config.json

/etc/irods/server_config.json

...
    "re_rulebase_set": [
        {
            "filename": "training_acPostProcForPut"
        },
        {
            "filename": "core"
        }
    ],
...

Add this section: the "training_acPostProcForPut" entry, listed before "core".
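If you prefer scripting this edit to using nano, here is a minimal Python sketch. It assumes server_config.json has the layout shown above, where "re_rulebase_set" is a list of {"filename": ...} objects, and it places the new entry before "core" as on the slide. The helper names add_rulebase and update_file are hypothetical. Back up the real file before running anything like update_file against it.

```python
# Hypothetical helper: ensure a rulebase appears in "re_rulebase_set"
# ahead of "core", mirroring the manual edit on the slide.
import json

def add_rulebase(config: dict, name: str, before: str = "core") -> dict:
    rulebases = config.setdefault("re_rulebase_set", [])
    if any(entry.get("filename") == name for entry in rulebases):
        return config  # already present; nothing to do
    names = [entry.get("filename") for entry in rulebases]
    index = names.index(before) if before in names else len(rulebases)
    rulebases.insert(index, {"filename": name})
    return config

def update_file(path: str, name: str) -> None:
    """Rewrite a JSON config file in place (e.g. with sudo privileges)."""
    with open(path) as f:
        config = json.load(f)
    add_rulebase(config, name)
    with open(path, "w") as f:
        json.dump(config, f, indent=4)

example = {"re_rulebase_set": [{"filename": "core"}]}
add_rulebase(example, "training_acPostProcForPut")
print(json.dumps(example["re_rulebase_set"]))
# [{"filename": "training_acPostProcForPut"}, {"filename": "core"}]
```

The helper is idempotent, so running it twice does not duplicate the entry.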

Test It!

What have we done?

 

• Installed the plugin:

   - Deployed an iRODS rule and microservice

 

• Modified server_config.json

   - Told iRODS to load the rule

The Rule

acPostProcForPut {
    if ($filePath like "*.jpg"  || $filePath like "*.jpeg" ||
        $filePath like "*.bmp"  || $filePath like "*.tif"  ||
        $filePath like "*.tiff" || $filePath like "*.rif"  ||
        $filePath like "*.gif"  || $filePath like "*.png"  ||
        $filePath like "*.svg"  || $filePath like "*.xpm") {
        msiget_image_meta($filePath, *meta);
        msiString2KeyValPair(*meta, *meta_kvp);
        msiAssociateKeyValuePairsToObj(*meta_kvp, $objPath, "-d");
    } # if
} # acPostProcForPut

/etc/irods/training_acPostProcForPut.re

The Rule

if ($filePath like "*.jpg"  || $filePath like "*.jpeg" ||
    $filePath like "*.bmp"  || $filePath like "*.tif"  ||
    $filePath like "*.tiff" || $filePath like "*.rif"  ||
    $filePath like "*.gif"  || $filePath like "*.png"  ||
    $filePath like "*.svg"  || $filePath like "*.xpm") {

We only want to harvest metadata from image files, so we filter on the file extension with an 'if' statement.

Session variables are global variables holding values about the data object in flight.

$filePath is the session variable holding the physical path of the file.
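The chain of `like` tests is a wildcard match against the physical path. The Python sketch below mirrors it with fnmatchcase. It assumes `like` behaves as a case-sensitive shell-style wildcard match, which the "*.jpg" patterns suggest. Under that assumption, "BEANS.JPG" would not match.

```python
# Sketch of the rule's image filter, assuming iRODS `like` acts as a
# case-sensitive shell-style wildcard match.
from fnmatch import fnmatchcase

IMAGE_PATTERNS = ["*.jpg", "*.jpeg", "*.bmp", "*.tif", "*.tiff",
                  "*.rif", "*.gif", "*.png", "*.svg", "*.xpm"]

def is_image(file_path: str) -> bool:
    """Mirror of the `$filePath like ... || ...` chain in the rule."""
    return any(fnmatchcase(file_path, pattern) for pattern in IMAGE_PATTERNS)

print(is_image("/var/lib/irods/Vault/home/admin/beans.jpg"))  # True
print(is_image("/var/lib/irods/Vault/home/admin/notes.txt"))  # False
```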

The Rule

msiget_image_meta($filePath, *meta);

Once we have filtered on file type, we:

  • invoke the microservice to harvest the metadata
  • receive the metadata encoded as a string in the 'out variable' *meta

The Rule

msiString2KeyValPair(*meta, *meta_kvp);

The metadata-encoded string is converted to an internal iRODS key-value data structure, returned in the 'out variable' *meta_kvp.
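Here is a Python sketch of what this conversion step does. It assumes the `%`-separated `key=value` input format that msiString2KeyValPair is documented to accept. The sample string is hypothetical and modeled on the AVUs from the Data Discovery slide. A Python dict keeps only the last value for a repeated key, which may differ from the iRODS structure.

```python
# Sketch of msiString2KeyValPair's job, assuming input of the form
# "key1=value1%key2=value2". Sample data is hypothetical.
def string_to_kvp(encoded: str) -> dict[str, str]:
    pairs: dict[str, str] = {}
    for item in encoded.split("%"):
        if not item:
            continue  # tolerate empty segments, e.g. a trailing '%'
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed pair: {item!r}")
        pairs[key] = value
    return pairs

meta = "filename=seal.jpg%animal=seal%file_size=362833"
print(string_to_kvp(meta))
```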

 

    

 

Once we have the key-value pairs, we apply them to our data object.

The data object is referenced by the session variable $objPath, which holds its logical iRODS path.

The "-d" tells the microservice that we are referencing a data object, not a collection or resource.

msiAssociateKeyValuePairsToObj(*meta_kvp, $objPath, "-d");

The Rule

Now that the metadata is applied, we close our two scope blocks: one for the 'if' and one for the PEP itself.

    } # if
} # acPostProcForPut

The Rule

 

• Rule Language Reference:

        https://docs.irods.org/4.1.6/manual/rule_language/

• Source for this example (including microservices):

        https://github.com/irods/contrib/tree/master/microservices/training_example

Resource Composition

Target: Data Distribution Tree

(Diagram: replResc, a replication resource with children demoResc and s3Resc; s3Resc has children archiveResc and cacheResc.)

Storage Virtualization

One of the "Pillars" of iRODS

An abstraction layer that lets clients reach a variety of storage systems and services with no change to the client

Functionality is provided via plugin interfaces:

  • Authentication
  • Database
  • Network
  • API
  • Microservice
  • Resource

Composable Resources

Uses the well-known tree metaphor: branches and leaves

Two types of nodes:

  • Coordinating (branch): pure decision making
  • Storage (leaf): an instance managing the hardware

By convention, coordinating nodes do not have storage (this is not enforced).

End Goal: A Resource Tree

(Diagram: a resource tree with two coordinating nodes above three storage nodes.)
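As a concrete picture of the tree metaphor, here is a hypothetical Python model. It uses the names and types of the tree built later in this session with iadmin. It treats a node without children as storage, following the convention above, and collects the leaves where data actually lands. The Resource class is illustrative, not an iRODS API.

```python
# Hypothetical model of a composable resource tree: coordinating nodes
# (branches) make decisions, storage nodes (leaves) hold data.
from dataclasses import dataclass, field

@dataclass
class Resource:
    name: str
    kind: str                                   # e.g. "replication", "s3"
    children: list["Resource"] = field(default_factory=list)

    @property
    def is_storage(self) -> bool:
        # Treat childless nodes as storage; by convention, coordinating
        # nodes have children and hold no storage themselves.
        return not self.children

    def leaves(self) -> list[str]:
        if self.is_storage:
            return [self.name]
        return [leaf for child in self.children for leaf in child.leaves()]

tree = Resource("replResc", "replication", [
    Resource("demoResc", "unixfilesystem"),
    Resource("s3Resc", "compound", [
        Resource("archiveResc", "s3"),
        Resource("cacheResc", "unixfilesystem"),
    ]),
])
print(tree.leaves())  # ['demoResc', 'archiveResc', 'cacheResc']
```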

Coordinating Resources

Compound - provide a POSIX interface to alternative storage

Load Balanced - use gathered load values to determine choices

Passthru - weight, then delegate operations to a child resource

Random - randomly choose a child for a write operation

Replication - ensure all data objects are consistent across children

Round Robin - delegate writes to each child in series
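To make three of these policies concrete, here is an illustrative Python sketch, not iRODS code, of how a coordinating node might choose children for a write. Round robin cycles through the children, random picks one, and replication writes to all of them. The child list reuses demoResc and s3Resc from this session.

```python
# Illustrative sketch of write-target selection by coordinating nodes.
import itertools
import random

children = ["demoResc", "s3Resc"]
round_robin = itertools.cycle(children)

def choose_round_robin() -> list[str]:
    return [next(round_robin)]           # next child in series

def choose_random(rng: random.Random) -> list[str]:
    return [rng.choice(children)]        # any one child

def choose_replication() -> list[str]:
    return list(children)                # every child gets a replica

print([choose_round_robin()[0] for _ in range(3)])  # ['demoResc', 's3Resc', 'demoResc']
print(choose_random(random.Random(0)))
print(choose_replication())  # ['demoResc', 's3Resc']
```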

Storage Resources

Non-Cached

    Unix File System - generic file system storage

    Ceph-RADOS - Ceph object storage

    HPSS - access to IBM High Performance Storage System

Cached (Archive)

    S3 - archive resource for Amazon S3

    WOS - DDN Web Object Scaler

    Universal MSS - script-based access to generic archive storage

Before...

# if you're not already logged in
you@laptop:~$ ssh admin@<ip address>

admin@ec2:~$ ils -L
/zone223/home/admin:
  admin             0 demoResc      1128069 2015-07-28.04:40 & beans.jpg
        generic    /var/lib/irods/Vault/home/admin/beans.jpg

Let's Build a Resource Hierarchy!

# create a cache resource
admin@ec2:~$ iadmin mkresc cacheResc unixfilesystem \
`hostname`:/var/lib/irods/S3CacheVault
# create an s3 resource
admin@ec2:~$ iadmin mkresc archiveResc s3 `hostname`:irods.org-demo/`hostname -s`/Vault \
"S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/.s3_auth;S3_RETRY_COUNT=3;S3_WAIT_TIME_SEC=3"

# create a compound resource
admin@ec2:~$ iadmin mkresc s3Resc compound

# add the cache and archive resources as children of the compound resource
admin@ec2:~$ iadmin addchildtoresc s3Resc cacheResc cache
admin@ec2:~$ iadmin addchildtoresc s3Resc archiveResc archive

Check Your Work

admin@ec2:~$ ilsresc
demoResc
s3Resc:compound
├── archiveResc:s3
└── cacheResc

Let's Add Replication

# create a replication resource
admin@ec2:~$ iadmin mkresc replResc replication
# add demoResc and s3Resc as children of the replication resource
admin@ec2:~$ iadmin addchildtoresc replResc demoResc 
admin@ec2:~$ iadmin addchildtoresc replResc s3Resc

 

admin@ec2:~$ ilsresc
replResc:replication
├── demoResc
└── s3Resc:compound
    ├── archiveResc:s3
    └── cacheResc

Test It!

admin@ec2:~$ ils -L
/zone223/home/admin:
  admin             0 demoResc      1128069 2015-07-28.04:40 & beans.jpg
        generic    /var/lib/irods/Vault/home/admin/beans.jpg

 

# rebalance, to force an initial sync of the resources
admin@ec2:~$ iadmin modresc replResc rebalance

 

admin@ec2:~$ ils -L
/zone223/home/admin:
  admin             0 replResc;s3Resc;cacheResc      1128069 2015-07-28.04:40 & beans.jpg
        generic    /var/lib/irods/S3CacheVault/home/admin/beans.jpg
  admin             1 replResc;s3Resc;archiveResc      1128069 2015-07-28.04:40 & beans.jpg
        generic    xsede15/ec2-52-3-93-223/Vault/home/admin/beans.jpg
  admin             2 replResc;demoResc      1128069 2015-07-28.04:40 & beans.jpg
        generic    /var/lib/irods/Vault/home/admin/beans.jpg
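The rebalance above gave the existing object a replica under every child of replResc. A hypothetical Python sketch of that end state treats s3Resc as a single child and reduces replicas to sets of logical paths. Real rebalancing also copies bytes and updates the catalog, which this sketch only stands in for.

```python
# Hypothetical sketch of what a replication resource's rebalance achieves:
# every child ends up holding every data object.
def rebalance(replicas: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Give every child a replica of every object; return the copies made."""
    all_objects: set[str] = set().union(*replicas.values())
    copies = []
    for child in sorted(replicas):
        for obj in sorted(all_objects - replicas[child]):
            replicas[child].add(obj)  # stands in for copying the data
            copies.append((obj, child))
    return copies

# Before: beans.jpg exists only on demoResc, as in the "Before..." listing.
state = {"demoResc": {"/zone223/home/admin/beans.jpg"}, "s3Resc": set()}
print(rebalance(state))  # [('/zone223/home/admin/beans.jpg', 's3Resc')]
print(rebalance(state))  # [] (already consistent)
```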

A Few More Updates to Make

# edit /etc/irods/core.re
admin@ec2:~$ sudo nano /etc/irods/core.re
...
acSetRescSchemeForCreate {msiSetDefaultResc("replResc","null"); }
...

Change this from "demoResc" to "replResc"

/etc/irods/core.re

A Few More Updates to Make

# edit ~irods/.irods/irods_environment.json
admin@ec2:~$ sudo nano ~irods/.irods/irods_environment.json
...
"irods_default_resource": "replResc",
...

Change this from "demoResc" to "replResc"

~irods/.irods/irods_environment.json

Questions?

What would an iRODS pipeline look like that integrates a high-speed online storage unit (a NAS, in our case) with a very large object store, with rules for moving files transparently from one to the other according to:

    i) workflow rules: when you do this step, these files may be archived, or

    ii) time-based rules: all files not accessed in so many days?

Questions?

archive_rule {
  # Every 30 seconds, move the replicas in a collection that sit on one
  # storage resource and are older than a given age to another resource.
  # inputs: names of the old and new storage resources, minimal age of
  #         files in seconds, collection name relative to the user's home
  *Path = "/$rodsZoneClient/home/$userNameClient/" ++ *Path;

  delay("<PLUSET>30s</PLUSET><EF>30s REPEAT FOR EVER</EF>") {
    msiGetSystemTime(*Time, "unix");
    *Time = double(*Time);
    writeLine("stdout", "Path is *Path");

    *Q = select DATA_NAME, COLL_NAME, DATA_CREATE_TIME where COLL_NAME = '*Path' and DATA_RESC_NAME = '*OldResource';

    foreach (*R in *Q) {
      *File = *R.DATA_NAME;
      *Coll = *R.COLL_NAME;
      *Create = double(*R.DATA_CREATE_TIME);
      *SourceFile = *Coll ++ "/" ++ *File;
      writeLine("stdout", "File is *SourceFile");
      if (*Time - *Create >= *Age) {
        writeLine("stdout", "Moving *SourceFile from *OldResource to *NewResource");
        msiDataObjPhymv(*SourceFile, *NewResource, *OldResource, "0", "null", *Status);
      } # if
    } # foreach
  } # delay
} # archive_rule

INPUT *Path = "active", *OldResource = "LosAngeles", *NewResource = "Vancouver", *Age = 180
OUTPUT ruleExecOut
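The heart of archive_rule is a selection test: an object qualifies once (now - create time) >= *Age and it sits in the target collection on the old resource. The Python sketch below reproduces only that test. The catalog rows are hypothetical, standing in for the rule's query results. As in the rule, creation time stands in for "not accessed in so many days".

```python
# Sketch of archive_rule's selection logic; rows are hypothetical
# stand-ins for the query's DATA_NAME / COLL_NAME / DATA_CREATE_TIME.
import time
from typing import NamedTuple, Optional

class Row(NamedTuple):
    coll_name: str
    data_name: str
    create_time: float   # seconds since the Unix epoch
    resc_name: str

def to_move(rows, path: str, old_resource: str, age: float,
            now: Optional[float] = None) -> list[str]:
    now = time.time() if now is None else now
    return [f"{r.coll_name}/{r.data_name}" for r in rows
            if r.coll_name == path and r.resc_name == old_resource
            and now - r.create_time >= age]

rows = [
    Row("/tempZone/home/alice/active", "a.dat", 1000.0, "LosAngeles"),
    Row("/tempZone/home/alice/active", "b.dat", 1150.0, "LosAngeles"),
    Row("/tempZone/home/alice/active", "c.dat", 900.0, "Vancouver"),
]
print(to_move(rows, "/tempZone/home/alice/active", "LosAngeles", 180, now=1200.0))
# ['/tempZone/home/alice/active/a.dat']
```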

Questions?

Are there heuristics for how large an iRODS Zone can grow (number of objects, petabytes, etc.) before another zone should be considered? And would beefier hardware fix this?

Questions?

Talk To Us!

https://irods.org/sc15-survey

Extra Credit: Federation

Secure Collaboration between iRODS Zones

• Share files between zones

• Local authentication (passwords stay in zone)

• Admins tell each zone about one another

• Admins exchange two keys:

     zone_key

     negotiation_key

Tell Each Zone about the Others

# if you're not already logged in
you@laptop:~$ ssh admin@<ip address>

# tell your server about your neighbor's zone
admin@ec2:~$ iadmin mkzone <remote zone name> remote <remote hostname>:1247
# create a user account for your neighbor
admin@ec2:~$ iadmin mkuser admin#<remote zone name> rodsuser

# give your neighbor read access to your home collection (don't do this in real life)
admin@ec2:~$ ichmod -r read admin#<remote zone name> /<local zone name>/home/admin 

# edit /etc/irods/server_config.json
admin@ec2:~$ sudo nano /etc/irods/server_config.json

Make sure you do this on both machines!

/etc/irods/server_config.json

...
    "federation": [
    {
      "icat_host": "<hostname of your neighbor>",
      "zone_name": "<zone name of your neighbor>",
      "zone_key": "0123456789abcdef",
      "negotiation_key": "abcdefghijklmnopqrstuvwxyzabcdef"
    }
    ],
...

Add this section (to both machines).
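Before restarting, it can help to sanity-check both configs side by side. Here is a hypothetical Python pre-flight check. It assumes two things to confirm against the iRODS documentation for your version: each zone's federation entry must carry the neighbor's own top-level zone_key and negotiation_key, and negotiation_key must be 32 bytes long, as the slide's example is. The zone names and hosts are made up. The key values are the slide's examples.

```python
# Hypothetical pre-flight check for the "federation" stanza.
# Assumptions: entries must match the neighbor's own keys, and
# negotiation_key is 32 bytes. Verify both against your iRODS docs.
def check_federation(local_cfg: dict, remote_cfg: dict) -> list[str]:
    """Compare local's federation entry for the remote zone with its own keys."""
    remote_zone = remote_cfg["zone_name"]
    entries = [e for e in local_cfg.get("federation", [])
               if e.get("zone_name") == remote_zone]
    if not entries:
        return [f"no federation entry for {remote_zone}"]
    entry = entries[0]
    problems = []
    for key in ("zone_key", "negotiation_key"):
        if entry.get(key) != remote_cfg.get(key):
            problems.append(f"{key} does not match {remote_zone}'s own {key}")
    if len(entry.get("negotiation_key", "").encode("utf-8")) != 32:
        problems.append("negotiation_key is not 32 bytes long")
    return problems

ZONE_KEY = "0123456789abcdef"                         # slide's example value
NEGOTIATION_KEY = "abcdefghijklmnopqrstuvwxyzabcdef"  # 32 characters

def zone(name: str, neighbor: str, neighbor_host: str) -> dict:
    return {
        "zone_name": name,
        "zone_key": ZONE_KEY,
        "negotiation_key": NEGOTIATION_KEY,
        "federation": [{
            "icat_host": neighbor_host,
            "zone_name": neighbor,
            "zone_key": ZONE_KEY,
            "negotiation_key": NEGOTIATION_KEY,
        }],
    }

zone_a = zone("zoneA", "zoneB", "b.example.org")   # hypothetical zones/hosts
zone_b = zone("zoneB", "zoneA", "a.example.org")
print(check_federation(zone_a, zone_b), check_federation(zone_b, zone_a))  # [] []
```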

Try it out!

Introduction to iRODS

By iRODS Consortium


An Introduction to iRODS. Covers virtualization, data discovery (metadata), workflow automation (rules), and secure collaboration (federation). Discussion topics include time-based rules and scaling.
