An Introduction to iRODS

Presented at

Dan Bedard

Interim Executive Director

The iRODS Consortium

RENCI at the University of North Carolina

Agenda

Pre-Check

What is iRODS?

Demonstration (Spoiler)

Little Strips of Paper

Automatic Metadata Extraction

Resource Composition

Extra Credit

Questions

What's Going on Here?

Trying to...

 

   ... help you grasp the mental model.

   ... show off some of the neat things we can do.

   ... explain clearly and briefly how to do some things.

   ... answer some frequently asked questions.

What is iRODS?

Overview

The Four Pillars

The Fork

Photo: "Organized" by Uwe Hermann, licensed under CC BY-SA 2.0

           Overview

iRODS is open source data management software.

It...

      ...makes data findable.

 

      ...maintains data integrity.

 

      ...manages backup replicas.

 

      ...makes data sharable.

iRODS Zones

iRODS is Middleware

User Application

"Logical" Layer

Storage Environment

"Physical" Layer

storagecluster.example.org:/managed

s3.amazonaws.com:/example/bitbucket

iRODS Clients

• Web-based and Standalone GUIs

  - iRODS Cloud Browser, MetaLnx, iDrop, PRODS

 

• Portals, External Systems

  - iPlant Discovery Environment, Islandora, Fedora Commons

 

• WebDAV for drag-and-drop access built into the OS

• APIs: Python, REST, Qt, Java, C++

• Command Line Interface

Photo: "Jefferson Memorial Pillars Inside" by Belal Khan, licensed under CC BY 2.0

The Four Pillars

• Storage Virtualization ← all your storage in a single namespace

• Data Discovery ← system and user-generated metadata

• Workflow Automation ← event-driven and scheduled (cron) policies

• Secure Collaboration ← federation

Storage Virtualization: Composable Resources

"Logical" Layer

"Physical" Layer

storagecluster.example.org:/managed/Vault/home/alice/training_jpgs/seal.jpg

/tempZone/home/alice/training_jpgs/seal.jpg

s3.amazonaws.com:/example/bitbucket/Vault/home/alice/training_jpgs/seal.jpg
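A minimal Python sketch of the mapping this slide illustrates: one logical path resolving to a physical path on each storage system. The `VAULTS` table and the `physical_paths` helper are hypothetical names for illustration, not part of iRODS; the hosts and vault prefixes are copied from the slide.

```python
from __future__ import annotations

# Hypothetical table: storage host -> vault prefix (values copied from the slide).
VAULTS = {
    "storagecluster.example.org": "/managed/Vault",
    "s3.amazonaws.com": "/example/bitbucket/Vault",
}


def physical_paths(logical_path: str, zone: str = "tempZone") -> list[str]:
    """Map a logical path to one physical path per storage host (illustration only)."""
    prefix = f"/{zone}/"
    if not logical_path.startswith(prefix):
        raise ValueError(f"{logical_path!r} is not in zone {zone!r}")
    relative = logical_path[len(prefix):]  # e.g. home/alice/training_jpgs/seal.jpg
    return [f"{host}:{root}/{relative}" for host, root in VAULTS.items()]


if __name__ == "__main__":
    for path in physical_paths("/tempZone/home/alice/training_jpgs/seal.jpg"):
        print(path)
```

The client only ever sees the logical path; which physical copies exist is a server-side decision.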

Storage Virtualization: Objects and Collections

Data Discovery

Attribute: filename    | Value: seal.jpg

Attribute: animal      | Value: seal

Attribute: photo_color | Value: gray and brown

Attribute: file_size   | Value: 362833         | Units: bytes
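The attribute/value/units triples above can be pictured as records attached to a logical path. The sketch below is a toy in-memory model in Python, not the iRODS catalog or its query API; `AVU`, `CATALOG` and `find_objects` are hypothetical names, and the logical path is borrowed from the Storage Virtualization slide.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AVU:
    """One attribute/value/units triple."""
    attribute: str
    value: str
    units: str = ""


# Hypothetical in-memory catalog: logical path -> list of AVUs (values from the slide).
CATALOG = {
    "/tempZone/home/alice/training_jpgs/seal.jpg": [
        AVU("filename", "seal.jpg"),
        AVU("animal", "seal"),
        AVU("photo_color", "gray and brown"),
        AVU("file_size", "362833", "bytes"),
    ],
}


def find_objects(attribute: str, value: str) -> list[str]:
    """Return logical paths that carry a matching attribute/value pair."""
    return [
        path
        for path, avus in CATALOG.items()
        if any(a.attribute == attribute and a.value == value for a in avus)
    ]


if __name__ == "__main__":
    print(find_objects("animal", "seal"))
```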

acPostProcForPut {
    if ($filePath like "*.jpg"  || $filePath like "*.jpeg" ||
        $filePath like "*.bmp"  || $filePath like "*.tif"  ||
        $filePath like "*.tiff" || $filePath like "*.rif"  ||
        $filePath like "*.gif"  || $filePath like "*.png"  ||
        $filePath like "*.svg"  || $filePath like "*.xpm") {
        msiget_image_meta($filePath, *meta);
        msiString2KeyValPair(*meta, *meta_kvp);
        msiAssociateKeyValuePairsToObj(*meta_kvp, $objPath, "-d");
    } # if
} # acPostProcForPut

Policy Enforcement Points = Triggers

Microservices = Actions

Workflow Automation: Rules
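To make "triggers" and "actions" concrete, here is a small Python analogy, not the iRODS rule engine: a registry maps a policy enforcement point name to the functions that should run when it fires. `RULES`, `on`, `fire` and the `uploaded` tag are all hypothetical names invented for this sketch.

```python
from collections import defaultdict

# Hypothetical registry: policy enforcement point name -> actions to run.
RULES = defaultdict(list)


def on(pep_name):
    """Decorator that registers an action for a trigger."""
    def register(action):
        RULES[pep_name].append(action)
        return action
    return register


def fire(pep_name, context):
    """Run every action registered for the trigger; return how many ran."""
    actions = RULES.get(pep_name, [])
    for action in actions:
        action(context)
    return len(actions)


@on("acPostProcForPut")
def tag_upload(context):
    # Stand-in for a microservice: attach made-up metadata to the object in flight.
    context.setdefault("metadata", {})["uploaded"] = "yes"


if __name__ == "__main__":
    ctx = {"objPath": "/tempZone/home/alice/training_jpgs/seal.jpg"}
    fire("acPostProcForPut", ctx)
    print(ctx["metadata"])
```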

Secure Collaboration: Federation

A Tiny Bit of History

• 2006: iRODS developed by DICE at SDSC.

• 2008: DICE expands to UNC. RENCI evaluates iRODS.

Code fork: E-iRODS and Community iRODS.

• 2014: Code merge: iRODS 4.0

• 2015: iRODS 4.1

Enterprise Readiness

• Modular, maintainable code

• Static analysis and continuous integration

• Sustainable funding and governance model

Plugins

• Microservices

• Storage Resources

• Authentication

• Network

• Rule Engine, API, Transport

Static Analysis and Continuous Integration

https://jenkins.irods.org/view/1.%20Core%20Development/

iRODS is free, open source software owned by a foundation called the iRODS Consortium.

  • Members pay an annual membership fee: 4 levels of membership.

  • Members have agreed upon iRODS as an area of cooperation, rather than competition.

  • Two monthly meetings: Technology Working Group (TWG), Planning Committee

  • Goal is to create a sustainable open source project.

  • Presently funds a team of 10+ developers, application engineers, documentation writers, and support staff

Sustainable Governance and Funding Model

+2

Contract Customers

Initial Trial

  • Documentation, training
  • Blog posts, social media
  • Cloud images
  • Google Group
  • iRODS Hub

Proof of Concept

  • Occasional 1-on-1 Support
  • Service Contract

Pilot

  • iRODS Partners
  • Service Contract

Production

  • Consortium Membership
  • iRODS Partners
  • Service Contract

Getting Started (and Keeping Going) with iRODS

http://irods.org/documentation/

Demo

iRODS Cloud Browser

• Web-based GUI

• Pre-beta

• Developed for the DataNet Federation Consortium (DFC) 

   NSF-funded nationwide data grid project

 

   datafed.org

Demo

• iRODS Cloud Browser

• Automatic metadata extraction from image files

• Resource composition to replicate to Amazon S3

Little Strips of Paper

Log in

Explore

Log In

• Type the IP address on your paper into your browser

The login window should appear. Fill it in as follows:

  Host: localhost
  Port: 1247
  Zone: zone<last octet>
  User: admin
  Password: admin!
  Authentication type: Standard
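The zone name follows the pattern zone<last octet>, where the octet comes from the IP address on your strip of paper. A tiny Python helper, hypothetical and only for this workshop naming scheme, makes the rule explicit:

```python
import ipaddress


def zone_for_ip(ip: str) -> str:
    """Build the workshop zone name from the last octet of an IPv4 address."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    last_octet = address.packed[-1]      # packed is 4 bytes; the last one is the final octet
    return f"zone{last_octet}"


if __name__ == "__main__":
    print(zone_for_ip("52.3.93.223"))
```

The sample address matches the host name and zone that appear in the `ils -L` output later in the deck.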

Explore

• Upload/download

• Add/delete metadata

• Create and navigate collections

Automatic Metadata Extraction

Install This iRODS Rule

you@laptop:~$ ssh admin@<ip address>

# install the rule/microservice plugin
admin@ec2:~$ sudo dpkg -i ./xsede/training-example-1.0.deb

# edit /etc/irods/server_config.json
admin@ec2:~$ sudo nano /etc/irods/server_config.json

/etc/irods/server_config.json

...
    "re_rulebase_set": [
        {
            "filename": "training_acPostProcForPut"
        },
        {
            "filename": "core"
        }
    ],
...

Add this section
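If you would rather script the edit than use nano, the change amounts to putting one entry ahead of "core" in re_rulebase_set. The Python sketch below works on JSON text only and uses nothing beyond the standard library; `add_rulebase` is a hypothetical helper, and reading or writing the real /etc/irods/server_config.json (which needs root) is left out on purpose.

```python
import json


def add_rulebase(config_text: str, filename: str) -> str:
    """Return server_config.json text with `filename` first in re_rulebase_set."""
    config = json.loads(config_text)
    rulebases = config.setdefault("re_rulebase_set", [])
    if not any(entry.get("filename") == filename for entry in rulebases):
        rulebases.insert(0, {"filename": filename})  # listed before "core", as on the slide
    return json.dumps(config, indent=4)


if __name__ == "__main__":
    before = '{"re_rulebase_set": [{"filename": "core"}]}'
    print(add_rulebase(before, "training_acPostProcForPut"))
```

Running it twice leaves a single entry, so the helper is safe to re-apply.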

Test It!

What have we done?

• Installed the plugin:

   -Deployed an iRODS rule and microservice

 

• Modified server_config.json

   -Told iRODS to load the rule

The Rule

acPostProcForPut {
    if ($filePath like "*.jpg"  || $filePath like "*.jpeg" ||
        $filePath like "*.bmp"  || $filePath like "*.tif"  ||
        $filePath like "*.tiff" || $filePath like "*.rif"  ||
        $filePath like "*.gif"  || $filePath like "*.png"  ||
        $filePath like "*.svg"  || $filePath like "*.xpm") {
        msiget_image_meta($filePath, *meta);
        msiString2KeyValPair(*meta, *meta_kvp);
        msiAssociateKeyValuePairsToObj(*meta_kvp, $objPath, "-d");
    } # if
} # acPostProcForPut

/etc/irods/training_acPostProcForPut.re

The Rule

if ($filePath like "*.jpg"  || $filePath like "*.jpeg" ||
    $filePath like "*.bmp"  || $filePath like "*.tif"  ||
    $filePath like "*.tiff" || $filePath like "*.rif"  ||
    $filePath like "*.gif"  || $filePath like "*.png"  ||
    $filePath like "*.svg"  || $filePath like "*.xpm") {

We only want to harvest metadata from image files - filter using an 'if' statement

 

Session variables - global variables holding values about the data object in flight

$filePath - session variable holding the physical path
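For readers more at home in Python, the filter above behaves roughly like a wildcard match on the physical path. This is an analogy only: it assumes the rule language's `like` is a case-sensitive wildcard match, and `IMAGE_PATTERNS` and `is_image` are hypothetical names.

```python
from fnmatch import fnmatchcase

# The same ten patterns the rule tests, in the same order.
IMAGE_PATTERNS = (
    "*.jpg", "*.jpeg", "*.bmp", "*.tif", "*.tiff",
    "*.rif", "*.gif", "*.png", "*.svg", "*.xpm",
)


def is_image(file_path: str) -> bool:
    """True if the path ends in one of the image extensions (case-sensitive)."""
    return any(fnmatchcase(file_path, pattern) for pattern in IMAGE_PATTERNS)


if __name__ == "__main__":
    print(is_image("/var/lib/irods/Vault/home/admin/beans.jpg"))
    print(is_image("/var/lib/irods/Vault/home/admin/notes.txt"))
```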

The Rule

msiget_image_meta($filePath, *meta);

Once we have filtered the file type

  • invoke the microservice to harvest the metadata
  • metadata is encoded as a string in the 'out variable' *meta

 

    

 

The Rule

msiString2KeyValPair(*meta, *meta_kvp);

The metadata encoded string is converted to an internal iRODS key-value data structure in the 'out variable' *meta_kvp
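The slide does not show what the encoded string looks like. Assuming it is a list of key=value items separated by "%" (an assumption, as are the sample keys below), the conversion step could be pictured in Python like this; `string_to_key_value_pairs` is a hypothetical stand-in, not the microservice itself.

```python
def string_to_key_value_pairs(encoded: str) -> dict:
    """Split 'k1=v1%k2=v2' into {'k1': 'v1', 'k2': 'v2'} (assumed encoding)."""
    pairs = {}
    for item in encoded.split("%"):
        if not item:
            continue  # tolerate a leading, trailing, or doubled separator
        key, separator, value = item.partition("=")
        if not separator:
            raise ValueError(f"item {item!r} has no '='")
        pairs[key] = value
    return pairs


if __name__ == "__main__":
    # Made-up sample string; the real output of msiget_image_meta may differ.
    print(string_to_key_value_pairs("ImageWidth=1024%ImageHeight=768"))
```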

 

    

 

msiAssociateKeyValuePairsToObj(*meta_kvp, $objPath, "-d");

Once we have the key-value pairs we apply them to our data object

The data object is referenced by the session variable $objPath, which is the logical iRODS path

The "-d" signifies to the microservice that we are referencing a Data Object, not a Collection or Resource

The Rule

Now that the metadata is applied, we need to close our two scope blocks: one for the 'if' and one for the PEP itself.

 

 

 

        } # if
    } # acPostProcForPut

The Rule

 

• Rule Language Reference: 

https://docs.irods.org/4.1.3/manual/rule_language/

 

• Source for this example (including microservices):

https://github.com/irods/contrib/tree/master/microservices/training_example

 

 

 

Resource Composition

End Goal: A Resource Tree

Storage Virtualization

One of the "Pillars" of iRODS

 

An abstraction layer that lets iRODS reach into various services with no change to the client

 

Functionality provided via plugin interfaces

  • Authentication
  • Database
  • Network
  • API
  • Microservice
  • Resource

Composable Resources

Uses a well-known tree metaphor - branches and leaves

Two types of nodes:

  • Coordinating (branch) - pure decision making
  • Storage (leaf) - instance managing the hardware

By convention, coordinating nodes do not have storage (this is not enforced)

End Goal: A Resource Tree

[Diagram: a resource tree with two Coordinating nodes and three Storage nodes]

Coordinating Resources

Compound - provide POSIX interface to alternative storage

Load Balanced - use gathered load values to determine choices

Passthru - weight, then delegate operations to a child resource

Random - randomly choose a child for a write operation

Replication - ensure all data objects are consistent across children

Round Robin - delegate writes to each child in series
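Three of these policies differ only in which children they pick for a write. The Python sketch below models just that choice; it is an illustration of the descriptions above, not the plugin code, and the class and method names are hypothetical.

```python
import itertools
import random


class RoundRobin:
    """Delegate each write to the next child in series."""
    def __init__(self, children):
        self._cycle = itertools.cycle(list(children))

    def targets_for_write(self):
        return [next(self._cycle)]


class RandomChoice:
    """Pick one child at random for each write."""
    def __init__(self, children, rng=None):
        self._children = list(children)
        self._rng = rng or random.Random()

    def targets_for_write(self):
        return [self._rng.choice(self._children)]


class Replication:
    """Send every write to all children so they stay consistent."""
    def __init__(self, children):
        self._children = list(children)

    def targets_for_write(self):
        return list(self._children)


if __name__ == "__main__":
    rr = RoundRobin(["demoResc", "s3Resc"])
    print([rr.targets_for_write()[0] for _ in range(3)])
    print(Replication(["demoResc", "s3Resc"]).targets_for_write())
```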

Storage Resources

Non-Cached

    Unix File System - generic file system storage

    Ceph-RADOS - Ceph object storage

    HPSS - access to IBM High Performance Storage System

 

Cached (Archive)

    S3 - archive resource for Amazon S3

    WOS - DDN Web Object Scaler

    Universal MSS - script based access to generic archive storage

Let's Try It: Building a Resource Hierarchy

# if you're not already logged in
you@laptop:~$ ssh admin@<ip address>

# create a cache resource
admin@ec2:~$ iadmin mkresc cacheResc unixfilesystem \
    `hostname`:/var/lib/irods/S3CacheVault
# create an s3 resource
admin@ec2:~$ iadmin mkresc archiveResc s3 `hostname`:xsede15/`hostname -s`/Vault \
    "S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/.s3_auth;S3_RETRY_COUNT=3;S3_WAIT_TIME_SEC=3"
# create a compound resource
admin@ec2:~$ iadmin mkresc s3Resc compound
# add the cache and archive resources as children of the compound resource
admin@ec2:~$ iadmin addchildtoresc s3Resc cacheResc cache
admin@ec2:~$ iadmin addchildtoresc s3Resc archiveResc archive

Let's Try It: Building a Resource Hierarchy

# create a replication resource
admin@ec2:~$ iadmin mkresc replResc replication
# add demoResc and s3Resc as children of the replication resource
admin@ec2:~$ iadmin addchildtoresc replResc demoResc 
admin@ec2:~$ iadmin addchildtoresc replResc s3Resc
# rebalance, to force an initial sync of the resources
admin@ec2:~$ iadmin modresc replResc rebalance
# edit /etc/irods/core.re
admin@ec2:~$ sudo nano /etc/irods/core.re
...
acSetRescSchemeForCreate {msiSetDefaultResc("replResc","null"); }
...

Change this from "demoResc"

/etc/irods/core.re

Let's Try It: Building a Resource Hierarchy

# edit ~irods/.irods/irods_environment.json
admin@ec2:~$ sudo nano ~irods/.irods/irods_environment.json
...
"irods_default_resource": "replResc",
...

Change this from "demoResc"

~irods/.irods/irods_environment.json

Test It!

admin@ec2:~$ ilsresc
replResc:replication
├── demoResc
└── s3Resc:compound
    ├── archiveResc:s3
    └── cacheResc

admin@ec2:~$ ils -L
/zone223/home/admin:
  admin             0 replResc;s3Resc;cacheResc      1128069 2015-07-28.04:40 & beans.jpg
        generic    /var/lib/irods/S3CacheVault/home/admin/beans.jpg
  admin             1 replResc;s3Resc;archiveResc      1128069 2015-07-28.04:40 & beans.jpg
        generic    xsede15/ec2-52-3-93-223/Vault/home/admin/beans.jpg
  admin             2 replResc;demoResc      1128069 2015-07-28.04:40 & beans.jpg
        generic    /var/lib/irods/Vault/home/admin/beans.jpg
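The semicolon-separated strings in the listing (for example replResc;s3Resc;cacheResc) are the path from the root of the resource tree down to the leaf that holds each replica. A short Python sketch, with hypothetical `Resource` and `leaf_hierarchies` names, rebuilds the tree from `ilsresc` and derives those strings:

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    children: list = field(default_factory=list)


def leaf_hierarchies(node, prefix=""):
    """Return one 'root;...;leaf' string per storage (leaf) resource."""
    path = f"{prefix};{node.name}" if prefix else node.name
    if not node.children:
        return [path]
    result = []
    for child in node.children:
        result.extend(leaf_hierarchies(child, path))
    return result


# The tree printed by ilsresc above, in the same order.
TREE = Resource("replResc", [
    Resource("demoResc"),
    Resource("s3Resc", [
        Resource("archiveResc"),
        Resource("cacheResc"),
    ]),
])

if __name__ == "__main__":
    for hierarchy in leaf_hierarchies(TREE):
        print(hierarchy)
```

Each of the three strings corresponds to one of the three replicas of beans.jpg.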

Extra Credit: Federation

Secure Collaboration between iRODS Zones

• Share files between zones

• Local authentication (passwords stay in zone)

• Admins tell each zone about the other

• Admins exchange two keys: zone_key and negotiation_key

Tell Each Zone about the Others

# if you're not already logged in
you@laptop:~$ ssh admin@<ip address>

# tell your server about your neighbor's zone
admin@ec2:~$ iadmin mkzone <remote zone name> remote <remote hostname>:1247
# create a user account for your neighbor
admin@ec2:~$ iadmin mkuser admin#<remote zone name> rodsuser

# give your neighbor read access to your home collection (don't do this in real life)
admin@ec2:~$ ichmod -r read admin#<remote zone name> /<local zone name>/home/admin 

# edit /etc/irods/server_config.json
admin@ec2:~$ sudo nano /etc/irods/server_config.json

Make sure you do this on both machines!
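Because the same three commands run on both machines with the zone names swapped, it can help to generate them side by side. The Python sketch below only builds the command strings shown above; `federation_commands`, the zone names and the host names are hypothetical examples.

```python
def federation_commands(local_zone, remote_zone, remote_host, port=1247):
    """Return the commands one admin runs to learn about the neighbor's zone."""
    return [
        f"iadmin mkzone {remote_zone} remote {remote_host}:{port}",
        f"iadmin mkuser admin#{remote_zone} rodsuser",
        f"ichmod -r read admin#{remote_zone} /{local_zone}/home/admin",
    ]


if __name__ == "__main__":
    # Example values only; substitute your own zone names and hosts.
    for line in federation_commands("zone223", "zone224", "host-b.example.org"):
        print("A:", line)
    for line in federation_commands("zone224", "zone223", "host-a.example.org"):
        print("B:", line)
```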

/etc/irods/server_config.json

...
    "federation": [
    {
      "icat_host": "<hostname of your neighbor>",
      "zone_name": "<zone name of your neighbor>",
      "zone_key": "0123456789abcdef",
      "negotiation_key": "abcdefghijklmnopqrstuvwxyzabcdef"
    }
    ],
...

Add this section (to both machines)
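A quick sanity check before restarting anything can catch the two most likely mistakes: a placeholder left in angle brackets, or a mistyped key. The Python sketch below is a hypothetical helper, not an iRODS tool; it assumes the negotiation_key should be 32 characters, which matches the example value on the slide.

```python
REQUIRED_KEYS = ("icat_host", "zone_name", "zone_key", "negotiation_key")


def check_federation_entry(entry):
    """Return a list of problems found in one 'federation' entry (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        value = entry.get(key, "")
        if not value:
            problems.append(f"missing {key}")
        elif "<" in value or ">" in value:
            problems.append(f"{key} still contains a placeholder")
    negotiation_key = entry.get("negotiation_key", "")
    # Assumption: 32 characters, like the example key on the slide.
    if negotiation_key and len(negotiation_key) != 32:
        problems.append(
            f"negotiation_key is {len(negotiation_key)} characters, expected 32"
        )
    return problems


if __name__ == "__main__":
    entry = {
        "icat_host": "<hostname of your neighbor>",
        "zone_name": "zone224",
        "zone_key": "0123456789abcdef",
        "negotiation_key": "abcdefghijklmnopqrstuvwxyzabcdef",
    }
    print(check_federation_entry(entry))
```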

Try it out!

Questions?

Thank you!

 

Dan Bedard

danb@renci.org

+1-919-445-0632

XSEDE15: Introduction to iRODS

By beppodb
