An Introduction to iRODS
November 2015
Agenda
Welcome
Introductions
What is iRODS?
Demonstration and Souvenirs
Automatic Metadata Extraction
Resource Composition
Questions and Discussion
Extra Credit: Federation
Welcome
Big THANKS to...
And to all of you!
Introductions
What's This All About?
We're here to...
... help grasp the mental model.
... show off.
... provide clear explanations.
... answer your questions.
What is iRODS?
Overview
iRODS is open source data management software.
It...
...makes data findable.
...maintains data integrity.
...manages backup replicas.
...makes data shareable.
The Four Pillars
Storage Virtualization ← all your storage in a single namespace
Data Discovery ← system and user-generated metadata
Workflow Automation ← event-driven and scheduled (cron) policies
Secure Collaboration ← federation
Storage Virtualization: iRODS is Middleware
User Application
"Logical" Layer
Storage Environment
"Physical" Layer
storagecluster.example.org:/managed
s3.amazonaws.com:/example/bitbucket
Storage Virtualization: iRODS Clients
• Web-based and Standalone GUIs
- iRODS Cloud Browser, MetaLnx, Kanki, Cyberduck
• Portals, External Systems
- iPlant Discovery Environment, Islandora, Fedora Commons
• WebDAV for drag-and-drop access built in to the OS
• APIs: Python, REST, Qt, Java, C++
• Command Line Interface
"Logical" Layer
/tempZone/home/alice/training_jpgs/clown_fish.jpg
"Physical" Layer
storagecluster.example.org:/managed/Vault/home/alice/training_jpgs/clown_fish.jpg
s3.amazonaws.com:/example/bitbucket/Vault/home/alice/training_jpgs/clown_fish.jpg
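The split between the logical path and its physical replicas can be illustrated with a small Python sketch. This is only a mental-model aid: iRODS records each replica's physical path in its catalog rather than computing it, and the resource names `clusterResc` and `bucketResc` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    resource: str    # hypothetical resource name, not from the slides
    vault_root: str  # physical prefix under which the resource stores files


def physical_path(logical_path: str, zone: str, replica: Replica) -> str:
    """Show where one replica of a logical path would live physically."""
    prefix = f"/{zone}/"
    if not logical_path.startswith(prefix):
        raise ValueError(f"{logical_path!r} is not in zone {zone!r}")
    relative = logical_path[len(prefix):]
    return f"{replica.vault_root}/{relative}"


replicas = [
    Replica("clusterResc", "storagecluster.example.org:/managed/Vault"),
    Replica("bucketResc", "s3.amazonaws.com:/example/bitbucket/Vault"),
]
logical = "/tempZone/home/alice/training_jpgs/clown_fish.jpg"
paths = [physical_path(logical, "tempZone", r) for r in replicas]
for path in paths:
    print(path)
```

The client only ever sees the single logical path; which physical locations back it is a server-side concern.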
Storage Virtualization: Objects and Collections
Storage Virtualization: iRODS is Distributed
Storage Virtualization: Composable Resources
replResc
demoResc
s3Resc
archiveResc
cacheResc
Data Discovery
Attribute: filename | Value: seal.jpg
Attribute: animal | Value: seal
Attribute: photo_color | Value: gray and brown
Attribute: file_size | Value: 362833 | Units: bytes
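The attribute/value/units triples above can be pictured as a small searchable table. The Python sketch below is an illustration of the idea only, not the iRODS catalog or its query API; the logical paths and the second object are made up for the example.

```python
from typing import Dict, List, NamedTuple, Optional


class AVU(NamedTuple):
    attribute: str
    value: str
    units: Optional[str] = None


# Hypothetical in-memory "catalog": logical path -> attached metadata.
catalog: Dict[str, List[AVU]] = {
    "/tempZone/home/alice/seal.jpg": [
        AVU("filename", "seal.jpg"),
        AVU("animal", "seal"),
        AVU("photo_color", "gray and brown"),
        AVU("file_size", "362833", "bytes"),
    ],
    "/tempZone/home/alice/clown_fish.jpg": [
        AVU("animal", "clown fish"),
    ],
}


def find_objects(catalog: Dict[str, List[AVU]], attribute: str, value: str) -> List[str]:
    """Return the logical paths that carry a matching attribute/value pair."""
    return sorted(
        path
        for path, avus in catalog.items()
        if any(a.attribute == attribute and a.value == value for a in avus)
    )


matches = find_objects(catalog, "animal", "seal")
print(matches)
```

Finding data by what it is, rather than by where it is stored, is the point of the Data Discovery pillar.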
acPostProcForPut {
if ($filePath like "*.jpg" || $filePath like "*.jpeg" ||
$filePath like "*.bmp" || $filePath like "*.tif" ||
$filePath like "*.tiff" || $filePath like "*.rif" ||
$filePath like "*.gif" || $filePath like "*.png" ||
$filePath like "*.svg" || $filePath like "*.xpm") {
msiget_image_meta($filePath, *meta);
msiString2KeyValPair(*meta, *meta_kvp);
msiAssociateKeyValuePairsToObj(*meta_kvp, $objPath, "-d");
} # if
} # acPostProcForPut
Policy Enforcement Points = Triggers
Microservices = Actions
Workflow Automation: Rules
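The "triggers and actions" idea can be sketched in a few lines of Python. This is a simplified analogy, not how the iRODS rule engine is implemented: the `PolicyEngine` class, its methods, and the PEP name `someOtherPep` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Action = Callable[[dict], None]


class PolicyEngine:
    """Toy dispatcher: a named trigger (PEP) runs its registered actions."""

    def __init__(self) -> None:
        self._rules: Dict[str, List[Action]] = defaultdict(list)

    def on(self, pep: str, action: Action) -> None:
        self._rules[pep].append(action)

    def fire(self, pep: str, context: dict) -> int:
        """Run every action registered for this PEP; return how many ran."""
        actions = self._rules.get(pep, [])
        for action in actions:
            action(context)
        return len(actions)


log: List[str] = []
engine = PolicyEngine()
engine.on(
    "acPostProcForPut",
    lambda ctx: log.append(f"extract metadata from {ctx['objPath']}"),
)

ran = engine.fire("acPostProcForPut", {"objPath": "/tempZone/home/alice/seal.jpg"})
ignored = engine.fire("someOtherPep", {"objPath": "/tempZone/home/alice/seal.jpg"})
print(ran, ignored, log)
```

An upload fires the trigger; the actions attached to it do the work, and triggers with nothing attached do nothing.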
Secure Collaboration: Federation
Some History
iRODS is free, open source software owned by a foundation called the iRODS Consortium.
The Consortium's goal is to sustain iRODS as free, open source software by:
▹ Building good software.
▹ Growing the iRODS community.
▹ Demonstrating value.
It funds a team of 10+ developers, application engineers, documentation writers, and support staff.
The iRODS Consortium and Sustainability
Contract Customers
and more ...
Building Good Software: Plugins
• Microservices
• Storage Resources
• Authentication
• Network
• API
• Rule Engine (iRODS 4.2)
• Transport (iRODS 4.3)
Modular, maintainable code
Static Analysis and Continuous Integration
https://jenkins.irods.org
Initial Trial
Proof of Concept to Pilot
Production
Building Community and Demonstrating Value
https://irods.org/documentation/
Getting Plugged In with iRODS...
Demo
iRODS Cloud Browser
• Web-based GUI
• Developed for the DataNet Federation Consortium (DFC)
NSF-funded nationwide data grid project
datafed.org
Demo
• iRODS Cloud Browser
• Automatic metadata extraction from image files
• Resource composition to replicate to Amazon S3
Souvenirs!
Explore
• Upload/download
• Add/delete metadata
• Create and navigate collections
Automatic Metadata Extraction
Install This iRODS Rule
you@laptop:~$ ssh admin@<ip address>
# install the rule/microservice plugin
admin@ec2:~$ sudo dpkg -i ./training/training-example-1.0.deb
# edit /etc/irods/server_config.json
admin@ec2:~$ sudo nano /etc/irods/server_config.json
/etc/irods/server_config.json
...
"re_rulebase_set": [
{
"filename": "training_acPostProcForPut"
},
{
"filename": "core"
}
],
...
Add this section
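If you would rather script this edit than make it by hand, a minimal Python sketch follows. It assumes the file is plain JSON with the `re_rulebase_set` list shown above; the helper name `add_rulebase` is hypothetical, and the sketch works on an in-memory example rather than touching /etc/irods/server_config.json.

```python
import json


def add_rulebase(config: dict, name: str, before: str = "core") -> dict:
    """Insert a rulebase entry ahead of `before`, if it is not already listed."""
    rulebases = config.setdefault("re_rulebase_set", [])
    names = [entry["filename"] for entry in rulebases]
    if name in names:
        return config  # already installed, nothing to do
    index = names.index(before) if before in names else len(names)
    rulebases.insert(index, {"filename": name})
    return config


# Stand-in for the relevant part of server_config.json.
config = json.loads('{"re_rulebase_set": [{"filename": "core"}]}')
add_rulebase(config, "training_acPostProcForPut")
print(json.dumps(config, indent=4))
```

The new entry goes before "core", matching the order shown on the slide.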
Test It!
What have we done?
• Installed the plugin:
- Deployed an iRODS rule and microservice
• Modified server_config.json
- Told iRODS to load the rule
The Rule
acPostProcForPut {
if ($filePath like "*.jpg" || $filePath like "*.jpeg" ||
$filePath like "*.bmp" || $filePath like "*.tif" ||
$filePath like "*.tiff" || $filePath like "*.rif" ||
$filePath like "*.gif" || $filePath like "*.png" ||
$filePath like "*.svg" || $filePath like "*.xpm") {
msiget_image_meta($filePath, *meta);
msiString2KeyValPair(*meta, *meta_kvp);
msiAssociateKeyValuePairsToObj(*meta_kvp, $objPath, "-d");
} # if
} # acPostProcForPut
/etc/irods/training_acPostProcForPut.re
The Rule
if( $filePath like "*.jpg" || $filePath like "*.jpeg" ||
$filePath like "*.bmp" || $filePath like "*.tif" ||
$filePath like "*.tiff" || $filePath like "*.rif" ||
$filePath like "*.gif" || $filePath like "*.png" ||
$filePath like "*.svg" || $filePath like "*.xpm") {
We only want to harvest metadata from image files, so we filter using an 'if' statement
Session variables - global variables holding values about the data object in flight
$filePath - session variable holding the physical path
The Rule
msiget_image_meta($filePath, *meta);
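Once we have filtered by file type, we call the msiget_image_meta microservice on the physical path; it returns the image metadata as an encoded string in the 'out variable' *meta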
The Rule
msiString2KeyValPair(*meta, *meta_kvp);
The metadata-encoded string is converted to an internal iRODS key-value data structure in the 'out variable' *meta_kvp
Once we have the key-value pairs, we apply them to our data object
The data object is referenced by the session variable $objPath, which is the logical iRODS path
The "-d" signifies to the microservice that we are referencing a Data Object, not a Collection or Resource
msiAssociateKeyValuePairsToObj(*meta_kvp, $objPath, "-d");
The Rule
Now that the metadata is applied, we need to close our two scope blocks: one for the 'if' and one for the PEP itself
} # if
} # acPostProcForPut
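The whole rule can be mirrored step by step in Python to make the data flow explicit. This is an analogy, not iRODS code: the function names are hypothetical, the extractor is a stand-in for msiget_image_meta, and the `key=value` pairs separated by `%` are an assumption about how the metadata string is encoded.

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict

IMAGE_PATTERNS = [
    "*.jpg", "*.jpeg", "*.bmp", "*.tif", "*.tiff",
    "*.rif", "*.gif", "*.png", "*.svg", "*.xpm",
]


def is_image(file_path: str) -> bool:
    """Step 1: the 'if' filter on the physical path."""
    return any(fnmatchcase(file_path, pattern) for pattern in IMAGE_PATTERNS)


def string_to_kvp(encoded: str) -> Dict[str, str]:
    """Step 3: turn 'k1=v1%k2=v2' (assumed encoding) into key-value pairs."""
    pairs: Dict[str, str] = {}
    for item in encoded.split("%"):
        if item:
            key, _, value = item.partition("=")
            pairs[key] = value
    return pairs


def post_proc_for_put(
    file_path: str,
    obj_path: str,
    extract: Callable[[str], str],
    catalog: Dict[str, Dict[str, str]],
) -> bool:
    """Filter, extract, convert, then attach metadata to the logical path."""
    if not is_image(file_path):
        return False
    meta = extract(file_path)                                  # step 2
    catalog.setdefault(obj_path, {}).update(string_to_kvp(meta))  # step 4
    return True


# Stand-in extractor with made-up output, for illustration only.
fake_extract = lambda path: "image_format=JPEG%image_width=640"
catalog: Dict[str, Dict[str, str]] = {}
applied = post_proc_for_put(
    "/var/lib/irods/Vault/home/alice/seal.jpg",
    "/tempZone/home/alice/seal.jpg",
    fake_extract,
    catalog,
)
print(applied, catalog)
```

Note the two different paths: the physical path is used to read the file, and the logical path is where the metadata is attached.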
The Rule
• Rule Language Reference:
https://docs.irods.org/4.1.6/manual/rule_language/
• Source for this example (including microservices):
https://github.com/irods/contrib/tree/master/microservices/training_example
Resource Composition
Target: Data Distribution Tree
replResc
demoResc
s3Resc
archiveResc
cacheResc
Storage Virtualization
One of the "Pillars" of iRODS
An abstraction layer that allows for reach into various services with no change to the client
Functionality provided via plugin interfaces
Composable Resources
Uses a well known Tree Metaphor - Branches and Leaves
Two types of nodes: Coordinating (branches) and Storage (leaves)
By convention Coordinating nodes do not have storage
(this is not enforced)
End Goal: A Resource Tree
Storage
Storage
Storage
Coordinating
Coordinating
Coordinating Resources
Compound - provide POSIX interface to alternative storage
Load Balanced - use gathered load values to determine choices
Passthru - weight then delegate operations to a child resource
Random - randomly choose a child for a write operation
Replication - ensure all data objects are consistent across children
Round Robin - delegate writes to each child in series
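How a coordinating resource picks children for a write can be sketched as below. These toy classes only mimic three of the policies described above (Random, Replication, Round Robin); the class and method names are hypothetical and the real plugins do considerably more.

```python
import itertools
import random
from typing import List, Optional


class RoundRobin:
    """Delegate each write to the next child in series."""

    def __init__(self, children: List[str]) -> None:
        self._cycle = itertools.cycle(children)

    def choose_for_write(self) -> List[str]:
        return [next(self._cycle)]


class Random:
    """Pick one child at random for each write."""

    def __init__(self, children: List[str], rng: Optional[random.Random] = None) -> None:
        self._children = list(children)
        self._rng = rng or random.Random()

    def choose_for_write(self) -> List[str]:
        return [self._rng.choice(self._children)]


class Replication:
    """Every child receives a copy, keeping the children consistent."""

    def __init__(self, children: List[str]) -> None:
        self._children = list(children)

    def choose_for_write(self) -> List[str]:
        return list(self._children)


rr = RoundRobin(["storageA", "storageB"])
rr_picks = [rr.choose_for_write()[0] for _ in range(4)]
repl_picks = Replication(["demoResc", "s3Resc"]).choose_for_write()
rand_pick = Random(["storageA", "storageB"]).choose_for_write()
print(rr_picks, repl_picks, rand_pick)
```

Because every policy exposes the same interface, coordinating nodes can be stacked into a tree without the client noticing.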
Storage Resources
Non-Cached
Unix File System - generic file system storage
Ceph-RADOS - Ceph object storage
HPSS - access to IBM High Performance Storage System
Cached (Archive)
S3 - archive resource for Amazon S3
WOS - DDN Web Object Scaler
Universal MSS - script based access to generic archive storage
Before...
# if you're not already logged in
you@laptop:~$ ssh admin@<ip address>
admin@ec2:~$ ils -L
/zone223/home/admin:
admin 0 demoResc 1128069 2015-07-28.04:40 & beans.jpg
generic /var/lib/irods/Vault/home/admin/beans.jpg
Let's Build a Resource Hierarchy!
# create a cache resource
admin@ec2:~$ iadmin mkresc cacheResc unixfilesystem \
`hostname`:/var/lib/irods/S3CacheVault
# create an s3 resource
admin@ec2:~$ iadmin mkresc archiveResc s3 `hostname`:irods.org-demo/`hostname -s`/Vault \
"S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/.s3_auth;\
S3_RETRY_COUNT=3;S3_WAIT_TIME_SEC=3"
# create a compound resource
admin@ec2:~$ iadmin mkresc s3Resc compound
# add the cache and archive resources as children of the compound resource
admin@ec2:~$ iadmin addchildtoresc s3Resc cacheResc cache
admin@ec2:~$ iadmin addchildtoresc s3Resc archiveResc archive
Check Your Work
admin@ec2:~$ ilsresc
demoResc
s3Resc:compound
├── archiveResc:s3
└── cacheResc
Let's Add Replication
# create a replication resource
admin@ec2:~$ iadmin mkresc replResc replication
# add demoResc and s3Resc as children of the replication resource
admin@ec2:~$ iadmin addchildtoresc replResc demoResc
admin@ec2:~$ iadmin addchildtoresc replResc s3Resc
admin@ec2:~$ ilsresc
replResc:replication
├── demoResc
└── s3Resc:compound
├── archiveResc:s3
└── cacheResc
Test It!
admin@ec2:~$ ils -L
/zone223/home/admin:
admin 0 demoResc 1128069 2015-07-28.04:40 & beans.jpg
generic /var/lib/irods/Vault/home/admin/beans.jpg
# rebalance, to force an initial sync of the resources
admin@ec2:~$ iadmin modresc replResc rebalance
admin@ec2:~$ ils -L
/zone223/home/admin:
admin 0 replResc;s3Resc;cacheResc 1128069 2015-07-28.04:40 & beans.jpg
generic /var/lib/irods/S3CacheVault/home/admin/beans.jpg
admin 1 replResc;s3Resc;archiveResc 1128069 2015-07-28.04:40 & beans.jpg
generic xsede15/ec2-52-3-93-223/Vault/home/admin/beans.jpg
admin 2 replResc;demoResc 1128069 2015-07-28.04:40 & beans.jpg
generic /var/lib/irods/Vault/home/admin/beans.jpg
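The semicolon-separated strings in the listing are paths from the root of the resource tree down to each storage leaf. A short Python sketch, using a nested dict as a stand-in for the tree built above, shows how they arise:

```python
from typing import Dict, List, Tuple

# The tree from this exercise, written as nested dicts (leaf = empty dict).
tree = {
    "replResc": {
        "demoResc": {},
        "s3Resc": {
            "archiveResc": {},
            "cacheResc": {},
        },
    },
}


def leaf_hierarchies(tree: Dict[str, dict], prefix: Tuple[str, ...] = ()) -> List[str]:
    """Return one 'root;...;leaf' string per storage (leaf) resource."""
    result: List[str] = []
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            result.extend(leaf_hierarchies(children, path))
        else:
            result.append(";".join(path))
    return result


hierarchies = leaf_hierarchies(tree)
for hierarchy in hierarchies:
    print(hierarchy)
```

One replica exists per leaf, which is why the rebalance turned a single copy of beans.jpg into three.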
A Few More Updates to Make
# edit /etc/irods/core.re
admin@ec2:~$ sudo nano /etc/irods/core.re
...
acSetRescSchemeForCreate {msiSetDefaultResc("replResc","null"); }
...
Change this to "replResc" from "demoResc"
/etc/irods/core.re
A Few More Updates to Make
# edit ~irods/.irods/irods_environment.json
admin@ec2:~$ sudo nano ~irods/.irods/irods_environment.json
...
"irods_default_resource": "replResc",
...
Change this from "demoResc" to "replResc"
~irods/.irods/irods_environment.json
Questions?
What would an iRODS pipeline look like that integrates a high-speed online storage unit (a NAS in our case) with a very large object store, with rules for moving data transparently from one to the other according to:
i) workflow rules (when you do this step, these files may be archived), or
ii) time-based rules (all files not accessed in so many days)?
Questions?
archive_rule {
# remove all copies from a specified storage system that are older than a specified time
# inputs: name of storage system, minimal age of files in seconds,
# relative name of collection
*Path = "/$rodsZoneClient/home/$userNameClient/" ++ *Path;
delay("<PLUSET>30s</PLUSET><EF>30s REPEAT FOR EVER</EF>"){
msiGetSystemTime(*Time,"unix");
*Time = double (*Time);
writeLine("stdout","Path is *Path");
*Q = select DATA_NAME, COLL_NAME, DATA_CREATE_TIME where COLL_NAME = '*Path' and DATA_RESC_NAME = '*OldResource';
foreach (*R in *Q) {
*File = *R.DATA_NAME;
*Coll = *R.COLL_NAME;
*Create = double(*R.DATA_CREATE_TIME);
*SourceFile = *Coll ++ "/" ++ *File;
writeLine("stdout","File is *SourceFile");
if (*Time-*Create >= *Age) {
writeLine("stdout","Moving *SourceFile from *OldResource to *NewResource");
msiDataObjPhymv(*SourceFile,*NewResource,*OldResource,"0","null",*Status);
} #if
} #foreach
} #delay
} #archive_rule
INPUT *Path = "active", *OldResource = "LosAngeles", *NewResource = "Vancouver", *Age = 180
OUTPUT ruleExecOut
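The selection logic at the heart of this rule (the query plus the age test) can be isolated in a few lines of Python. This sketch only mirrors the decision of which files to move; the `Record` type and the sample data are hypothetical, and no iRODS calls are made.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    coll_name: str      # collection (logical directory)
    data_name: str      # data object name
    create_time: float  # creation time, seconds since the epoch
    resource: str       # resource currently holding the replica


def select_for_archive(
    records: List[Record], now: float, min_age: float, old_resource: str
) -> List[str]:
    """Return logical paths on old_resource that are at least min_age seconds old."""
    return [
        f"{r.coll_name}/{r.data_name}"
        for r in records
        if r.resource == old_resource and now - r.create_time >= min_age
    ]


coll = "/tempZone/home/alice/active"
records = [
    Record(coll, "old.dat", 700.0, "LosAngeles"),     # age 300 s -> move
    Record(coll, "fresh.dat", 900.0, "LosAngeles"),   # age 100 s -> keep
    Record(coll, "elsewhere.dat", 100.0, "Vancouver"),  # not on old resource
]
to_move = select_for_archive(records, now=1000.0, min_age=180, old_resource="LosAngeles")
print(to_move)
```

The rule above keys on creation time; a "not accessed in so many days" policy would need an access timestamp recorded somewhere, for example as metadata.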
Questions?
Are there heuristics for how large an iRODS Zone can grow (so many objects, petabytes, etc.) before another zone should be considered? Would beefier hardware fix this?
Questions?
Talk To Us!
https://irods.org/sc15-survey
Extra Credit: Federation
Secure Collaboration between iRODS Zones
• Share files between zones
• Local authentication (passwords stay in zone)
• Admins tell each zone about one another
• Admins exchange two keys:
zone_key
negotiation_key
Tell Each Zone about the Others
# if you're not already logged in
you@laptop:~$ ssh admin@<ip address>
# tell your server about your neighbor's zone
admin@ec2:~$ iadmin mkzone <remote zone name> remote <remote hostname>:1247
# create a user account for your neighbor
admin@ec2:~$ iadmin mkuser admin#<remote zone name> rodsuser
# give your neighbor read access to your home collection (don't do this in real life)
admin@ec2:~$ ichmod -r read admin#<remote zone name> /<local zone name>/home/admin
# edit /etc/irods/server_config.json
admin@ec2:~$ sudo nano /etc/irods/server_config.json
Make sure you do this on both machines!
/etc/irods/server_config.json
...
"federation": [
{
"icat_host": "<hostname of your neighbor>",
"zone_name": "<zone name of your neighbor>",
"zone_key": "0123456789abcdef",
"negotiation_key": "abcdefghijklmnopqrstuvwxyzabcdef"
}
],
...
Add this section (to both machines)
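Since both machines must be edited, a quick consistency check can catch a mismatch before you test. The Python sketch below compares two in-memory copies of the "federation" section; the zone names, hostnames, and helper names are hypothetical, and it assumes the negotiation_key is meant to be identical on both sides.

```python
from typing import List, Optional


def find_entry(server_config: dict, zone_name: str) -> Optional[dict]:
    """Return the federation entry describing zone_name, if any."""
    for entry in server_config.get("federation", []):
        if entry.get("zone_name") == zone_name:
            return entry
    return None


def federation_problems(zone_a: str, config_a: dict, zone_b: str, config_b: dict) -> List[str]:
    """List obvious mismatches between two zones' federation sections."""
    problems: List[str] = []
    a_to_b = find_entry(config_a, zone_b)
    b_to_a = find_entry(config_b, zone_a)
    if a_to_b is None:
        problems.append(f"{zone_a} has no federation entry for {zone_b}")
    if b_to_a is None:
        problems.append(f"{zone_b} has no federation entry for {zone_a}")
    if a_to_b and b_to_a:
        # Assumption: the negotiation key is shared between the two zones.
        if a_to_b.get("negotiation_key") != b_to_a.get("negotiation_key"):
            problems.append("negotiation_key differs between the two zones")
    return problems


shared = "abcdefghijklmnopqrstuvwxyzabcdef"
config_a = {"federation": [{"icat_host": "b.example.org", "zone_name": "zoneB",
                            "zone_key": "0123456789abcdef", "negotiation_key": shared}]}
config_b = {"federation": [{"icat_host": "a.example.org", "zone_name": "zoneA",
                            "zone_key": "fedcba9876543210", "negotiation_key": shared}]}
ok = federation_problems("zoneA", config_a, "zoneB", config_b)
missing = federation_problems("zoneA", config_a, "zoneB", {})
print(ok, missing)
```

An empty list means each zone knows about the other and the shared key matches.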
Try it out!