An Introduction to iRODS
Presented at
Dan Bedard
Interim Executive Director
The iRODS Consortium
RENCI at the University of North Carolina
Agenda
Pre-Check
What is iRODS?
Demonstration (Spoiler)
Little Strips of Paper
Automatic Metadata Extraction
Resource Composition
Extra Credit
Questions
What's Going on Here?
Trying to...
... help grasp the mental model.
... show off some of the neat things we can do.
... explain clearly and briefly how to do some things.
... answer some frequently asked questions.
What is iRODS?
Overview
The Four Pillars
The Fork
Photo: "Organized" by Uwe Hermann, licensed under CC BY-SA 2.0
Overview
iRODS is open source data management software.
It...
...makes data findable.
...maintains data integrity.
...manages backup replicas.
...makes data sharable.
iRODS Zones
iRODS is Middleware
User Application
"Logical" Layer
Storage Environment
"Physical" Layer
storagecluster.example.org:/managed
s3.amazonaws.com:/example/bitbucket
iRODS Clients
• Web-based and Standalone GUIs
- iRODS Cloud Browser, MetaLnx, iDrop, PRODS
• Portals, External Systems
- iPlant Discovery Environment, Islandora, Fedora Commons
• WebDAV for drag-and-drop access built in to the OS
• APIs: Python, REST, Qt, Java, C++
• Command Line Interface
Photo: "Jefferson Memorial Pillars Inside" by Belal Khan, licensed under CC BY 2.0
The Four Pillars
• Storage Virtualization: all your storage in a single namespace
• Data Discovery: system and user-generated metadata
• Workflow Automation: event-driven and scheduled (cron-style) policies
• Secure Collaboration: federation
Storage Virtualization: Composable Resources
"Logical" Layer
"Physical" Layer
storagecluster.example.org:/managed/Vault/home/alice/training_jpgs/seal.jpg
/tempZone/home/alice/training_jpgs/seal.jpg
s3.amazonaws.com:/example/bitbucket/Vault/home/alice/training_jpgs/seal.jpg
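To make the logical/physical split concrete, here is a minimal sketch in Python. It is not iRODS code: the `VAULTS` table and the `physical_paths` helper are hypothetical names invented for illustration, and the mapping rule (strip the zone name, prepend each vault path) is simply what the three example paths on this slide suggest.

```python
# Hypothetical illustration only: this is not an iRODS API.
# Each storage host keeps replicas under its own vault path.
VAULTS = {
    "storagecluster.example.org": "/managed/Vault",
    "s3.amazonaws.com": "/example/bitbucket/Vault",
}


def physical_paths(logical_path, zone="tempZone", vaults=VAULTS):
    """Map one logical path to the physical location on every vault."""
    prefix = "/" + zone
    if not logical_path.startswith(prefix + "/"):
        raise ValueError(f"{logical_path!r} is not inside zone {zone!r}")
    relative = logical_path[len(prefix):]  # e.g. "/home/alice/..."
    return [f"{host}:{vault}{relative}" for host, vault in vaults.items()]


for path in physical_paths("/tempZone/home/alice/training_jpgs/seal.jpg"):
    print(path)
```

The user only ever sees the logical path; which vault actually serves the bytes is a server-side decision.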
Storage Virtualization: Objects and Collections
Data Discovery
Attribute: filename | Value: seal.jpg
Attribute: animal | Value: seal
Attribute: photo_color | Value: gray and brown
Attribute: file_size | Value: 362833 | Units: bytes
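The attribute/value/units triples above can be pictured as a small searchable catalog. The sketch below is plain Python, not an iRODS client: the `AVU` tuple, the in-memory `CATALOG`, and `find_objects` are hypothetical names, and the object path is borrowed from the earlier storage slide for illustration.

```python
from typing import NamedTuple


class AVU(NamedTuple):
    """One attribute/value/units triple attached to a data object."""
    attribute: str
    value: str
    units: str = ""


# Hypothetical in-memory stand-in for the metadata catalog.
CATALOG = {
    "/tempZone/home/alice/training_jpgs/seal.jpg": [
        AVU("filename", "seal.jpg"),
        AVU("animal", "seal"),
        AVU("photo_color", "gray and brown"),
        AVU("file_size", "362833", "bytes"),
    ],
}


def find_objects(catalog, attribute, value):
    """Return the logical paths whose metadata contains attribute == value."""
    return sorted(
        path
        for path, avus in catalog.items()
        if any(a.attribute == attribute and a.value == value for a in avus)
    )


print(find_objects(CATALOG, "animal", "seal"))
```

Data discovery then amounts to querying on these triples instead of remembering where a file lives.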
acPostProcForPut {
if ($filePath like "*.jpg" || $filePath like "*.jpeg" ||
$filePath like "*.bmp" || $filePath like "*.tif" ||
$filePath like "*.tiff" || $filePath like "*.rif" ||
$filePath like "*.gif" || $filePath like "*.png" ||
$filePath like "*.svg" || $filePath like "*.xpm") {
msiget_image_meta($filePath, *meta);
msiString2KeyValPair(*meta, *meta_kvp);
msiAssociateKeyValuePairsToObj(*meta_kvp, $objPath, "-d");
} # if
} # acPostProcForPut
Policy Enforcement Points = Triggers
Microservices = Actions
Workflow Automation: Rules
Secure Collaboration: Federation
A Tiny Bit of History
• 2006: iRODS developed by DICE at SDSC.
• 2008: DICE expands to UNC. RENCI evaluates iRODS.
Code fork: E-iRODS and Community iRODS.
• 2014: Code merge: iRODS 4.0
• 2015: iRODS 4.1
Enterprise Readiness
• Modular, maintainable code
• Static analysis and continuous integration
• Sustainable funding and governance model
Plugins
• Microservices
• Storage Resources
• Authentication
• Network
• Rule Engine, API, Transport
Static Analysis and Continuous Integration
https://jenkins.irods.org/view/1.%20Core%20Development/
iRODS is free, open source software owned by a foundation called the iRODS Consortium.
- Members pay an annual membership fee; there are four levels of membership.
- Members have agreed upon iRODS as an area of cooperation, rather than competition.
- Two monthly meetings: Technology Working Group (TWG) and Planning Committee.
- The goal is to create a sustainable open source project.
- Presently, the Consortium funds a team of 10+ developers, application engineers, documentation, and support staff.
Sustainable Governance and Funding Model
+2
Contract Customers
Initial Trial
- Documentation, training
- Blog posts, social media
- Cloud images
- Google Group
- iRODS Hub
Proof of Concept
- Occasional 1-on-1 Support
- Service Contract
Pilot
- iRODS Partners
- Service Contract
Production
- Consortium Membership
- iRODS Partners
- Service Contract
Getting Started (and Keeping Going) with iRODS
http://irods.org/documentation/
Demo
iRODS Cloud Browser
• Web-based GUI
• Pre-beta
• Developed for the DataNet Federation Consortium (DFC)
NSF-funded nationwide data grid project
datafed.org
Demo
• iRODS Cloud Browser
• Automatic metadata extraction from image files
• Resource composition to replicate to Amazon S3
Little Strips of Paper
Log in
Explore
Log In
• Type the IP address on your paper into your browser
The login window should appear. Fill it in as follows:
Host: localhost
Port: 1247
Zone: zone<last octet>
User: admin
Password: admin!
Auth type: Standard
Explore
• Upload/download
• Add/delete metadata
• Create and navigate collections
Automatic Metadata Extraction
Install This iRODS Rule
you@laptop:~$ ssh admin@<ip address>
# install the rule/microservice plugin
admin@ec2:~$ sudo dpkg -i ./xsede/training-example-1.0.deb
# edit /etc/irods/server_config.json
admin@ec2:~$ sudo nano /etc/irods/server_config.json
/etc/irods/server_config.json
...
"re_rulebase_set": [
{
"filename": "training_acPostProcForPut"
},
{
"filename": "core"
}
],
...
Add this section
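If hand-editing the JSON feels error-prone, the same change can be sketched programmatically. This is only an illustration in Python operating on an in-memory dictionary: `add_rulebase` is a hypothetical helper, not an iRODS tool, and it assumes the new rulebase should be listed ahead of "core", as in the fragment above.

```python
import json


def add_rulebase(config, name, before="core"):
    """Insert {"filename": name} into re_rulebase_set, ahead of `before`."""
    rules = config.setdefault("re_rulebase_set", [])
    if any(entry.get("filename") == name for entry in rules):
        return config  # already present, nothing to do
    # Find the position of the `before` entry; append if it is absent.
    index = next(
        (i for i, entry in enumerate(rules) if entry.get("filename") == before),
        len(rules),
    )
    rules.insert(index, {"filename": name})
    return config


# Stand-in for the relevant part of /etc/irods/server_config.json.
config = {"re_rulebase_set": [{"filename": "core"}]}
add_rulebase(config, "training_acPostProcForPut")
print(json.dumps(config, indent=4))
```

Running the helper twice leaves the list unchanged, which makes it safe to repeat.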
Test It!
What have we done?
• Installed the plugin:
- Deployed an iRODS rule and microservice
• Modified server_config.json
- Told iRODS to load the rule
The Rule
acPostProcForPut {
if ($filePath like "*.jpg" || $filePath like "*.jpeg" ||
$filePath like "*.bmp" || $filePath like "*.tif" ||
$filePath like "*.tiff" || $filePath like "*.rif" ||
$filePath like "*.gif" || $filePath like "*.png" ||
$filePath like "*.svg" || $filePath like "*.xpm") {
msiget_image_meta($filePath, *meta);
msiString2KeyValPair(*meta, *meta_kvp);
msiAssociateKeyValuePairsToObj(*meta_kvp, $objPath, "-d");
} # if
} # acPostProcForPut
/etc/irods/training_acPostProcForPut.re
The Rule
if( $filePath like "*.jpg" || $filePath like "*.jpeg" ||
$filePath like "*.bmp" || $filePath like "*.tif" ||
$filePath like "*.tiff" || $filePath like "*.rif" ||
$filePath like "*.gif" || $filePath like "*.png" ||
$filePath like "*.svg" || $filePath like "*.xpm") {
We only want to harvest metadata from image files, so we filter by extension using an 'if' statement
Session variables - global variables holding values about the data object in flight
$filePath - session variable holding the physical path
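For readers more at home in a general-purpose language, the filter can be approximated as below. This is a Python analogy, not the rule engine: `is_image` is a hypothetical helper, and the sketch assumes case-sensitive wildcard matching, which may differ in detail from the rule language's `like` operator.

```python
from fnmatch import fnmatchcase

# The same extension patterns the rule tests with `like`.
IMAGE_PATTERNS = (
    "*.jpg", "*.jpeg", "*.bmp", "*.tif", "*.tiff",
    "*.rif", "*.gif", "*.png", "*.svg", "*.xpm",
)


def is_image(file_path):
    """True if the physical path matches any image pattern (case-sensitive)."""
    return any(fnmatchcase(file_path, pattern) for pattern in IMAGE_PATTERNS)


print(is_image("/var/lib/irods/Vault/home/admin/beans.jpg"))
print(is_image("/var/lib/irods/Vault/home/admin/notes.txt"))
```

One consequence of matching on the name alone: a file called `BEANS.JPG` would be skipped by this sketch, and a text file renamed to `.png` would be accepted.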
The Rule
msiget_image_meta($filePath, *meta);
Once we have filtered the file type
- invoke the microservice to harvest the metadata
- metadata is encoded as a string in the 'out variable' *meta
The Rule
msiString2KeyValPair(*meta, *meta_kvp);
The metadata encoded string is converted to an internal iRODS key-value data structure in the 'out variable' *meta_kvp
Once we have the key-value pairs we apply them to our data object
The Data object is referenced by the session variable $objPath which is the logical iRODS path
The "-d" signifies to the microservice that we are referencing a Data Object, not a Collection or Resource
msiAssociateKeyValuePairsToObj(*meta_kvp, $objPath, "-d");
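The two steps, decoding the string and attaching the pairs, can be mimicked in a few lines of Python. This is a conceptual sketch rather than the microservices themselves: `string_to_kvp` and `associate` are hypothetical names, the catalog is a plain dictionary, and the `key=value%key=value` encoding is an assumption made for illustration.

```python
def string_to_kvp(encoded, pair_sep="%", kv_sep="="):
    """Decode 'k1=v1%k2=v2' into a dict (the separators are assumptions)."""
    pairs = {}
    for item in encoded.split(pair_sep):
        if not item:
            continue  # tolerate a leading or trailing separator
        key, sep, value = item.partition(kv_sep)
        if not sep:
            raise ValueError(f"missing {kv_sep!r} in {item!r}")
        pairs[key.strip()] = value.strip()
    return pairs


def associate(catalog, obj_path, kvp):
    """Attach key-value pairs to the object stored under its logical path."""
    catalog.setdefault(obj_path, {}).update(kvp)
    return catalog


meta = "animal=seal%photo_color=gray and brown"
catalog = associate({}, "/tempZone/home/alice/training_jpgs/seal.jpg", string_to_kvp(meta))
print(catalog)
```

Note that the metadata is keyed by the logical path (the role of `$objPath`), while the harvesting step reads from the physical path (`$filePath`).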
The Rule
Now that the metadata is applied, we need to close our two scope blocks: one for the 'if' and one for the PEP itself
} # if
} # acPostProcForPut
The Rule
• Rule Language Reference:
https://docs.irods.org/4.1.3/manual/rule_language/
• Source for this example (including microservices):
https://github.com/irods/contrib/tree/master/microservices/training_example
Resource Composition
End Goal: A Resource Tree
Storage Virtualization
One of the "Pillars" of iRODS
An abstraction layer that reaches into various storage services with no change to the client
Functionality provided via plugin interfaces
- Authentication
- Database
- Network
- API
- Microservice
- Resource
Composable Resources
Uses a well-known tree metaphor: branches and leaves
Two types of nodes:
- Coordinating (branch) - pure decision making
- Storage (leaf) - instance managing the hardware
By convention Coordinating nodes do not have storage
(this is not enforced)
End Goal: A Resource Tree
[Diagram: a resource tree with two Coordinating nodes and three Storage nodes]
Coordinating Resources
Compound - provide POSIX interface to alternative storage
Load Balanced - use gathered load values to determine choices
Passthru - weight then delegate operations to a child resource
Random - randomly choose a child for a write operation
Replication - ensure all data objects are consistent across children
Round Robin - delegate writes to each child in series
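The difference between these coordinating policies is easiest to see in how each one picks children for a write. The Python classes below are a toy model, not the iRODS resource plugins: the class and method names are hypothetical, and only three of the six policies are sketched.

```python
import itertools
import random


class RoundRobin:
    """Delegate each write to the next child in series."""
    def __init__(self, children):
        self._cycle = itertools.cycle(children)

    def choose_for_write(self):
        return [next(self._cycle)]


class Random:
    """Pick one child at random for each write."""
    def __init__(self, children, rng=None):
        self._children = list(children)
        self._rng = rng or random.Random()

    def choose_for_write(self):
        return [self._rng.choice(self._children)]


class Replication:
    """Send every write to all children so they stay consistent."""
    def __init__(self, children):
        self._children = list(children)

    def choose_for_write(self):
        return list(self._children)


rr = RoundRobin(["diskA", "diskB"])
print([rr.choose_for_write() for _ in range(3)])
print(Replication(["demoResc", "s3Resc"]).choose_for_write())
```

Because a coordinating node only makes decisions, a child can itself be another coordinating node, which is what makes the resources composable.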
Storage Resources
Non-Cached
Unix File System - generic file system storage
Ceph-RADOS - Ceph object storage
HPSS - access to IBM High Performance Storage System
Cached (Archive)
S3 - archive resource for Amazon S3
WOS - DDN Web Object Scaler
Universal MSS - script based access to generic archive storage
Let's Try It: Building a Resource Hierarchy
# if you're not already logged in
you@laptop:~$ ssh admin@<ip address>
# create a cache resource
admin@ec2:~$ iadmin mkresc cacheResc unixfilesystem \
    `hostname`:/var/lib/irods/S3CacheVault
# create an s3 resource
admin@ec2:~$ iadmin mkresc archiveResc s3 `hostname`:xsede15/`hostname -s`/Vault \
    "S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/.s3_auth;\
S3_RETRY_COUNT=3;S3_WAIT_TIME_SEC=3"
# create a compound resource
admin@ec2:~$ iadmin mkresc s3Resc compound
# add the cache and archive resources as children of the compound resource
admin@ec2:~$ iadmin addchildtoresc s3Resc cacheResc cache
admin@ec2:~$ iadmin addchildtoresc s3Resc archiveResc archive
Let's Try It: Building a Resource Hierarchy
# create a replication resource
admin@ec2:~$ iadmin mkresc replResc replication
# add demoResc and s3Resc as children of the replication resource
admin@ec2:~$ iadmin addchildtoresc replResc demoResc
admin@ec2:~$ iadmin addchildtoresc replResc s3Resc
# rebalance, to force an initial sync of the resources
admin@ec2:~$ iadmin modresc replResc rebalance
# edit /etc/irods/core.re
admin@ec2:~$ sudo nano /etc/irods/core.re
...
acSetRescSchemeForCreate {msiSetDefaultResc("replResc","null"); }
...
Change this from "demoResc"
/etc/irods/core.re
Let's Try It: Building a Resource Hierarchy
# edit ~irods/.irods/irods_environment.json
admin@ec2:~$ sudo nano ~irods/.irods/irods_environment.json
... "irods_default_resource": "replResc", ...
Change this from "demoResc"
~irods/.irods/irods_environment.json
Test It!
admin@ec2:~$ ilsresc
replResc:replication
├── demoResc
└── s3Resc:compound
    ├── archiveResc:s3
    └── cacheResc
admin@ec2:~$ ils -L
/zone223/home/admin:
admin 0 replResc;s3Resc;cacheResc 1128069 2015-07-28.04:40 & beans.jpg
generic /var/lib/irods/S3CacheVault/home/admin/beans.jpg
admin 1 replResc;s3Resc;archiveResc 1128069 2015-07-28.04:40 & beans.jpg
generic xsede15/ec2-52-3-93-223/Vault/home/admin/beans.jpg
admin 2 replResc;demoResc 1128069 2015-07-28.04:40 & beans.jpg
generic /var/lib/irods/Vault/home/admin/beans.jpg
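Each replica line in the listing carries its full resource hierarchy as a semicolon-separated string, so the tree can be reconstructed from the listing alone. The Python sketch below shows the idea; `build_tree` is a hypothetical helper for illustration and does not parse real `ils` output.

```python
def build_tree(hierarchies):
    """Fold 'parent;child;leaf' strings into a nested dict of resources."""
    tree = {}
    for hierarchy in hierarchies:
        node = tree
        for name in hierarchy.split(";"):
            node = node.setdefault(name, {})
    return tree


# The three hierarchy strings from the replica listing.
replicas = [
    "replResc;s3Resc;cacheResc",
    "replResc;s3Resc;archiveResc",
    "replResc;demoResc",
]
print(build_tree(replicas))
```

Three replicas of one file, one per leaf, is what a replication node above a compound node and a plain storage node should produce.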
Extra Credit: Federation
Secure Collaboration between iRODS Zones
• Share files between zones
• Local authentication (passwords stay in zone)
• Admins tell each zone about the other
• Admins exchange two keys: zone_key and negotiation_key
Tell Each Zone about the Others
# if you're not already logged in
you@laptop:~$ ssh admin@<ip address>
# tell your server about your neighbor's zone
admin@ec2:~$ iadmin mkzone <remote zone name> remote <remote hostname>:1247
# create a user account for your neighbor
admin@ec2:~$ iadmin mkuser admin#<remote zone name> rodsuser
# give your neighbor read access to your home collection (don't do this in real life)
admin@ec2:~$ ichmod -r read admin#<remote zone name> /<local zone name>/home/admin
# edit /etc/irods/server_config.json
admin@ec2:~$ sudo nano /etc/irods/server_config.json
Make sure you do this on both machines!
/etc/irods/server_config.json
...
"federation": [
{
"icat_host": "<hostname of your neighbor>",
"zone_name": "<zone name of your neighbor>",
"zone_key": "0123456789abcdef",
"negotiation_key": "abcdefghijklmnopqrstuvwxyzabcdef"
}
],
...
Add this section (to both machines)
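Since the same stanza has to be edited on both machines, a quick sanity check before restarting can save a round of debugging. The Python sketch below is an illustration only: `federation_problems` is a hypothetical helper, and it checks nothing beyond the four fields shown above being present and having their `<...>` placeholders filled in.

```python
REQUIRED_FIELDS = ("icat_host", "zone_name", "zone_key", "negotiation_key")


def federation_problems(entry):
    """List the fields of one federation entry that still need attention."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = entry.get(field, "")
        if not value:
            problems.append(f"{field}: missing")
        elif "<" in value or ">" in value:
            problems.append(f"{field}: placeholder not filled in")
    return problems


# The stanza exactly as shown on the slide, placeholders included.
entry = {
    "icat_host": "<hostname of your neighbor>",
    "zone_name": "<zone name of your neighbor>",
    "zone_key": "0123456789abcdef",
    "negotiation_key": "abcdefghijklmnopqrstuvwxyzabcdef",
}
for problem in federation_problems(entry):
    print(problem)
```

An empty list means the stanza is at least structurally complete; it does not prove that the keys agree with what your neighbor configured.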
Try it out!
Questions?
Thank you!
Dan Bedard
danb@renci.org
+1-919-445-0632
XSEDE15: Introduction to iRODS
By beppodb