iRODS Capabilities

Storage Tiering

February 21, 2018

Renaissance Computing Institute

UNC-Chapel Hill

Jason M. Coposky


Executive Director, iRODS Consortium

iRODS Capabilities

  • Capture the majority of use cases
  • Packaged and Versioned
  • Configuration, not code
  • Capabilities combine to deliver complex environments

Storage Tiering Overview

Installing Tiered Storage Plugin

wget -qO - https://packages.irods.org/irods-signing-key.asc | sudo apt-key add -
echo "deb [arch=amd64] https://packages.irods.org/apt/ $(lsb_release -sc) main" | \
  sudo tee /etc/apt/sources.list.d/renci-irods.list
sudo apt-get update

Install the package repository

sudo apt-get install irods-rule-engine-plugin-tiered-storage

Install the storage tiering package

Make some resources

iadmin mkresc rnd0 random
iadmin mkresc rnd1 random
iadmin mkresc rnd2 random
iadmin mkresc ufs0 unixfilesystem `hostname`:/tmp/irods/ufs0
iadmin mkresc ufs1 unixfilesystem `hostname`:/tmp/irods/ufs1
iadmin mkresc ufs2 unixfilesystem `hostname`:/tmp/irods/ufs2
iadmin mkresc ufs3 unixfilesystem `hostname`:/tmp/irods/ufs3
iadmin mkresc ufs4 unixfilesystem `hostname`:/tmp/irods/ufs4
iadmin mkresc ufs5 unixfilesystem `hostname`:/tmp/irods/ufs5
iadmin addchildtoresc rnd0 ufs0
iadmin addchildtoresc rnd0 ufs1
iadmin addchildtoresc rnd1 ufs2
iadmin addchildtoresc rnd1 ufs3
iadmin addchildtoresc rnd2 ufs4
iadmin addchildtoresc rnd2 ufs5

As the irods service account

Configuring the rule engine plugin

In /etc/irods/server_config.json, add the tiered storage plugin instance to the rule_engines array, ahead of the default rule language plugin:

"rule_engines": [
    {
        "instance_name": "irods_rule_engine_plugin-tiered_storage-instance",
        "plugin_name": "irods_rule_engine_plugin-tiered_storage",
        "plugin_specific_configuration": {
        }
    },
    {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
        "plugin_specific_configuration": {
            ...
        },
        "shared_memory_instance": "irods_rule_language_rule_engine"
    }
]


Metadata-driven Storage Tiering

"plugin_specific_configuration": {
    "access_time_attribute" : "irods::access_time",
    "storage_tiering_group_attribute" : "irods::storage_tier_group",
    "storage_tiering_time_attribute" : "irods::storage_tier_time",
    "storage_tiering_query_attribute" : "irods::storage_tier_query",
    "storage_tiering_verification_attribute" : "irods::storage_tier_verification",
    "storage_tiering_restage_delay_attribute" : "irods::storage_tier_restage_delay",
    "default_restage_delay_parameters" : "<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>",
    "time_check_string" : "TIME_CHECK_STRING"
}

All default metadata attributes are configurable

If a metadata vocabulary already exists at your site, the plugin can be configured to use it
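For example, a site that already tags data with its own vocabulary could remap the attribute names in the plugin_specific_configuration stanza. The "example::" names below are illustrative, not defaults:

```json
"plugin_specific_configuration": {
    "access_time_attribute" : "example::last_access",
    "storage_tiering_group_attribute" : "example::tier_group",
    "storage_tiering_time_attribute" : "example::tier_time"
}
```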

Configuring a Tier Group

imeta add -R rnd0 irods::storage_tier_group example_group 0
imeta add -R rnd1 irods::storage_tier_group example_group 1
imeta add -R rnd2 irods::storage_tier_group example_group 2

Tier groups are entirely driven by metadata

  • the attribute identifies a group participant
  • the value defines the group name
  • the unit defines the position within the group


Configuring Tiering Time Constraints

Configure rnd0 to hold data for only 30 seconds

imeta add -R rnd0 irods::storage_tier_time 30

Then configure rnd1 to hold data for 2 minutes

imeta add -R rnd1 irods::storage_tier_time 120

rnd2 does not have a storage tier time and holds data indefinitely
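Under the hood, the plugin compares each object's irods::access_time (stored as epoch seconds) against "now minus the tier time". A minimal sketch of that arithmetic, outside of iRODS:

```shell
#!/bin/bash
# Sketch: compute the violation cutoff for a 30-second tier time.
# Objects whose irods::access_time is older than this cutoff are
# candidates for migration to the next tier in the group.
TIER_TIME=30                 # seconds, as set on rnd0 above
NOW=$(date +%s)              # current epoch time
CUTOFF=$((NOW - TIER_TIME))  # the value substituted for TIME_CHECK_STRING
echo "violating objects: irods::access_time < ${CUTOFF}"
```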

Verification of Data Migration

From least to most expensive, verification can be performed by: catalog, file system, or checksum

imeta add -R rnd0 irods::storage_tier_verification catalog

The default configuration

imeta add -R rnd0 irods::storage_tier_verification filesystem

Stat the destination replica and compare its size with the size recorded in the catalog


imeta add -R rnd1 irods::storage_tier_verification checksum

Compute a checksum of the destination data at rest and compare it against the source replica's checksum

If the source replica does not have a checksum, one is computed before the replication is performed
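A local sketch of what checksum verification amounts to (iRODS computes SHA-256 checksums by default; the paths here are illustrative, not what the plugin uses internally):

```shell
#!/bin/bash
# Simulate checksum verification after replication: compare the
# checksum of the source replica with one computed on the destination.
echo "example data" > /tmp/source_replica
cp /tmp/source_replica /tmp/dest_replica      # a clean replication
SRC=$(sha256sum /tmp/source_replica | awk '{print $1}')
DST=$(sha256sum /tmp/dest_replica | awk '{print $1}')
if [ "$SRC" = "$DST" ]; then
    echo "verification passed"
fi
```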

Custom Violation Query

Admins may specify a custom query which identifies violating data objects

imeta set -R rnd1 irods::storage_tier_query "SELECT DATA_NAME, COLL_NAME WHERE META_DATA_ATTR_NAME = 'irods::access_time' AND META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING' AND DATA_RESC_ID IN ('10021', '10022')"

Add additional custom metadata to the query to customize it for a project, a user, or any other criteria
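Before the query runs, the plugin substitutes the TIME_CHECK_STRING token with the computed violation cutoff. A sketch of that substitution in plain shell (the cutoff arithmetic mirrors rnd1's 120-second tier time; the sed step stands in for the plugin's internal token replacement):

```shell
#!/bin/bash
# Sketch: replace the TIME_CHECK_STRING token with the violation cutoff
# (current epoch time minus the tier time, here 120 seconds for rnd1).
CUTOFF=$(( $(date +%s) - 120 ))
QUERY="SELECT DATA_NAME, COLL_NAME WHERE META_DATA_ATTR_NAME = 'irods::access_time' AND META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING'"
RESOLVED=$(printf '%s' "$QUERY" | sed "s/TIME_CHECK_STRING/$CUTOFF/")
echo "$RESOLVED"
```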

Launching the sample Tiering rule

{
   "rule-engine-instance-name": "irods_rule_engine_plugin-tiered_storage-instance",
   "rule-engine-operation": "apply_storage_tiering_policy",
   "delay-parameters": "<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>",
   "storage-tier-groups": [
       "example_group"
   ]
}
INPUT null
OUTPUT ruleExecOut

irule -r irods_rule_engine_plugin-tiered_storage-instance -F example_tiering_invocation.r

JSON ingested by the tiering plugin - per the delay parameters, the rule retries until success or six failures


Testing Tiered Storage - MungeFS

Mung or munge is computer jargon for a series of potentially destructive or irrevocable changes to a piece of data or a file.

 Raymond, Eric S. "The Jargon File, version 4.4.8". Archived from the original on June 15, 2015. Retrieved 15 June 2015.

A FUSE filesystem overlay on an underlying file system, controlled by Avro/ZeroMQ messages which order it to misbehave.


  • Random errno style failures
  • Provide silent false file size
  • Provide silent corrupt reads
  • Provide silent corrupt writes

We required a way to produce reliable, repeatable failure modes for testing

Mounted as a FUSE volume then exposed as an iRODS resource

Initial Use Case

Testing verification for Tiered Storage

  • Necessary to test verification failure, not only success
  • Verification failure is only possible to test after a 'successful' replication
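The failure path MungeFS enables can be sketched locally: after a "successful" copy, silent corruption shows up as a checksum mismatch. The paths and the corruption step below are simulated stand-ins, not produced by mungefs itself:

```shell
#!/bin/bash
# Simulate a silently corrupted replication and detect it by checksum.
echo "original data" > /tmp/src_replica
SRC=$(sha256sum /tmp/src_replica | awk '{print $1}')
echo "corrupted data" > /tmp/dst_replica   # stand-in for a mungefs corrupt write
DST=$(sha256sum /tmp/dst_replica | awk '{print $1}')
if [ "$SRC" != "$DST" ]; then
    echo "verification failed as expected"
fi
```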

Configuring MungeFS

mungefs /tmp/irods/mnt -omodules=subdir,subdir=/tmp/irods/target

Creating the mount point

fusermount -u /tmp/irods/mnt

Decommissioning the mount point

Controlling MungeFS Behavior

mungefsctl --operations "getattr" --corrupt_size

Report invalid file size

mungefsctl --operations "read" --corrupt_data

Corrupt data on a read

mungefsctl --operations "write" --corrupt_data

Corrupt data on a write