January 14-16, 2020


Montpellier, France

Jason Coposky


Executive Director, iRODS Consortium

Resource Hierarchies

and Composition

iRODS Resource Plugins

  • Internally defines the interface to all storage technologies
  • Loaded dynamically at runtime
  • Uses a vote to advertise the ability to satisfy a given operation
  • Maintains individual configuration per instance in the catalog via a 'context string'
  • May exist independently or can be wired together into hierarchies


Many iRODS users spent considerable time implementing the same basic use cases as policy in their rule base


  • Replication
  • Data distribution
  • Replica synchronization
  • Data archival


Resource hierarchies provide an out-of-the-box means to implement the majority of these use cases, while remaining future-proof

Introduction to Resource Hierarchies

Capture the implementation of various policies as nodes in a decision tree, a well-known metaphor


Two types of nodes:

  • Coordinating - pure policy implementation
  • Storage Resource - manages the interface to storage technologies

By convention Coordinating Resources do not manage storage

  - this is not enforced

Coordinating Resources - Branches

Compound - provide POSIX interface to alternative storage

Deferred - defer to children regarding voting behavior

Load Balanced - use gathered load values to determine choices

Passthru - weight, then delegate operations to a child resource

Random - randomly choose a child for a write operation

Replication - ensure all data objects are consistent across children


Purely virtual, existing only in memory: they are not pinned to any given server, so their plugins must exist on every server in the grid

Storage Resources - Leaves

Storage resources which provide POSIX semantics

   - do not require a compound resource hierarchy and a cache


Unix File System - surfaces any mount point

Ceph-RADOS - Ceph object storage

HPSS - access to IBM High Performance Storage System

Cacheless S3 - resource for Amazon S3 service (soon to be released)


Usually pinned to a given server by hostname, as they are expected to access storage provided by the server - plugins do not need to be available on every server.

Exception:  The soon to be released cacheless S3 resource implements a detached mode which is not pinned to a server.

Archive Storage Resources - Leaves

Storage resources which act as the Archive Resource in a Compound Resource composition


S3 - archive resource for Amazon S3 

WOS - DDN Web Object Scaler

Universal MSS - script-based access to generic object storage


Must be pinned to the servers which host the cache in order to synchronize data to the archive resource

The Voting Mechanism

  • Votes are by convention between 0.0 and 1.0
  • All communication starts at the root of a hierarchy
    • addressing children is disallowed
  • Voting follows a depth-first recursive descent
    • coordinating nodes delegate votes to children
  • Storage Nodes (Leaves) vote
  • Coordinating Nodes (Branches) interpret the results of their children
    • The interpretation of these votes expresses the policy encoded in Coordinating Resource plugins
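The depth-first descent described above can be sketched in a few lines of Python. This is an illustrative model only, not the iRODS plugin interface: the `Node` class and its fields are hypothetical, and the branch policy shown (highest child vote wins) is just the simplest possible interpretation. The `;`-separated hierarchy string follows the form used in the rule example later in this deck.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for one resource in a hierarchy."""
    name: str
    children: List["Node"] = field(default_factory=list)
    leaf_vote: float = 0.0  # only consulted when the node has no children

    def vote(self) -> Tuple[float, str]:
        """Return (vote, hierarchy string of the winning path)."""
        if not self.children:
            # Storage nodes (leaves) cast the actual vote.
            return self.leaf_vote, self.name
        # Coordinating nodes (branches) delegate to every child first,
        # then interpret the results. Assumed policy: highest vote wins.
        results = [child.vote() for child in self.children]
        best_vote, best_path = max(results, key=lambda r: r[0])
        return best_vote, f"{self.name};{best_path}"


# All communication starts at the root of the hierarchy.
root = Node("root", [
    Node("branch", [Node("ufs0", leaf_vote=0.5), Node("ufs1", leaf_vote=1.0)]),
    Node("ufs2", leaf_vote=0.25),
])
print(root.vote())  # (1.0, 'root;branch;ufs1')
```

Swapping the `max` call for a different interpretation is what distinguishes one coordinating resource type from another in this model.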

Voting - Weighted Passthru

For both Put and Get operations

  • Delegate the vote to the child
  • Multiply the result by the weight
  • Pass the result to the calling resource


This resource has many uses, such as disabling writes or reads to a given resource, or providing an abstraction to the users


The weights may also be overridden by the rule engine, which allows policy to influence votes dynamically
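As a rough illustration of the passthru arithmetic, the sketch below multiplies a child's vote by a per-operation weight. The function and parameter names (`passthru_vote`, `read_weight`, `write_weight`) are hypothetical and chosen for readability; they are not the plugin's actual interface.

```python
def passthru_vote(child_vote: float, operation: str,
                  read_weight: float = 1.0, write_weight: float = 1.0) -> float:
    """Scale the child's vote by the weight for the given operation.

    `operation` is "get" or "put"; the names are assumptions for this sketch.
    """
    weight = write_weight if operation == "put" else read_weight
    return child_vote * weight


# A write weight of 0.0 effectively disables writes through this resource,
# while reads are passed through unchanged.
print(passthru_vote(1.0, "put", write_weight=0.0))  # 0.0
print(passthru_vote(1.0, "get", write_weight=0.0))  # 1.0
print(passthru_vote(1.0, "put", write_weight=0.25))  # 0.25
```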

Voting - Unix File System Resource

For a Put operation

  • If the resource is marked Down, vote 0.0
  • If the client is connected to the server which owns the target resource, vote 1.0
  • Otherwise vote 0.5
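The three Put rules above translate directly into a small function. This is a sketch of the decision order only; the function and argument names are hypothetical.

```python
def ufs_put_vote(marked_down: bool, client_is_local: bool) -> float:
    """Vote for a Put, following the rules listed above in order.

    `client_is_local` means the client is connected to the server
    which owns the target resource.
    """
    if marked_down:
        return 0.0
    if client_is_local:
        return 1.0
    return 0.5


print(ufs_put_vote(marked_down=False, client_is_local=True))   # 1.0
print(ufs_put_vote(marked_down=False, client_is_local=False))  # 0.5
print(ufs_put_vote(marked_down=True, client_is_local=True))    # 0.0
```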

Voting - Unix File System Resource

For a Get operation

  • If the resource is marked Down, vote 0.0
  • If the client is connected to the server which owns the target resource, and the resource has an up-to-date replica, vote 1.0
  • If the resource has an up-to-date replica, vote 0.5
  • If the resource has a stale replica, vote 0.25
  • Otherwise, vote 0.0
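The Get rules add replica state to the decision. A minimal sketch, assuming a three-valued replica state; the `ReplicaState` enum and the function name are hypothetical.

```python
from enum import Enum


class ReplicaState(Enum):
    NONE = "none"    # the resource holds no replica of the object
    STALE = "stale"  # a replica exists but is out of date
    GOOD = "good"    # an up-to-date replica exists


def ufs_get_vote(marked_down: bool, client_is_local: bool,
                 replica: ReplicaState) -> float:
    """Vote for a Get, following the rules listed above in order."""
    if marked_down:
        return 0.0
    if replica is ReplicaState.GOOD:
        return 1.0 if client_is_local else 0.5
    if replica is ReplicaState.STALE:
        return 0.25
    return 0.0


print(ufs_get_vote(False, True, ReplicaState.GOOD))   # 1.0
print(ufs_get_vote(False, False, ReplicaState.GOOD))  # 0.5
print(ufs_get_vote(False, True, ReplicaState.STALE))  # 0.25
```

Note that under these rules a stale replica votes 0.25 whether or not the client is local.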

Voting - Random Resource

For a Put Operation

  • Randomly select a child and delegate the vote to it; continue until a non-zero vote is received or all children have been queried
  • Pass the result up to the calling resource
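One way to picture the Put behavior is to visit the children in a shuffled order and stop at the first non-zero vote. The sketch below assumes the child votes are already known and passed in as a dictionary, which is a simplification; the function name and signature are hypothetical.

```python
import random
from typing import Dict, Optional, Tuple


def random_put_vote(child_votes: Dict[str, float],
                    rng: random.Random) -> Tuple[Optional[str], float]:
    """Query children in random order until one returns a non-zero vote."""
    names = list(child_votes)
    rng.shuffle(names)  # random visiting order
    for name in names:
        vote = child_votes[name]
        if vote > 0.0:
            return name, vote  # pass this result up to the caller
    return None, 0.0  # every child was queried and none can accept the write


rng = random.Random(0)
# Only ufs1 can accept the write, so it is chosen whatever the order.
print(random_put_vote({"ufs0": 0.0, "ufs1": 0.5, "ufs2": 0.0}, rng))  # ('ufs1', 0.5)
print(random_put_vote({"ufs0": 0.0, "ufs1": 0.0}, rng))               # (None, 0.0)
```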

Voting - Random Resource

For a Get Operation

  • Delegate the vote to all children
  • Select the highest vote
  • Pass the result up to the calling resource

Voting - Replication Resource

For a Put Operation

  • Delegate voting to all children
  • Select highest vote
  • Pass result to calling resource
  • Once the Put is complete, trigger replication to all other children that accept a Put Operation
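The Put steps above can be sketched as a selection followed by a list of replication targets. This assumes that "accepts a Put" means the child returned a non-zero vote, which is an interpretation rather than something stated on the slide; the function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def replication_put(child_votes: Dict[str, float]) -> Tuple[Optional[str], List[str]]:
    """Return (child receiving the initial write, children to replicate to)."""
    # Assumption: a child accepts a Put when its vote is non-zero.
    accepting = {name: v for name, v in child_votes.items() if v > 0.0}
    if not accepting:
        return None, []
    primary = max(accepting, key=accepting.get)  # highest vote wins
    targets = [name for name in accepting if name != primary]
    return primary, targets


# ufs0 is local to the client, ufs1 is remote, ufs2 is marked down.
print(replication_put({"ufs0": 1.0, "ufs1": 0.5, "ufs2": 0.0}))
# ('ufs0', ['ufs1'])
```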

Voting - Replication Resource

For a Get Operation

  • Delegate vote to all children
  • Select the highest vote
  • Pass result to calling resource


Note that given the behavior of the Unix File System, locality of reference significantly affects the behavior of reads and writes for this Resource
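To make the locality effect concrete, consider two children that both hold an up-to-date replica. Using the Unix File System Get votes listed earlier (1.0 when the client is connected to the owning server, 0.5 otherwise), the child selected for the read changes with the server the client connects to. The names `ufs_a` and `ufs_b` are hypothetical.

```python
from typing import Dict


def select_child(votes: Dict[str, float]) -> str:
    """Select the child with the highest vote."""
    return max(votes, key=votes.get)


# Both children hold an up-to-date replica; only the client's location differs.
client_on_server_a = {"ufs_a": 1.0, "ufs_b": 0.5}
client_on_server_b = {"ufs_a": 0.5, "ufs_b": 1.0}

print(select_child(client_on_server_a))  # ufs_a
print(select_child(client_on_server_b))  # ufs_b
```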

Voting - A Put Example

pt1 - passthru

pt2 - passthru

pt3 - passthru

rnd1 - random

rnd2 - random

repl1 - replication



Voting - A Put Example

Votes 1.0 Connected

Votes 0.5 Not Connected

Votes 0.0 Marked Down

Randomly chooses ufs0 for vote of 1.0

Passes ufsN-1 vote of 0.5

Write weight of 0.25, passes ufs0 as a vote of 0.25

Randomly chooses ufsN-1 vote of 0.5

Chooses ufsN-1 with a vote of 0.5 > 0.25

Passes ufsN-1
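The arithmetic behind the annotations above can be replayed in a few lines. The wiring is an assumption, since the diagram itself is not reproduced here: the replication resource is taken to have two branches, each a passthru over a random resource, with a write weight of 0.25 on the first branch and an assumed default weight of 1.0 on the second.

```python
# Leaf votes as annotated on the slide.
ufs0_vote = 1.0    # client is connected to the owning server
ufsN1_vote = 0.5   # client is not connected

# Branch A (assumed wiring): the random resource picks ufs0,
# then the passthru applies its write weight of 0.25.
branch_a = ("ufs0", ufs0_vote * 0.25)

# Branch B (assumed wiring): the random resource picks ufsN-1,
# and the passthru uses an assumed default weight of 1.0.
branch_b = ("ufsN-1", ufsN1_vote * 1.0)

# The replication resource selects the highest vote among its children.
winner = max([branch_a, branch_b], key=lambda pair: pair[1])
print(branch_a)  # ('ufs0', 0.25)
print(branch_b)  # ('ufsN-1', 0.5)
print(winner)    # ('ufsN-1', 0.5)
```

Although ufs0 cast the highest raw vote, the 0.25 write weight steers the Put to ufsN-1.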

Compound Resources

Necessary to provide POSIX semantics for tape, object, and similar storage


For a Put

  • Data objects are written to the cache first
  • Then a replica is made to the archive


For a Get

  • Replica is staged to cache, if required
  • Reads are always made from the cache


By default, the compound resource synchronously replicates to the archive
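The Put and Get flow of a compound resource can be modeled with two dictionaries standing in for the cache and the archive. This is a toy model under stated assumptions, not how the plugin is implemented: the class and its methods are hypothetical, and the `auto_repl` flag mirrors the context string setting covered under delayed replication.

```python
class CompoundSketch:
    """Toy model of a compound resource with a cache and an archive."""

    def __init__(self, auto_repl: bool = True):
        self.cache = {}
        self.archive = {}
        self.auto_repl = auto_repl

    def put(self, path: str, data: bytes) -> None:
        self.cache[path] = data          # data objects are written to the cache first
        if self.auto_repl:
            self.archive[path] = data    # then a replica is made to the archive

    def get(self, path: str) -> bytes:
        if path not in self.cache:
            # Stage the replica from the archive to the cache, if required.
            self.cache[path] = self.archive[path]
        return self.cache[path]          # reads are always made from the cache


resc = CompoundSketch()
resc.put("/tempZone/home/alice/a.txt", b"hello")
resc.cache.clear()                       # simulate a cache purge
print(resc.get("/tempZone/home/alice/a.txt"))  # b'hello'
```

With `auto_repl=False`, the `put` call leaves the archive untouched, so a separate step is needed to synchronize it.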


Compound Resources - An S3 Example

iadmin mkresc resc_name type context_string

iadmin mkresc comp_resc compound
iadmin mkresc cache_resc unixfilesystem `hostname`:/tmp/cache_resc

iadmin addchildtoresc parent_name child_name parent_child_context_string

iadmin addchildtoresc comp_resc cache_resc cache

The compound resource plugin honors two parent-child context values: "cache" and "archive"


These are used internally to identify the resources by role

Compound Resources - An S3 Example

iadmin mkresc arch_resc s3 `hostname`:/bucket/name <context_string>

  • S3_DEFAULT_HOSTNAME - used to define regions for your S3 bucket
  • S3_AUTH_FILE - fully qualified path to a file holding the access id and the access key for the given bucket name
  • S3_RETRY_COUNT - number of retries before failure
  • S3_WAIT_TIME_SEC - wait time before retries in seconds
  • S3_PROTO - use either HTTP or HTTPS

iadmin mkresc arch_resc s3 `hostname`:/bucket/name "S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/etc/irods/auth_file;S3_RETRY_COUNT=3;S3_WAIT_TIME_SEC=1;S3_PROTO=HTTP"

iadmin addchildtoresc comp_resc arch_resc archive

Add the archive resource to the compound resource
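The context string passed to mkresc above is a semicolon-separated list of key=value pairs. As a quick way to inspect or sanity-check such a string before handing it to iadmin, a small parser might look like the sketch below. The `parse_context` helper is hypothetical and is not part of iRODS.

```python
from typing import Dict


def parse_context(context: str) -> Dict[str, str]:
    """Split a 'KEY=value;KEY=value' context string into a dictionary."""
    result = {}
    for item in context.split(";"):
        if not item.strip():
            continue  # tolerate a trailing or doubled semicolon
        key, _, value = item.partition("=")
        result[key.strip()] = value.strip()
    return result


context = ("S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/etc/irods/auth_file;"
           "S3_RETRY_COUNT=3;S3_WAIT_TIME_SEC=1;S3_PROTO=HTTP")
settings = parse_context(context)
print(settings["S3_PROTO"])        # HTTP
print(settings["S3_RETRY_COUNT"])  # 3
```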

Compound Resources - Delayed Replication

Set the compound resource context string to "auto_repl=off"

iadmin modresc comp_resc context "auto_repl=off"

Leverage the rule engine for replication via acPostProcForPut

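acPostProcForPut() {
    # this rule assumes the cache child of comp_resc is named ufs_cache
    if("ufs_cache" == $KVPairs.rescName) {
        # hierarchy string from the compound resource down to the cache
        *CacheRescName = "comp_resc;ufs_cache";
        msisync_to_archive("*CacheRescName", $filePath, $objPath);
    }
}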

Excellent examples of delayed replication and cache purging can be found here: