Advanced Training: Resource Hierarchies and Composition

June 13-16, 2023

iRODS User Group Meeting 2023

Chapel Hill, NC

Alan King, Senior Software Developer

Justin James, Senior Applications Engineer

iRODS Consortium

iRODS Resource Plugins

  • Internally defines the interface to all storage technologies
  • Loaded dynamically at runtime
  • Uses a vote to advertise the ability to satisfy a given operation
  • Maintains individual configuration per instance in the catalog via a 'context string'
  • May exist independently or can be wired together into hierarchies

Motivation

Many iRODS users spent considerable time implementing the same basic use cases as policy in their rule base

 

  • Replication
  • Data distribution
  • Replica synchronization
  • Data archival

 

Resource hierarchies provide an out-of-the-box means to implement the majority of these use cases, while remaining future-proof

Introduction to Resource Hierarchies

Capture the implementation of various policies as nodes in a decision tree, a well-known metaphor

 

Two types of nodes:

  • Coordinating - pure policy implementation
  • Storage Resource - manages the interface to storage technologies

By convention Coordinating Resources do not manage storage

  - this is not enforced

Coordinating Resources - Branches

Compound - provide POSIX interface to alternative storage

Deferred - defer to children regarding voting behavior

Load Balanced - use gathered load values to determine choices

Passthru - weight, then delegate operations to a child resource

Random - randomly choose a child for a write operation

Replication - ensure all data objects are consistent across children

 

Coordinating resources are purely virtual and exist only in memory. They are not pinned to any given server, so their plugins must exist on every server in the grid

Storage Resources - Leaves

Storage resources which provide POSIX semantics

   - do not require a compound resource hierarchy and a cache

 

Unix File System - surfaces any mount point

Cacheless S3 - resource for S3-compatible storage service

 

Usually pinned to a given server by hostname, as they are expected to access storage provided by the server - plugins do not need to be available on every server.


The cacheless S3 resource and Unix File System resources implement a detached mode, which is not pinned to a server.

Archive Storage Resources - Leaves

Storage resources that act as the archive resource in a compound resource composition

 

S3 - archive resource for S3-compatible storage

Universal MSS - script based access to generic mass storage (tape)

 

Must be pinned to the servers which host the cache in order to synchronize data to the archive resource

The Voting Mechanism

  • Votes are by convention between 0.0 and 1.0
  • All communication starts at the root of a hierarchy
    • addressing children is disallowed
  • Voting follows a depth-first recursive descent
    • coordinating nodes delegate votes to children
  • Storage Nodes (Leaves) vote
  • Coordinating Nodes (Branches) interpret the results of their children
    • The interpretation of these votes expresses the policy encoded in Coordinating Resource plugins
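
The descent described above can be sketched in a few lines of Python. This is only an illustration, not the plugin framework's actual code: the Node class, the resolve function, and the rule that a branch keeps its highest child vote are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A resource in a hierarchy. Leaves carry a fixed vote in this sketch."""
    name: str
    leaf_vote: float = 0.0
    children: List["Node"] = field(default_factory=list)


def resolve(node: Node) -> Tuple[float, str]:
    """Depth-first descent from the root: return (vote, hierarchy string)."""
    if not node.children:
        # Storage nodes (leaves) vote directly.
        return node.leaf_vote, node.name
    # Coordinating nodes delegate to their children and interpret the results.
    # Assumption: this hypothetical branch simply keeps the highest vote.
    best_vote, best_path = max(resolve(child) for child in node.children)
    return best_vote, f"{node.name};{best_path}"


root = Node("root", children=[
    Node("branch_a", children=[Node("leaf_1", 0.5), Node("leaf_2", 0.0)]),
    Node("leaf_3", 1.0),
])
print(resolve(root))  # (1.0, 'root;leaf_3')
```

Each real coordinating plugin replaces the "keep the highest vote" line with its own interpretation, which is where the policy lives.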

Voting - Weighted Passthru

For both Put and Get operations

  • Delegate the vote to the child
  • Multiply the result by the weight
  • Pass the result to the calling resource

 

This resource has many uses, such as disabling writes or reads to a given resource, or providing an abstraction to the users

 

The weights may also be overridden by the rule engine, which allows policy to influence votes dynamically
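
As a rough sketch of the arithmetic, the passthru vote is the child's vote scaled by the weight. The function name passthru_vote is made up for this example.

```python
def passthru_vote(child_vote: float, weight: float = 1.0) -> float:
    """Scale the child's vote by the weight and pass it to the caller."""
    return child_vote * weight


# A weight of 0.0 turns every child vote into 0.0, which is one way
# to picture "disabling" an operation on the child resource.
print(passthru_vote(1.0, 0.25))  # 0.25
print(passthru_vote(0.5, 0.0))   # 0.0
```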

Voting - Unix File System Resource

For a Put operation

  • If the resource is marked Down, vote 0.0
  • If the client is connected to the server which owns the target resource, vote 1.0
  • Otherwise vote 0.5
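
The three rules above map directly onto a small function. This is a sketch only; the function and parameter names are hypothetical and not taken from the plugin source.

```python
def ufs_put_vote(marked_down: bool, client_on_owning_server: bool) -> float:
    """Vote of a Unix File System resource for a Put, per the rules above."""
    if marked_down:
        return 0.0
    if client_on_owning_server:
        return 1.0
    return 0.5


print(ufs_put_vote(marked_down=False, client_on_owning_server=True))   # 1.0
print(ufs_put_vote(marked_down=False, client_on_owning_server=False))  # 0.5
print(ufs_put_vote(marked_down=True, client_on_owning_server=True))    # 0.0
```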

Voting - Unix File System Resource

For a Get operation

  • If the resource is marked Down, vote 0.0
  • If the client is connected to the server which owns the target resource, and the resource has an up-to-date replica, vote 1.0
  • If the resource has an up-to-date replica, vote 0.5
  • If the resource has a stale replica, vote 0.25
  • Otherwise, vote 0.0
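
The Get rules can be sketched the same way. The replica_state argument and its values ("good", "stale", or None for no replica) are assumptions made for this illustration.

```python
from typing import Optional


def ufs_get_vote(marked_down: bool,
                 client_on_owning_server: bool,
                 replica_state: Optional[str]) -> float:
    """Vote of a Unix File System resource for a Get, per the rules above.

    replica_state is "good" (up to date), "stale", or None when this
    resource holds no replica of the data object.
    """
    if marked_down:
        return 0.0
    if replica_state == "good":
        return 1.0 if client_on_owning_server else 0.5
    if replica_state == "stale":
        return 0.25
    return 0.0


print(ufs_get_vote(False, True, "good"))    # 1.0
print(ufs_get_vote(False, False, "good"))   # 0.5
print(ufs_get_vote(False, True, "stale"))   # 0.25
```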

Voting - Random Resource

For a Put Operation

  • Randomly select a child and delegate the vote to the child
  • Continue until a non-zero vote is achieved or all children have been queried
  • Pass the result up to the calling resource
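
A minimal sketch of this loop, assuming the children's votes are already known and given as a dictionary. The function name and the dictionary input are illustrative only.

```python
import random
from typing import Dict, Optional, Tuple


def random_put_vote(child_votes: Dict[str, float],
                    rng: random.Random) -> Tuple[Optional[str], float]:
    """Query children in random order until one returns a non-zero vote."""
    names = list(child_votes)
    # Visiting a shuffled list is equivalent to repeatedly picking a
    # not-yet-queried child at random.
    rng.shuffle(names)
    for name in names:
        vote = child_votes[name]
        if vote > 0.0:
            return name, vote
    # Every child was queried and none accepted the write.
    return None, 0.0


votes = {"ufs0": 0.0, "ufs1": 0.5, "ufs2": 0.0}
print(random_put_vote(votes, random.Random()))  # ('ufs1', 0.5)
```

In this sample only ufs1 votes above zero, so it is selected no matter which order the children are visited in.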

Voting - Random Resource

For a Get Operation

  • Delegate the vote to all children
  • Select the highest vote
  • Pass the result up to the calling resource

Voting - Replication Resource

For a Put Operation

  • Delegate voting to all children
  • Select highest vote
  • Pass result to calling resource
  • Once Put is complete, trigger replication to all other children that accept a Put Operation
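
The steps above can be sketched as follows. The sketch assumes that a child "accepts a Put Operation" when its vote is non-zero; that reading, and the function name, are assumptions rather than the plugin's documented behavior.

```python
from typing import Dict, List, Tuple


def replication_put(child_votes: Dict[str, float]) -> Tuple[str, float, List[str]]:
    """Return (winning child, its vote, children to replicate to afterwards)."""
    winner = max(child_votes, key=child_votes.get)
    vote = child_votes[winner]
    # Assumption: a non-zero vote means the child accepts the Put.
    targets = [name for name, v in child_votes.items()
               if name != winner and v > 0.0]
    return winner, vote, targets


print(replication_put({"ufs0": 1.0, "ufs1": 0.5, "ufs2": 0.0}))
# ('ufs0', 1.0, ['ufs1'])
```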

Voting - Replication Resource

For a Get Operation

  • Delegate vote to all children
  • Select the highest vote
  • Pass result to calling resource

 

Note that, given the voting behavior of the unixfilesystem resource, locality of reference significantly affects reads and writes for this resource

Voting - A Put Example

pt1 - passthru

pt2 - passthru

pt3 - passthru

rnd1 - random

rnd2 - random

repl1 - replication

ufsN

ufs0

Voting - A Put Example

Votes 1.0 Connected

Votes 0.5 Not Connected

Votes 0.0 Marked Down

Randomly chooses ufs0 for vote of 1.0

Passes ufsN-1 vote of 0.5

Write weight of 0.25, passes ufs0 as a vote of 0.25

Randomly chooses ufsN-1 vote of 0.5

Chooses ufsN-1 with a vote of 0.5 > 0.25

Passes ufsN-1
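
The arithmetic in this example can be replayed in Python. The layout is an assumption, since the diagram itself is not reproduced here: one random resource sits under a passthru with a write weight of 0.25, the other under a passthru assumed to have a weight of 1.0, and both passthrus are children of repl1. The random selections are taken as given from the annotations rather than simulated.

```python
# Leaf votes for the Put, taken from the annotations above.
ufs0_vote = 1.0          # client is connected to the server hosting ufs0
ufs_n_minus_1_vote = 0.5  # client is not connected to this server

# Assumed weights: 0.25 comes from the annotations, 1.0 is an assumed default.
weighted_ufs0 = ufs0_vote * 0.25
weighted_ufs_n_minus_1 = ufs_n_minus_1_vote * 1.0

# repl1 selects the highest vote among its children.
candidates = {"ufs0": weighted_ufs0, "ufsN-1": weighted_ufs_n_minus_1}
winner = max(candidates, key=candidates.get)
print(winner, candidates[winner])  # ufsN-1 0.5
```

Even though ufs0 is local to the client and votes 1.0, the write weight of 0.25 pushes it below the remote ufsN-1.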

Compound Resources

Necessary for POSIX compliance of Tape, Object, etc.

 

For a Put

  • Data objects are written to the cache first
  • Then a replica is made to the archive

 

For a Get

  • Replica is staged to cache, if required
  • Reads are always made from the cache

 

By default, the compound resource synchronously replicates to the archive
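
A toy model of the Put and Get flow, using two dictionaries in place of real storage. The CompoundSketch class and its methods are hypothetical and only mirror the steps listed above.

```python
class CompoundSketch:
    """Toy cache/archive pair; dictionaries stand in for real storage."""

    def __init__(self) -> None:
        self.cache = {}
        self.archive = {}

    def put(self, path: str, data: bytes) -> None:
        # Data objects are written to the cache first...
        self.cache[path] = data
        # ...then a replica is made to the archive (synchronously here).
        self.archive[path] = data

    def get(self, path: str) -> bytes:
        # Stage the replica to the cache only if it is not already there.
        if path not in self.cache:
            self.cache[path] = self.archive[path]
        # Reads are always served from the cache.
        return self.cache[path]


resc = CompoundSketch()
resc.put("/tempZone/home/rods/test_file", b"payload")
resc.cache.clear()  # simulate the cache being emptied
print(resc.get("/tempZone/home/rods/test_file"))  # b'payload'
```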

 

Compound Resources - An S3 Example

iadmin mkresc resc_name type context_string

iadmin mkresc comp_resc compound
iadmin mkresc cache_resc unixfilesystem $(hostname):/tmp/cache_resc

iadmin addchildtoresc parent_name child_name parent_child_context_string

iadmin addchildtoresc comp_resc cache_resc cache

The compound resource plugin honors two parent-child context values: "cache", and "archive"

 

These are used internally to identify the resources by role

Compound Resources - An S3 Example

iadmin mkresc arch_resc s3 $(hostname):/bucket/name <context_string>

  • S3_DEFAULT_HOSTNAME - used to define regions for your S3 bucket
  • S3_AUTH_FILE - absolute path to file holding the access id and access key for the given bucket
  • S3_RETRY_COUNT - number of retries before failure
  • S3_WAIT_TIME_SEC - wait time before retries in seconds
  • S3_PROTO - use either HTTP or HTTPS

iadmin mkresc arch_resc s3 $(hostname):/bucket/name "S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/etc/irods/auth_file;S3_RETRY_COUNT=3;S3_WAIT_TIME_SEC=1;S3_PROTO=HTTP"

iadmin addchildtoresc comp_resc arch_resc archive

Add the archive resource to the compound resource
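
The context string passed to mkresc above is a semicolon-separated list of key=value pairs. A small sketch of how such a string breaks down, using a hypothetical parse_context helper (not part of iRODS):

```python
def parse_context(context: str) -> dict:
    """Split a 'key=value;key=value' context string into a dictionary."""
    settings = {}
    for item in context.split(";"):
        if not item:
            continue  # tolerate a trailing or doubled semicolon
        key, _, value = item.partition("=")
        settings[key] = value
    return settings


context = ("S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/etc/irods/auth_file;"
           "S3_RETRY_COUNT=3;S3_WAIT_TIME_SEC=1;S3_PROTO=HTTP")
settings = parse_context(context)
print(settings["S3_PROTO"])        # HTTP
print(settings["S3_RETRY_COUNT"])  # 3
```

All values come back as strings, so a numeric setting such as S3_RETRY_COUNT would still need converting before use.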

Building a Compound Resource

iadmin mkresc comp_resc compound
iadmin mkresc ufs_cache unixfilesystem $(hostname):/tmp/irods/ufs_cache
iadmin mkresc ufs_arch unixfilesystem $(hostname):/tmp/irods/ufs_arch

iadmin mkresc resc_name resc_type context_string

iadmin addchildtoresc parent_name child_name context

iadmin addchildtoresc comp_resc ufs_cache cache
iadmin addchildtoresc comp_resc ufs_arch archive

Review Compound Resource Configuration

irods@example:~$ ilsresc
comp_resc:compound
├── ufs_arch
└── ufs_cache
demoResc
irods@example:~$ ilsresc -l comp_resc
resource name: comp_resc
id: 10001
zone: tempZone
type: compound
location: EMPTY_RESC_HOST
vault: EMPTY_RESC_PATH
free space:
free space time: : Never
status:
info:
comment:
create time: 01685984596: 2023-06-05.17:03:16
modify time: 01685984596: 2023-06-05.17:03:16
context:
parent:
parent context:

Review the child resources

irods@example:~$ ilsresc -l ufs_cache
resource name: ufs_cache
id: 10017
zone: tempZone
type: unixfilesystem
location: example
vault: /tmp/irods/ufs_cache
free space:
free space time: : Never
status:
info:
comment:
create time: 01685984596: 2023-06-05.17:03:16
modify time: 01685984602: 2023-06-05.17:03:22
context:
parent: 10001
parent context: cache
irods@example:~$ ilsresc -l ufs_arch
resource name: ufs_arch
id: 10018
zone: tempZone
type: unixfilesystem
location: example
vault: /tmp/irods/ufs_arch
free space:
free space time: : Never
status:
info:
comment:
create time: 01685984597: 2023-06-05.17:03:17
modify time: 01685984602: 2023-06-05.17:03:22
context:
parent: 10001
parent context: archive

Test Put

irods@example:~$ truncate --size 10M test_file
irods@example:~$ ls -l test_file
-rw-rw-r-- 1 irods irods 10485760 Jun  5 17:05 test_file


irods@example:~$ iput -R comp_resc test_file
irods@example:~$ ils -l
/tempZone/home/rods:
  rods        0 comp_resc;ufs_cache   10485760 2023-06-05.18:18 & test_file
  rods        1 comp_resc;ufs_arch    10485760 2023-06-05.18:18 & test_file

By default, the replica is synchronized to the archive immediately after the cache replica is at rest and registered in the catalog

 

ufs_cache has replica number 0 (written first)

ufs_arch has replica number 1 (written second)

Delayed Replication to an Archive Resource

Set the compound resource context string to "auto_repl=off"

iadmin modresc comp_resc context "auto_repl=off"

Leverage the rule engine for replication via pep_api_data_obj_put_post

pep_api_data_obj_put_post(*INSTANCE_NAME, *COMM, *DATAOBJINP, *BUFFER, *PORTAL_OPR_OUT) {
    *cache_resc_hier = "comp_resc;ufs_cache";
    *resc_hier = *DATAOBJINP.resc_hier;
    if("*cache_resc_hier" == "*resc_hier") {
        delay("<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>") {
            *unused_param = "";
            *obj_path = *DATAOBJINP.obj_path;
            msisync_to_archive("*cache_resc_hier", "*unused_param", "*obj_path");
        }
    }
}
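
To get a feel for the retry clause "1h DOUBLE UNTIL SUCCESS OR 6 TIMES", the sketch below lists the waits between attempts. It assumes that the interval starts at one hour, doubles after each failed attempt, and that "6 TIMES" counts attempts; the exact server semantics should be checked against the delay documentation. The function name is hypothetical.

```python
def retry_intervals_hours(first_interval_hours: int, max_attempts: int) -> list:
    """Waits between consecutive attempts, doubling after each failure."""
    return [first_interval_hours * 2 ** i for i in range(max_attempts - 1)]


print(retry_intervals_hours(1, 6))  # [1, 2, 4, 8, 16]
```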

Prepping for delayed replication

"rule_engines": [
    {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
        "plugin_specific_configuration": {
            "re_data_variable_mapping_set": [
                "core"
            ],
            "re_function_name_mapping_set": [
                "core"
            ],
            "re_rulebase_set": [
                "training",
                "core"
            ],

Add a custom rulebase to /etc/irods/server_config.json

Create a new Rulebase

Edit /etc/irods/training.re and add our new Policy Enforcement Point

pep_api_data_obj_put_post(*INSTANCE_NAME, *COMM, *DATAOBJINP, *BUFFER, *PORTAL_OPR_OUT) {
    *cache_resc_hier = "comp_resc;ufs_cache";
    *resc_hier = *DATAOBJINP.resc_hier;
    if("*cache_resc_hier" == "*resc_hier") {
        delay("<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>") {
            *unused_param = "";
            *obj_path = *DATAOBJINP.obj_path;
            msisync_to_archive("*cache_resc_hier", "*unused_param", "*obj_path");
        }
    }
}

Test Put, Delayed

irods@example:~$ iput -R comp_resc test_file test_file2 ; ils -l
/tempZone/home/rods:
  rods              0 comp_resc;ufs_cache     10485760 2023-06-05.18:18 & test_file
  rods              1 comp_resc;ufs_arch     10485760 2023-06-05.18:18 & test_file
  rods              0 comp_resc;ufs_cache     10485760 2023-06-05.18:25 & test_file2
irods@example:~$ ils -l
/tempZone/home/rods:
  rods              0 comp_resc;ufs_cache     10485760 2023-06-05.18:18 & test_file
  rods              1 comp_resc;ufs_arch     10485760 2023-06-05.18:18 & test_file
  rods              0 comp_resc;ufs_cache     10485760 2023-06-05.18:25 & test_file2
  rods              1 comp_resc;ufs_arch     10485760 2023-06-05.18:25 & test_file2

Wait for it...

Questions?