Advanced Training:
Resource Hierarchies
and Composition
May 28-31, 2024
iRODS User Group Meeting 2024
Amsterdam, Netherlands
Alan King, Senior Software Developer
Martin Flores, Software Developer
iRODS Consortium
iRODS Resource Plugins
Motivation
Many iRODS users spent considerable time implementing the same basic use cases as policy in their rule base
Resource hierarchies provide an out-of-the-box means to implement the majority of these use cases, while remaining future-proof
Introduction to Resource Hierarchies
Capture the implementation of various policies as nodes in a decision tree, a well-known metaphor
Two types of nodes: Coordinating Resources (branches) and Storage Resources (leaves)
By convention, Coordinating Resources do not manage storage
- this is not enforced
Coordinating Resources - Branches
Compound - provide POSIX interface to alternative storage
Deferred - defer to children regarding voting behavior
Load Balanced - use gathered load values to determine choices
Passthru - weight, then delegate operations to a child resource
Random - randomly choose a child for a write operation
Replication - ensure all data objects are consistent across children
Purely virtual in memory - they are not pinned to any given server and plugins must exist on every server in the grid
Storage Resources - Leaves
Storage resources which provide POSIX semantics
- do not require a compound resource hierarchy and a cache
Unix File System - surfaces any mount point
Cacheless S3 - resource for S3-compatible storage service
Usually pinned to a given server by hostname, as they are expected to access storage provided by the server - plugins do not need to be available on every server.
The cacheless S3 resource and Unix File System resources implement a detached mode, which is not pinned to a server.
Archive Storage Resources - Leaves
Storage resources which serve in the archive role within a compound resource composition
S3 - archive resource for S3-compatible storage
Universal MSS - script based access to generic mass storage (tape)
Must be pinned to the servers which host the cache in order to synchronize data to the archive resource
The Voting Mechanism
Voting - Weighted Passthru
For both Put and Get operations
This resource has many uses, such as disabling writes or reads to a given resource, or providing an abstraction to the users
The weights may also be overridden by the rule engine, which allows policy to influence votes dynamically
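A minimal sketch of how a weighted passthru might scale a child's vote, assuming the weight is simply multiplied into the vote. This matches the put example in this deck, where a write weight of 0.25 turns a vote of 1.0 into 0.25. The function and parameter names are hypothetical, and this is not the plugin's actual code.

```python
def passthru_vote(child_vote: float, operation: str,
                  read_weight: float = 1.0, write_weight: float = 1.0) -> float:
    """Scale the child's vote by the weight for the operation (assumed model)."""
    if operation not in ("get", "put"):
        raise ValueError(f"unsupported operation: {operation}")
    weight = read_weight if operation == "get" else write_weight
    return child_vote * weight


# A write weight of 0.0 effectively disables writes through this passthru.
print(passthru_vote(1.0, "put", write_weight=0.0))   # 0.0
print(passthru_vote(1.0, "put", write_weight=0.25))  # 0.25
print(passthru_vote(0.5, "get"))                     # 0.5
```

Under this model, a weight of 0.0 on one operation removes the child from consideration for that operation without touching the child resource itself.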
Voting - Unix File System Resource
For a Put operation
Voting - Unix File System Resource
For a Get operation
Voting - Random Resource
For a Put Operation
Voting - Random Resource
For a Get Operation
Voting - Replication Resource
For a Put Operation
Voting - Replication Resource
For a Get Operation
Note that, given the voting behavior of the unixfilesystem resource, locality of reference significantly affects the behavior of reads and writes for this resource
Voting - A Put Example
pt1 - passthru
pt2 - passthru
pt3 - passthru
rnd1 - random
rnd2 - random
repl1 - replication
ufsN
ufs0
Voting - A Put Example
Votes 1.0 - Connected
Votes 0.5 - Not Connected
Votes 0.0 - Marked Down
Randomly chooses ufs0 for vote of 1.0
Passes ufsN-1 vote of 0.5
Write weight of 0.25, passes ufs0 as a vote of 0.25
Randomly chooses ufsN-1 vote of 0.5
Chooses ufsN-1 with a vote of 0.5 > 0.25
Passes ufsN-1
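The put example can be approximated with a toy model. The tree shape below (pt1 over repl1, which holds pt2 over rnd1 and pt3 over rnd2) is an assumption reconstructed from the annotations, since the slide diagram is not reproduced here. The class names, host names, and the injectable `choose` function are all hypothetical, and `ufs1` stands in for ufsN-1. This is a sketch of the voting idea, not the plugins' actual logic.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vote = Tuple[float, str]  # (vote value, name of the leaf that produced it)


@dataclass
class Ufs:
    """Leaf: votes according to the legend in the put example."""
    name: str
    host: str
    down: bool = False

    def vote_put(self, client_host: str) -> Vote:
        if self.down:
            return (0.0, self.name)   # marked down
        if self.host == client_host:
            return (1.0, self.name)   # client is connected to this server
        return (0.5, self.name)       # up, but not the connected server


@dataclass
class Passthru:
    child: object
    write_weight: float = 1.0

    def vote_put(self, client_host: str) -> Vote:
        vote, leaf = self.child.vote_put(client_host)
        return (vote * self.write_weight, leaf)


@dataclass
class RandomResc:
    children: List[object]
    choose: Callable = random.choice  # injectable so the demo is repeatable

    def vote_put(self, client_host: str) -> Vote:
        return self.choose(self.children).vote_put(client_host)


@dataclass
class Replication:
    children: List[object]

    def vote_put(self, client_host: str) -> Vote:
        votes = [child.vote_put(client_host) for child in self.children]
        return max(votes, key=lambda v: v[0])  # highest vote wins


ufs0 = Ufs("ufs0", host="host0")
ufs1 = Ufs("ufs1", host="host1")
ufs2 = Ufs("ufs2", host="host2", down=True)
rnd1 = RandomResc([ufs0, ufs2], choose=lambda kids: kids[0])   # "randomly" picks ufs0
rnd2 = RandomResc([ufs2, ufs1], choose=lambda kids: kids[-1])  # "randomly" picks ufs1
pt2 = Passthru(rnd1, write_weight=0.25)
pt3 = Passthru(rnd2)
pt1 = Passthru(Replication([pt2, pt3]))

print(pt1.vote_put("host0"))  # (0.5, 'ufs1')
```

Even though the client is connected to the server hosting `ufs0`, the 0.25 write weight on that branch lets the other branch win with 0.5, which is the outcome the annotations describe.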
Compound Resources
Necessary to provide a POSIX interface to tape, object storage, etc.
For a Put
For a Get
By default, the compound resource synchronously replicates to the archive
Compound Resources - An S3 Example
iadmin mkresc resc_name type context_string
iadmin mkresc comp_resc compound
iadmin mkresc cache_resc unixfilesystem $(hostname):/tmp/cache_resc
iadmin addchildtoresc parent_name child_name parent_child_context_string
iadmin addchildtoresc comp_resc cache_resc cache
The compound resource plugin honors two parent-child context values: "cache" and "archive"
These are used internally to identify each child resource by its role
THESE ARE JUST EXAMPLES - DO NOT COPY/PASTE
Compound Resources - An S3 Example
iadmin mkresc arch_resc s3 $(hostname):/bucket/name <context_string>
iadmin mkresc arch_resc s3 $(hostname):/bucket/name "S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/etc/irods/auth_file;S3_RETRY_COUNT=3;S3_WAIT_TIME_SEC=1;S3_PROTO=HTTP"
iadmin addchildtoresc comp_resc arch_resc archive
Add the archive resource to the compound resource
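The context string in the S3 example is a list of `key=value` pairs separated by semicolons. A small sketch of splitting such a string for inspection follows; `parse_context` is a hypothetical helper, not part of iRODS, and it assumes neither keys nor values contain `;`.

```python
def parse_context(context: str) -> dict:
    """Split a 'k1=v1;k2=v2' context string into a dict (illustrative only)."""
    pairs = {}
    for item in context.split(";"):
        if not item:
            continue  # tolerate a trailing semicolon
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in context item: {item!r}")
        pairs[key] = value
    return pairs


context = ("S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/etc/irods/auth_file;"
           "S3_RETRY_COUNT=3;S3_WAIT_TIME_SEC=1;S3_PROTO=HTTP")
settings = parse_context(context)
print(settings["S3_RETRY_COUNT"])  # 3
print(settings["S3_PROTO"])        # HTTP
```

All values come back as strings; converting `S3_RETRY_COUNT` to an integer, for example, is left to the caller.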
THESE ARE JUST EXAMPLES - DO NOT COPY/PASTE
Building a Compound Resource
Switch to the irods user
sudo su - irods
iadmin mkresc resc_name resc_type context_string
iadmin mkresc comp_resc compound
iadmin mkresc ufs_cache unixfilesystem $(hostname):/tmp/irods/ufs_cache
iadmin mkresc ufs_arch unixfilesystem $(hostname):/tmp/irods/ufs_arch
iadmin addchildtoresc parent_name child_name context
iadmin addchildtoresc comp_resc ufs_cache cache
iadmin addchildtoresc comp_resc ufs_arch archive
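The build steps for this compound resource follow a fixed pattern: create the parent, create two children, then attach each child with its role. A sketch that only generates the command strings is shown below. `compound_commands` and its parameters are hypothetical, the host name `example` is taken from the session prompt, and nothing is executed.

```python
from typing import List


def compound_commands(comp: str, cache: str, archive: str,
                      host: str, vault_root: str) -> List[str]:
    """Return the iadmin commands for a compound resource with two UFS children."""
    return [
        f"iadmin mkresc {comp} compound",
        f"iadmin mkresc {cache} unixfilesystem {host}:{vault_root}/{cache}",
        f"iadmin mkresc {archive} unixfilesystem {host}:{vault_root}/{archive}",
        # The third argument is the parent-child context naming the child's role.
        f"iadmin addchildtoresc {comp} {cache} cache",
        f"iadmin addchildtoresc {comp} {archive} archive",
    ]


for command in compound_commands("comp_resc", "ufs_cache", "ufs_arch",
                                 "example", "/tmp/irods"):
    print(command)
```

Printing the commands rather than running them keeps the sketch safe to try and makes it easy to review the hierarchy before anything is created.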
Review Compound Resource Configuration
irods@example:~$ ilsresc
comp_resc:compound
├── ufs_arch:unixfilesystem
└── ufs_cache:unixfilesystem
demoResc:unixfilesystem
irods@example:~$ ilsresc -l comp_resc
resource name: comp_resc
id: 10001
zone: tempZone
type: compound
location: EMPTY_RESC_HOST
vault: EMPTY_RESC_PATH
free space:
free space time: : Never
status:
info:
comment:
create time: 01714928596: 2024-05-05.17:03:16
modify time: 01714928596: 2024-05-05.17:03:16
context:
parent:
parent context:
Review the child resources
irods@example:~$ ilsresc -l ufs_cache
resource name: ufs_cache
id: 10017
zone: tempZone
type: unixfilesystem
location: example
vault: /tmp/irods/ufs_cache
free space:
free space time: : Never
status:
info:
comment:
create time: 01714928596: 2024-05-05.17:03:16
modify time: 01714928602: 2024-05-05.17:03:22
context:
parent: 10001
parent context: cache
irods@example:~$ ilsresc -l ufs_arch
resource name: ufs_arch
id: 10018
zone: tempZone
type: unixfilesystem
location: example
vault: /tmp/irods/ufs_arch
free space:
free space time: : Never
status:
info:
comment:
create time: 01714928597: 2024-05-05.17:03:17
modify time: 01714928602: 2024-05-05.17:03:22
context:
parent: 10001
parent context: archive
Test Put
irods@example:~$ truncate --size 10M test_file
irods@example:~$ ls -l test_file
-rw-rw-r-- 1 irods irods 10485760 May 5 17:05 test_file
irods@example:~$ iput -R comp_resc test_file
irods@example:~$ ils -l
/tempZone/home/rods:
rods 0 comp_resc;ufs_cache 10485760 2024-05-05.18:18 & test_file
rods 1 comp_resc;ufs_arch 10485760 2024-05-05.18:18 & test_file
By default, the data object is replicated to the archive immediately after the cache replica is at rest and registered in the catalog
ufs_cache has replica number 0 (written first)
ufs_arch has replica number 1 (written second)
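A quick way to confirm the result of the put is to check that both replicas exist, one under each child of the compound resource. The sketch below parses the two `ils -l` lines shown above. `parse_ils_line` is a hypothetical helper that assumes exactly the seven-column layout in this listing (owner, replica number, hierarchy, size, timestamp, status marker, name). It is not a general parser for `ils` output.

```python
from typing import NamedTuple


class Replica(NamedTuple):
    owner: str
    number: int
    hierarchy: str
    size: int
    modified: str
    status: str
    name: str


def parse_ils_line(line: str) -> Replica:
    """Parse one replica line, assuming the seven-column layout shown above."""
    owner, number, hierarchy, size, modified, status, name = line.split(None, 6)
    return Replica(owner, int(number), hierarchy, int(size), modified, status, name)


listing = [
    "rods 0 comp_resc;ufs_cache 10485760 2024-05-05.18:18 & test_file",
    "rods 1 comp_resc;ufs_arch 10485760 2024-05-05.18:18 & test_file",
]
replicas = [parse_ils_line(line) for line in listing]

# The last element of the hierarchy is the leaf that holds the replica.
print([r.hierarchy.split(";")[-1] for r in replicas])  # ['ufs_cache', 'ufs_arch']
print(len({r.size for r in replicas}) == 1)            # True
```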
Delayed Replication to an Archive Resource
Set the compound resource context string to "auto_repl=off"
iadmin modresc comp_resc context "auto_repl=off"
And prepare to leverage the rule engine for replication via pep_api_data_obj_put_post()
Create a new Rulebase
Edit /etc/irods/training.re and add our new Policy Enforcement Point
pep_api_data_obj_put_post(*INSTANCE_NAME, *COMM, *DATAOBJINP, *BUFFER, *PORTAL_OPR_OUT) {
    *cache_resc_hier = "comp_resc;ufs_cache";
    *resc_hier = *DATAOBJINP.resc_hier;
    if("*cache_resc_hier" == "*resc_hier") {
        delay("<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>") {
            *unused_param = "";
            *obj_path = *DATAOBJINP.obj_path;
            msisync_to_archive("*cache_resc_hier", "*unused_param", "*obj_path");
        }
    }
}
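The `<EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>` hint asks for retries on a doubling interval. As a rough illustration only, the sketch below lists the intervals under the assumption that the interval starts at one hour and doubles after each failed attempt, for up to six intervals. Whether the initial run counts toward the six is not stated in the slides and is not modeled here. `backoff_intervals` is a hypothetical name.

```python
from typing import List


def backoff_intervals(initial_s: int, count: int) -> List[int]:
    """Return `count` retry intervals in seconds, doubling each time (assumed model)."""
    if initial_s <= 0 or count < 0:
        raise ValueError("initial_s must be positive and count non-negative")
    return [initial_s * 2 ** i for i in range(count)]


intervals = backoff_intervals(3600, 6)
print(intervals)                       # [3600, 7200, 14400, 28800, 57600, 115200]
print([s // 3600 for s in intervals])  # [1, 2, 4, 8, 16, 32]
```

Under these assumptions, a replication that keeps failing would be retried over roughly two and a half days before the rule gives up.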
Prepping for delayed replication
Add the custom rulebase to /etc/irods/server_config.json
"rule_engines": [
    {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
        "plugin_specific_configuration": {
            "re_data_variable_mapping_set": [
                "core"
            ],
            "re_function_name_mapping_set": [
                "core"
            ],
            "re_rulebase_set": [
                "training",
                "core"
            ],
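The same change can be made programmatically. The sketch below inserts a rulebase name at the front of `re_rulebase_set` for the iRODS rule language plugin, working on an in-memory dictionary shaped like the fragment above. `add_rulebase` is a hypothetical helper, and the sample configuration contains only the keys shown in this deck; a real server_config.json has many more.

```python
import copy
import json

RULE_LANGUAGE_PLUGIN = "irods_rule_engine_plugin-irods_rule_language"


def add_rulebase(config: dict, name: str) -> dict:
    """Return a copy of config with `name` listed first in re_rulebase_set."""
    updated = copy.deepcopy(config)
    for engine in updated.get("rule_engines", []):
        if engine.get("plugin_name") != RULE_LANGUAGE_PLUGIN:
            continue
        rulebases = engine["plugin_specific_configuration"]["re_rulebase_set"]
        if name not in rulebases:
            rulebases.insert(0, name)  # ahead of "core", as in the fragment above
    return updated


sample = {
    "rule_engines": [
        {
            "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
            "plugin_name": RULE_LANGUAGE_PLUGIN,
            "plugin_specific_configuration": {"re_rulebase_set": ["core"]},
        }
    ]
}
result = add_rulebase(sample, "training")
print(json.dumps(
    result["rule_engines"][0]["plugin_specific_configuration"]["re_rulebase_set"]
))  # ["training", "core"]
```

The helper copies the configuration instead of editing it in place, so the original can be kept for comparison or rollback.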
Test Put, Delayed
irods@example:~$ iput -R comp_resc test_file test_file2 ; ils -l
/tempZone/home/rods:
rods 0 comp_resc;ufs_cache 10485760 2023-06-05.18:18 & test_file
rods 1 comp_resc;ufs_arch 10485760 2023-06-05.18:18 & test_file
rods 0 comp_resc;ufs_cache 10485760 2023-06-05.18:25 & test_file2
irods@example:~$ iqstat
id name
10023 *unused_param = ""; *obj_path = *DATAOBJINP.obj_path; msisync_to_archive("*cache_resc_hier", "*unused_param", "*obj_path");
Wait for it...
irods@example:~$ ils -l
/tempZone/home/rods:
rods 0 comp_resc;ufs_cache 10485760 2023-06-05.18:18 & test_file
rods 1 comp_resc;ufs_arch 10485760 2023-06-05.18:18 & test_file
rods 0 comp_resc;ufs_cache 10485760 2023-06-05.18:25 & test_file2
rods 1 comp_resc;ufs_arch 10485760 2023-06-05.18:25 & test_file2
Questions?