Advanced Training:
Resource Hierarchies
and Composition
June 13-16, 2023
iRODS User Group Meeting 2023
Chapel Hill, NC
Alan King, Senior Software Developer
Justin James, Senior Applications Engineer
iRODS Consortium
iRODS Resource Plugins
- Internally defines the interface to all storage technologies
- Loaded dynamically at runtime
- Uses a vote to advertise the ability to satisfy a given operation
- Maintains individual configuration per instance in the catalog via a 'context string'
- May exist independently or can be wired together into hierarchies
Motivation
Many iRODS users spent considerable time implementing the same basic use cases as policy in their rule base
- Replication
- Data distribution
- Replica synchronization
- Data archival
Resource hierarchies provide an out-of-the-box means to implement the majority of these use cases, while remaining future-proof
Introduction to Resource Hierarchies
Capture the implementation of various policies as nodes in a decision tree, a well-known metaphor
Two types of nodes:
- Coordinating - pure policy implementation
- Storage Resource - manages the interface to storage technologies
By convention Coordinating Resources do not manage storage
- this is not enforced
Coordinating Resources - Branches
Compound - provide POSIX interface to alternative storage
Deferred - defer to children regarding voting behavior
Load Balanced - use gathered load values to determine choices
Passthru - weight, then delegate operations to a child resource
Random - randomly choose a child for a write operation
Replication - ensure all data objects are consistent across children
Purely virtual in memory - they are not pinned to any given server and plugins must exist on every server in the grid
Storage Resources - Leaves
Storage resources which provide POSIX semantics
- do not require a compound resource hierarchy and a cache
Unix File System - surfaces any mount point
Cacheless S3 - resource for S3-compatible storage service
Usually pinned to a given server by hostname, as they are expected to access storage provided by the server - plugins do not need to be available on every server.
The cacheless S3 resource and Unix File System resources implement a detached mode, which is not pinned to a server.
Archive Storage Resources - Leaves
Storage Resources which act as the Archive Resource in a Compound Resource composition
S3 - archive resource for S3-compatible storage
Universal MSS - script based access to generic mass storage (tape)
Must be pinned to the servers which host the cache in order to synchronize data to the archive resource
The Voting Mechanism
- Votes are by convention between 0.0 and 1.0
- All communication starts at the root of a hierarchy
- addressing children is disallowed
- Voting follows a depth-first recursive descent
- coordinating nodes delegate votes to children
- Storage Nodes (Leaves) vote
- Coordinating Nodes (Branches) interpret the results of their children
- The interpretation of these votes expresses the policy encoded in Coordinating Resource plugins
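The descent described above can be sketched in a few lines of Python. This is only an illustrative model, not the iRODS implementation: the `Node` class, its `interpret` hook, and the `resolve` function are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical model of one resource in a hierarchy."""
    name: str
    children: List["Node"] = field(default_factory=list)
    leaf_vote: float = 0.0  # only meaningful for leaves (storage nodes)
    # Assumed policy hook: how a branch interprets its children's votes.
    interpret: Callable[[List[float]], float] = max


def resolve(node: Node) -> float:
    """Depth-first descent: leaves vote, branches interpret child votes."""
    if not node.children:
        return node.leaf_vote
    return node.interpret([resolve(child) for child in node.children])


# Communication always starts at the root, never at a child.
root = Node("root", [
    Node("leaf_a", leaf_vote=0.5),
    Node("branch", [Node("leaf_b", leaf_vote=1.0)]),
])
winning_vote = resolve(root)
```

Swapping the `interpret` callable is what would distinguish one coordinating policy from another in this toy model.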
Voting - Weighted Passthru
For both Put and Get operations
- Delegate the vote to the child
- Multiply the result by the weight
- Pass the result to the calling resource
This resource has many uses, such as disabling writes or reads to a given resource, or providing an abstraction to the users
The weights may also be overridden by the rule engine, which allows votes to be influenced dynamically based on policy
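As a rough illustration of the weighting step, the sketch below multiplies a child's vote by a weight. The function name and parameters are hypothetical and exist only for this example.

```python
def passthru_vote(child_vote: float, weight: float = 1.0) -> float:
    """Delegate to the child, then scale its vote by the weight."""
    return child_vote * weight


# A weight of 0.0 effectively disables the operation for this branch,
# because the child can never win with a zero vote.
disabled = passthru_vote(1.0, weight=0.0)
deprioritized = passthru_vote(1.0, weight=0.25)
```

In this model, using separate weights for Put and Get is how reads could stay enabled while writes are turned off.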
Voting - Unix File System Resource
For a Put operation
- If the resource is marked Down, vote 0.0
- If the client is connected to the server which owns the target resource, vote 1.0
- Otherwise vote 0.5
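The three Put rules above translate directly into a small decision function. This is a sketch of the stated rules only; the function and argument names are hypothetical.

```python
def ufs_put_vote(marked_down: bool, client_on_owning_server: bool) -> float:
    """Vote for a Put, following the three rules listed above."""
    if marked_down:
        return 0.0
    if client_on_owning_server:
        return 1.0
    return 0.5
```

Note that the Down check comes first, so a local but Down resource still votes 0.0.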
Voting - Unix File System Resource
For a Get operation
- If the resource is marked Down, vote 0.0
- If the client is connected to the server which owns the target resource, and the resource has an up-to-date replica, vote 1.0
- If the resource has an up-to-date replica, vote 0.5
- If the resource has a stale replica, vote 0.25
- Otherwise, vote 0.0
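The Get rules can be sketched the same way. The `replica_state` values (`"good"`, `"stale"`, or `None` for no replica) are labels assumed for this example, not iRODS identifiers.

```python
from typing import Optional


def ufs_get_vote(
    marked_down: bool,
    client_on_owning_server: bool,
    replica_state: Optional[str],
) -> float:
    """Vote for a Get, following the rules listed above."""
    if marked_down:
        return 0.0
    if replica_state == "good":
        # An up-to-date replica wins outright only when it is local.
        return 1.0 if client_on_owning_server else 0.5
    if replica_state == "stale":
        return 0.25
    return 0.0
```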
Voting - Random Resource
For a Put Operation
- Randomly select a child and delegate the vote to the child
- Continue until a positive non-zero vote is achieved or all children are queried
- Pass the result up to the calling resource
Voting - Random Resource
For a Get Operation
- Delegate the vote to all children
- Select the highest vote
- Pass the result up to the calling resource
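A minimal sketch of both Random resource behaviors follows. Child votes are modeled as a plain dictionary of name to vote, which is an assumption of this example; the function names are hypothetical.

```python
import random
from typing import Dict, Optional, Tuple


def random_put_vote(
    child_votes: Dict[str, float], rng: random.Random
) -> Tuple[Optional[str], float]:
    """Query children in random order until one returns a non-zero vote."""
    names = list(child_votes)
    rng.shuffle(names)
    for name in names:
        vote = child_votes[name]
        if vote > 0.0:
            return name, vote
    return None, 0.0  # every child was queried and none could accept


def random_get_vote(child_votes: Dict[str, float]) -> Tuple[Optional[str], float]:
    """Ask every child and keep the highest vote."""
    if not child_votes:
        return None, 0.0
    name = max(child_votes, key=child_votes.get)
    return name, child_votes[name]
```

For example, `random_put_vote({"ufs0": 0.0, "ufs1": 0.5}, random.Random())` can only ever return `("ufs1", 0.5)`, because the child voting 0.0 is skipped whenever it is drawn first.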
Voting - Replication Resource
For a Put Operation
- Delegate voting to all children
- Select highest vote
- Pass result to calling resource
- Once Put is complete, trigger replication to all other children that accept a Put Operation
Voting - Replication Resource
For a Get Operation
- Delegate vote to all children
- Select the highest vote
- Pass result to calling resource
Note that given the behavior of the unixfilesystem, locality of reference significantly affects the behavior of reads and writes for this Resource
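The Replication resource's Put behavior can be sketched as "pick the highest vote, then fan out". The sketch assumes that a child "accepts a Put" when its vote is non-zero; that reading, and the function name, are assumptions of this example.

```python
from typing import Dict, List, Tuple


def replication_put(child_votes: Dict[str, float]) -> Tuple[str, float, List[str]]:
    """Return the winning child, its vote, and the replication targets."""
    winner = max(child_votes, key=child_votes.get)
    # After the Put completes, replicate to every other child that could
    # accept the write (assumed here to mean a non-zero vote).
    targets = [
        name for name, vote in child_votes.items()
        if name != winner and vote > 0.0
    ]
    return winner, child_votes[winner], targets
```

With unixfilesystem children, the child on the server the client is connected to votes 1.0 and receives the initial write, which is the locality effect noted above.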
Voting - A Put Example
pt1 - passthru
pt2 - passthru
pt3 - passthru
rnd1 - random
rnd2 - random
repl1 - replication
ufsN
ufs0
Voting - A Put Example
Votes 1.0 Connected
Votes 0.5 Not Connected
Votes 0.0 Marked Down
Randomly chooses ufs0 for vote of 1.0
Passes ufsN-1 vote of 0.5
Write weight of 0.25, passes ufs0 as a vote of 0.25
Randomly chooses ufsN-1 vote of 0.5
Chooses ufsN-1 with a vote of 0.5 > 0.25
Passes ufsN-1
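The arithmetic behind these annotations can be replayed in a short sketch. The exact tree layout is not recoverable from the slide text, so the example assumes one plausible reading: one branch reaches ufs0 through a passthru with a write weight of 0.25, the other reaches ufsN-1 through an unweighted passthru, and the replication resource compares the two. All names in the code are illustrative.

```python
def put_example():
    """Replay the example votes under the assumed tree layout."""
    write_weight = 0.25

    # Branch A: a random resource picks ufs0 (connected, votes 1.0),
    # then a weighted passthru scales the vote.
    branch_a = 1.0 * write_weight

    # Branch B: a random resource picks ufsN-1 (not connected, votes 0.5),
    # then an unweighted passthru passes it along unchanged.
    branch_b = 0.5 * 1.0

    # The replication resource keeps the highest vote.
    votes = {"ufs0": branch_a, "ufsN-1": branch_b}
    winner = max(votes, key=votes.get)
    return winner, votes[winner]
```

Under these assumptions the weighted passthru turns the otherwise winning local resource into the losing branch, which is the point of the example.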
Compound Resources
Necessary for POSIX compliance of Tape, Object, etc.
For a Put
- Data objects are written to the cache first
- Then a replica is made to the archive
For a Get
- Replica is staged to cache, if required
- Reads are always made from the cache
By default, the compound resource synchronously replicates to the archive
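The Put and Get flow above can be modeled with two in-memory dictionaries. This is a toy model for illustration only: `ToyCompound` and its `auto_repl` flag are hypothetical names, and real staging involves the catalog and storage plugins.

```python
class ToyCompound:
    """Toy model of a compound resource with a cache and an archive."""

    def __init__(self, auto_repl: bool = True):
        self.cache = {}
        self.archive = {}
        self.auto_repl = auto_repl  # assumed flag: replicate on every Put

    def put(self, path: str, data: bytes) -> None:
        self.cache[path] = data          # writes land in the cache first
        if self.auto_repl:
            self.sync_to_archive(path)   # then a replica goes to the archive

    def sync_to_archive(self, path: str) -> None:
        self.archive[path] = self.cache[path]

    def get(self, path: str) -> bytes:
        if path not in self.cache:
            self.cache[path] = self.archive[path]  # stage to cache if needed
        return self.cache[path]          # reads always come from the cache


resc = ToyCompound()
resc.put("/tempZone/home/rods/test_file", b"payload")
```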
Compound Resources - An S3 Example
iadmin mkresc resc_name type context_string
iadmin mkresc comp_resc compound
iadmin mkresc cache_resc unixfilesystem $(hostname):/tmp/cache_resc
iadmin addchildtoresc parent_name child_name parent_child_context_string
iadmin addchildtoresc comp_resc cache_resc cache
The compound resource plugin honors two parent-child context values: "cache" and "archive"
These are used internally to identify the resources by role
Compound Resources - An S3 Example
iadmin mkresc arch_resc s3 $(hostname):/bucket/name <context_string>
- S3_DEFAULT_HOSTNAME - used to define regions for your S3 bucket
- S3_AUTH_FILE - absolute path to file holding the access id and access key for the given bucket
- S3_RETRY_COUNT - number of retries before failure
- S3_WAIT_TIME_SEC - wait time before retries in seconds
- S3_PROTO - use either HTTP or HTTPS
iadmin mkresc arch_resc s3 $(hostname):/bucket/name "S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/etc/irods/auth_file;S3_RETRY_COUNT=3;S3_WAIT_TIME_SEC=1;S3_PROTO=HTTP"
iadmin addchildtoresc comp_resc arch_resc archive
Add the archive resource to the compound resource
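The context string shown above is a semicolon-separated list of `KEY=VALUE` pairs. A small helper like the following can make such a string easier to inspect before passing it to `iadmin`. The helper name is hypothetical and it performs no validation of the keys.

```python
def parse_context_string(context: str) -> dict:
    """Split 'K1=V1;K2=V2' into a dictionary, ignoring empty segments."""
    settings = {}
    for item in context.split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        settings[key] = value
    return settings


settings = parse_context_string(
    "S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/etc/irods/auth_file;"
    "S3_RETRY_COUNT=3;S3_WAIT_TIME_SEC=1;S3_PROTO=HTTP"
)
```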
Building a Compound Resource
iadmin mkresc comp_resc compound
iadmin mkresc ufs_cache unixfilesystem $(hostname):/tmp/irods/ufs_cache
iadmin mkresc ufs_arch unixfilesystem $(hostname):/tmp/irods/ufs_arch
iadmin mkresc resc_name resc_type context_string
iadmin addchildtoresc parent_name child_name context
iadmin addchildtoresc comp_resc ufs_cache cache
iadmin addchildtoresc comp_resc ufs_arch archive
Review Compound Resource Configuration
irods@example:~$ ilsresc
comp_resc:compound
├── ufs_arch
└── ufs_cache
demoResc
irods@example:~$ ilsresc -l comp_resc
resource name: comp_resc
id: 10001
zone: tempZone
type: compound
location: EMPTY_RESC_HOST
vault: EMPTY_RESC_PATH
free space:
free space time: : Never
status:
info:
comment:
create time: 01685984596: 2023-06-05.17:03:16
modify time: 01685984596: 2023-06-05.17:03:16
context:
parent:
parent context:
Review the child resources
irods@example:~$ ilsresc -l ufs_cache
resource name: ufs_cache
id: 10017
zone: tempZone
type: unixfilesystem
location: example
vault: /tmp/irods/ufs_cache
free space:
free space time: : Never
status:
info:
comment:
create time: 01685984596: 2023-06-05.17:03:16
modify time: 01685984602: 2023-06-05.17:03:22
context:
parent: 10001
parent context: cache
irods@example:~$ ilsresc -l ufs_arch
resource name: ufs_arch
id: 10018
zone: tempZone
type: unixfilesystem
location: example
vault: /tmp/irods/ufs_arch
free space:
free space time: : Never
status:
info:
comment:
create time: 01685984597: 2023-06-05.17:03:17
modify time: 01685984602: 2023-06-05.17:03:22
context:
parent: 10001
parent context: archive
Test Put
irods@example:~$ truncate --size 10M test_file
irods@example:~$ ls -l test_file
-rw-rw-r-- 1 irods irods 10485760 Jun 5 17:05 test_file
irods@example:~$ iput -R comp_resc test_file
irods@example:~$ ils -l
/tempZone/home/rods:
rods 0 comp_resc;ufs_cache 10485760 2023-06-05.18:18 & test_file
rods 1 comp_resc;ufs_arch 10485760 2023-06-05.18:18 & test_file
By default, the data object is replicated to the archive immediately after the cache replica is at rest and registered in the catalog
ufs_cache has replica number 0 (written first)
ufs_arch has replica number 1 (written second)
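Each replica line in the `ils -l` listing above carries the owner, replica number, resource hierarchy, size, timestamp, status, and name. The sketch below splits one such line into fields. It assumes exactly seven whitespace-separated fields, so it would not handle names containing spaces; the function name is hypothetical.

```python
def parse_ils_line(line: str) -> dict:
    """Parse one replica line of 'ils -l' output (simple cases only)."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"unexpected field count: {len(fields)}")
    owner, number, hierarchy, size, timestamp, status, name = fields
    return {
        "owner": owner,
        "replica_number": int(number),
        # Hierarchy levels are separated by semicolons, root first.
        "hierarchy": hierarchy.split(";"),
        "size": int(size),
        "timestamp": timestamp,
        "status": status,
        "name": name,
    }


replica = parse_ils_line(
    "rods 1 comp_resc;ufs_arch 10485760 2023-06-05.18:18 & test_file"
)
```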
Delayed Replication to an Archive Resource
Set the compound resource context string to "auto_repl=off"
iadmin modresc comp_resc context "auto_repl=off"
Leverage the rule engine for replication via pep_api_data_obj_put_post
pep_api_data_obj_put_post(*INSTANCE_NAME, *COMM, *DATAOBJINP, *BUFFER, *PORTAL_OPR_OUT) {
    *cache_resc_hier = "comp_resc;ufs_cache";
    *resc_hier = *DATAOBJINP.resc_hier;
    if("*cache_resc_hier" == "*resc_hier") {
        delay("<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>") {
            *unused_param = "";
            *obj_path = *DATAOBJINP.obj_path;
            msisync_to_archive("*cache_resc_hier", "*unused_param", "*obj_path");
        }
    }
}
Prepping for delayed replication
"rule_engines": [
    {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
        "plugin_specific_configuration": {
            "re_data_variable_mapping_set": [
                "core"
            ],
            "re_function_name_mapping_set": [
                "core"
            ],
            "re_rulebase_set": [
                "training",
                "core"
            ],
Add a custom rulebase to /etc/irods/server_config.json
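The edit above places "training" ahead of "core" in `re_rulebase_set`. The sketch below applies the same change to an already-loaded `rule_engines` list. It is a hedged illustration: `prepend_rulebase` is a hypothetical helper, and it only manipulates Python data, leaving reading and writing of `server_config.json` to the caller.

```python
def prepend_rulebase(rule_engines: list, plugin_name: str, rulebase: str) -> list:
    """Put `rulebase` first in re_rulebase_set for the named plugin."""
    for engine in rule_engines:
        if engine.get("plugin_name") != plugin_name:
            continue
        names = engine["plugin_specific_configuration"]["re_rulebase_set"]
        if rulebase not in names:
            names.insert(0, rulebase)  # listed ahead of "core", as shown above
    return rule_engines


engines = [{
    "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
    "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
    "plugin_specific_configuration": {"re_rulebase_set": ["core"]},
}]
prepend_rulebase(engines, "irods_rule_engine_plugin-irods_rule_language", "training")
```

Calling the helper a second time leaves the list unchanged, so it is safe to re-run.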
Create a new Rulebase
Edit /etc/irods/training.re and add our new Policy Enforcement Point
pep_api_data_obj_put_post(*INSTANCE_NAME, *COMM, *DATAOBJINP, *BUFFER, *PORTAL_OPR_OUT) {
    *cache_resc_hier = "comp_resc;ufs_cache";
    *resc_hier = *DATAOBJINP.resc_hier;
    if("*cache_resc_hier" == "*resc_hier") {
        delay("<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>") {
            *unused_param = "";
            *obj_path = *DATAOBJINP.obj_path;
            msisync_to_archive("*cache_resc_hier", "*unused_param", "*obj_path");
        }
    }
}
Test Put, Delayed
irods@example:~$ iput -R comp_resc test_file test_file2 ; ils -l
/tempZone/home/rods:
rods 0 comp_resc;ufs_cache 10485760 2023-06-05.18:18 & test_file
rods 1 comp_resc;ufs_arch 10485760 2023-06-05.18:18 & test_file
rods 0 comp_resc;ufs_cache 10485760 2023-06-05.18:25 & test_file2
Wait for it...
irods@example:~$ ils -l
/tempZone/home/rods:
rods 0 comp_resc;ufs_cache 10485760 2023-06-05.18:18 & test_file
rods 1 comp_resc;ufs_arch 10485760 2023-06-05.18:18 & test_file
rods 0 comp_resc;ufs_cache 10485760 2023-06-05.18:25 & test_file2
rods 1 comp_resc;ufs_arch 10485760 2023-06-05.18:25 & test_file2
Questions?
UGM 2023 - Resource Hierarchies and Composition
By iRODS Consortium
iRODS User Group Meeting 2023 - Advanced Training Module