June 25-27, 2019
iRODS User Group Meeting 2019
Utrecht, Netherlands
Justin James
Applications Engineer, iRODS Consortium
Resource Hierarchies
and Composition
iRODS Resource Plugins
- Internally defines the interface to all storage technologies
- Loaded dynamically at runtime
- Uses a vote to advertise the ability to satisfy a given operation
- Maintains individual configuration per instance in the catalog via a 'context string'
- May exist independently or can be wired together into hierarchies
Motivation
Many iRODS users spent considerable time implementing the same basic use cases as policy in their rule base
- Replication
- Data distribution
- Replica synchronization
- Data archival
Resource hierarchies provide an out-of-the-box way to implement most of these use cases, while remaining future-proof
Introduction to Resource Hierarchies
Capture the implementation of various policies as nodes in a decision tree, a well-known metaphor
Two types of nodes:
- Coordinating - pure policy implementation
- Storage Resource - manages the interface to storage technologies
By convention Coordinating Resources do not manage storage
- this is not enforced
Coordinating Resources - Branches
Compound - provide POSIX interface to alternative storage
Deferred - defer to children regarding voting behavior
Load Balanced - use gathered load values to determine choices
Passthru - weight, then delegate operations to a child resource
Random - randomly choose a child for a write operation
Replication - ensure all data objects are consistent across children
Purely virtual in memory - they are not pinned to any given server and plugins must exist on every server in the grid
Storage Resources - Leaves
Storage resources which provide POSIX semantics
- do not require a compound resource hierarchy and a cache
Unix File System - surfaces any mount point
Ceph-RADOS - Ceph object storage
HPSS - access to IBM High Performance Storage System
Cacheless S3 - resource for Amazon S3 service (soon to be released)
Usually pinned to a given server by hostname, as they are expected to access storage provided by the server - plugins do not need to be available on every server.
Exception: The soon to be released cacheless S3 resource implements a detached mode which is not pinned to a server.
Archive Storage Resources - Leaves
Storage Resources which participate in the role as an Archive Resource in a Compound Resource Composition
S3 - archive resource for Amazon S3
WOS - DDN Web Object Scaler
Universal MSS - script based access to generic object storage
Must be pinned to the servers which host the cache in order to synchronize data to the archive resource
The Voting Mechanism
- Votes are by convention between 0.0 and 1.0
- All communication starts at the root of a hierarchy
- addressing children is disallowed
- Voting follows a depth-first recursive descent
- coordinating nodes delegate votes to children
- Storage Nodes (Leaves) vote
- Coordinating Nodes (Branches) interpret the results of their children
- The interpretation of these votes expresses the policy encoded in Coordinating Resource plugins
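The recursive descent described above can be sketched in a few lines of Python. This is only an illustration, not the iRODS plugin interface: the class names, the fixed leaf votes, and the `highest_vote` policy are assumptions made for the example. The semicolon-separated hierarchy string follows the "parent;child" form that iRODS uses for resource hierarchies.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Result = Tuple[float, str]  # (vote, hierarchy string)


def highest_vote(results: List[Result]) -> Result:
    """Example policy (assumed): keep the child result with the highest vote."""
    return max(results, key=lambda r: r[0])


@dataclass
class StorageNode:
    name: str
    vote: float  # fixed for illustration; a real leaf computes it per operation

    def resolve(self) -> Result:
        return self.vote, self.name


@dataclass
class CoordinatingNode:
    name: str
    children: list
    interpret: Callable[[List[Result]], Result] = highest_vote

    def resolve(self) -> Result:
        # Depth-first: every child resolves fully before this node decides.
        results = [child.resolve() for child in self.children]
        vote, path = self.interpret(results)
        return vote, f"{self.name};{path}"


# All communication starts at the root; children are never addressed directly.
root = CoordinatingNode("root", [
    StorageNode("ufs0", 0.5),
    CoordinatingNode("mid", [StorageNode("ufs1", 1.0), StorageNode("ufs2", 0.0)]),
])
winning_vote, winning_path = root.resolve()
```

Swapping the `interpret` function is what distinguishes one coordinating policy from another in this sketch.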
Voting - Weighted Passthru
For both Put and Get operations
- Delegate the vote to the child
- Multiply the result by the weight
- Pass the result to the calling resource
This resource has many uses, such as disabling writes or reads to a given resource, or providing an abstraction to the users
The weights also may be overridden by the rule engine which allows for dynamically influencing votes based on policy
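A minimal sketch of the weighting step, assuming separate read and write weights (the Put example later in the deck mentions a "write weight"). The class and attribute names are hypothetical and are not the plugin's API.

```python
from dataclasses import dataclass


@dataclass
class PassthruSketch:
    read_weight: float = 1.0
    write_weight: float = 1.0

    def vote(self, operation: str, child_vote: float) -> float:
        # Multiply the child's vote by the weight for this operation.
        weight = self.write_weight if operation == "put" else self.read_weight
        return child_vote * weight


# A write weight of 0.0 disables writes while leaving reads untouched.
read_only = PassthruSketch(write_weight=0.0)
put_vote = read_only.vote("put", 1.0)
get_vote = read_only.vote("get", 0.5)
```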
Voting - Unix File System Resource
For a Put operation
- If the resource is marked Down, vote 0.0
- If the client is connected to the server which owns the target resource, vote 1.0
- Otherwise vote 0.5
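The Put rules above read naturally as an ordered chain of checks. The function below is a sketch with hypothetical parameter names, not the plugin's actual code.

```python
def ufs_put_vote(marked_down: bool, client_is_local: bool) -> float:
    """Vote for a Put, checking the conditions in the order listed."""
    if marked_down:
        return 0.0
    if client_is_local:  # client is connected to the server owning the resource
        return 1.0
    return 0.5
```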
Voting - Unix File System Resource
For a Get operation
- If the resource is marked Down, vote 0.0
- If the client is connected to the server which owns the target resource, and the resource has an up-to-date replica, vote 1.0
- If the resource has an up-to-date replica, vote 0.5
- If the resource has a stale replica, vote 0.25
- Otherwise, vote 0.0
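The Get rules can be sketched the same way. The replica state is modeled here as a string ("good", "stale") or None when the resource holds no replica; that encoding and the parameter names are assumptions made for the example.

```python
from typing import Optional


def ufs_get_vote(marked_down: bool, client_is_local: bool,
                 replica: Optional[str]) -> float:
    """Vote for a Get; replica is "good", "stale", or None (assumed encoding)."""
    if marked_down:
        return 0.0
    if replica == "good":
        # An up-to-date replica on the connected server wins outright.
        return 1.0 if client_is_local else 0.5
    if replica == "stale":
        return 0.25
    return 0.0
```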
Voting - Random Resource
For a Put Operation
- Randomly select a child and delegate the vote to it; continue until a child returns a non-zero vote or all children have been queried
- Pass the result up to the calling resource
Voting - Random Resource
For a Get Operation
- Delegate the vote to all children
- Select the highest vote
- Pass the result up to the calling resource
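The two behaviors of the random resource can be sketched as follows. Child votes are passed in as a plain dictionary for illustration, and the function names are hypothetical; a real plugin would query each child as it goes.

```python
import random
from typing import Dict, Optional, Tuple


def random_put_vote(child_votes: Dict[str, float],
                    rng: random.Random) -> Tuple[Optional[str], float]:
    """Put: try children in random order until one returns a non-zero vote."""
    names = list(child_votes)
    rng.shuffle(names)
    for name in names:
        if child_votes[name] > 0.0:
            return name, child_votes[name]
    return None, 0.0  # every child declined


def random_get_vote(child_votes: Dict[str, float]) -> Tuple[Optional[str], float]:
    """Get: ask every child and keep the highest vote."""
    if not child_votes:
        return None, 0.0
    name = max(child_votes, key=child_votes.get)
    return name, child_votes[name]


votes = {"ufs0": 1.0, "ufs1": 0.5, "ufs2": 0.0}
put_choice = random_put_vote(votes, random.Random())
get_choice = random_get_vote(votes)
```

Note that a Put may settle on a child with a lower vote than its siblings, since the first willing child wins, whereas a Get always returns the best vote.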
Voting - Replication Resource
For a Put Operation
- Delegate voting to all children
- Select highest vote
- Pass result to calling resource
- Once the Put is complete, trigger replication to all other children that accept a Put Operation
Voting - Replication Resource
For a Get Operation
- Delegate vote to all children
- Select the highest vote
- Pass result to calling resource
Note that given the behavior of the Unix File System, locality of reference significantly affects the behavior of reads and writes for this Resource
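A sketch of the replication resource's Put handling: pick the highest voter for the initial write, then list the siblings that should receive a replica. It assumes that "accepts a Put" means the child returned a non-zero vote; the function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def replication_put(child_votes: Dict[str, float]) -> Tuple[Optional[str], List[str]]:
    """Return (child chosen for the write, children to replicate to afterwards)."""
    if not child_votes:
        return None, []
    winner = max(child_votes, key=child_votes.get)
    if child_votes[winner] <= 0.0:
        return None, []
    # Assumption: a child accepts a Put if its vote is non-zero.
    targets = [name for name, vote in child_votes.items()
               if name != winner and vote > 0.0]
    return winner, targets


# ufs1 is local to the client (1.0), ufs0 is remote (0.5), ufs2 is marked down.
first_write, replicate_to = replication_put({"ufs0": 0.5, "ufs1": 1.0, "ufs2": 0.0})
```

Because the Unix File System leaf votes 1.0 for a locally connected client, the initial write lands on the child nearest the client, which is the locality effect noted above.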
Voting - A Put Example
pt1 - passthru
pt2 - passthru
pt3 - passthru
rnd1 - random
rnd2 - random
repl1 - replication
ufsN
ufs0
Voting - A Put Example
Votes 1.0 - Connected
Votes 0.5 - Not Connected
Votes 0.0 - Marked Down
Randomly chooses ufs0 for vote of 1.0
Passes ufsN-1 vote of 0.5
Write weight of 0.25, passes ufs0 as a vote of 0.25
Randomly chooses ufsN-1 vote of 0.5
Chooses ufsN-1 with a vote of 0.5 > 0.25
Passes ufsN-1
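The arithmetic behind these annotations can be checked with a short sketch. It uses only the numbers given above; the variable names are hypothetical, and the exact position of each passthru in the tree is not reproduced here.

```python
# Leaf votes taken from the annotations.
ufs0_vote = 1.0          # client is connected to this server
ufs_n_minus_1_vote = 0.5  # not connected

# One branch applies a write weight of 0.25 to the vote for ufs0.
write_weight = 0.25
branch_via_ufs0 = ("ufs0", ufs0_vote * write_weight)

# The other branch passes the vote for ufsN-1 up unchanged.
branch_via_ufs_n_minus_1 = ("ufsN-1", ufs_n_minus_1_vote)

# The replication resource keeps the highest vote: 0.5 > 0.25.
winner = max([branch_via_ufs0, branch_via_ufs_n_minus_1], key=lambda b: b[1])
```

The leaf with the best raw vote (ufs0) loses because the weight applied above it reduces that vote below the alternative.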
Compound Resources
Necessary to provide POSIX semantics on top of non-POSIX storage such as tape or object stores
For a Put
- Data objects are written to the cache first
- Then a replica is made to the archive
For a Get
- Replica is staged to cache, if required
- Reads are always made from the cache
By default, the compound resource synchronously replicates to the archive
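The Put and Get flow can be modeled with two in-memory dictionaries standing in for the cache and the archive. This is a toy sketch under that assumption: the class, its `sync_on_put` flag, and the `purge_cache` helper are hypothetical names, not iRODS calls.

```python
class CompoundSketch:
    def __init__(self, sync_on_put: bool = True):
        self.cache = {}
        self.archive = {}
        self.sync_on_put = sync_on_put  # models the default synchronous replication

    def put(self, path: str, data: bytes) -> None:
        self.cache[path] = data         # writes always land in the cache first
        if self.sync_on_put:
            self.archive[path] = data   # then a replica is made to the archive

    def purge_cache(self, path: str) -> None:
        self.cache.pop(path, None)

    def get(self, path: str) -> bytes:
        if path not in self.cache:      # stage from the archive only if required
            self.cache[path] = self.archive[path]
        return self.cache[path]         # reads are always served from the cache


resc = CompoundSketch()
resc.put("/zone/home/alice/file.dat", b"payload")
resc.purge_cache("/zone/home/alice/file.dat")
restored = resc.get("/zone/home/alice/file.dat")
```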
Compound Resources - An S3 Example
iadmin mkresc resc_name type context_string
iadmin mkresc comp_resc compound
iadmin mkresc cache_resc unixfilesystem `hostname`:/tmp/cache_resc
iadmin addchildtoresc parent_name child_name parent_child_context_string
iadmin addchildtoresc comp_resc cache_resc cache
The compound resource plugin honors two parent-child context values: "cache", and "archive"
These are used internally to identify the resources by role
Compound Resources - An S3 Example
iadmin mkresc arch_resc s3 `hostname`:/bucket/name <context_string>
- S3_DEFAULT_HOSTNAME - used to define regions for your S3 bucket
- S3_AUTH_FILE - fully qualified path to a file holding the access id and the access key for the given bucket name
- S3_RETRY_COUNT - number of retries before failure
- S3_WAIT_TIME_SEC - wait time before retries in seconds
- S3_PROTO - use either HTTP or HTTPS
iadmin mkresc arch_resc s3 `hostname`:/bucket/name "S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/etc/irods/auth_file;S3_RETRY_COUNT=3;S3_WAIT_TIME_SEC=1;S3_PROTO=HTTP"
iadmin addchildtoresc comp_resc arch_resc archive
Add the archive resource to the compound resource
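The S3 context string is a semicolon-separated list of key=value pairs. A small sketch of how such a string decomposes can help when checking a configuration before passing it to iadmin; the helper name is hypothetical, and the plugin's own parsing may differ.

```python
def parse_context_string(context: str) -> dict:
    """Split "K1=V1;K2=V2" into a dictionary, ignoring empty segments."""
    pairs = (item.split("=", 1) for item in context.split(";") if item.strip())
    return {key.strip(): value.strip() for key, value in pairs}


context = ("S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/etc/irods/auth_file;"
           "S3_RETRY_COUNT=3;S3_WAIT_TIME_SEC=1;S3_PROTO=HTTP")
settings = parse_context_string(context)
```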
Compound Resources - Delayed Replication
Set the compound resource context string to "auto_repl=off"
iadmin modresc comp_resc context "auto_repl=off"
Leverage the rule engine for replication via acPostProcForPut
acPostProcForPut() {
    if("ufs_cache" == $KVPairs.rescName) {
        delay("<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>") {
            *CacheRescName = "comp_resc;ufs_cache";
            msisync_to_archive("*CacheRescName", $filePath, $objPath);
        }
    }
}
Excellent examples of delayed replication and cache purging can be found here:
https://github.com/trel/irods-compound-resource/blob/master/rules/SaraRules.re
Demonstration
iRODS User Group Meeting 2019 - Advanced Training Module