Composable Resources
June 13-15, 2017
iRODS User Group Meeting 2017
Utrecht, Netherlands
Jason M. Coposky
@jason_coposky
Executive Director, iRODS Consortium
Overview of Resource Composition
Uses a well-known tree metaphor: branches and leaves
Two types of nodes:
Coordinating Resources (branches) - pure decision making
Storage Resources (leaves) - hold the data
By convention, Coordinating Resources do not have storage
(this is not enforced)
Coordinating Resources - Branches
Compound - provide POSIX interface to alternative storage
Deferred - defer to children regarding voting behavior
Load Balanced - use gathered load values to determine choices
Passthru - weight, then delegate operations to a child resource
Random - randomly choose a child for a write operation
Replication - ensure all data objects are consistent across children
Round Robin - delegate writes to each child in series
Storage Resources - Leaves
Non-Cached
Unix File System - generic file system storage
Ceph-RADOS - Ceph object storage
HPSS - access to IBM High Performance Storage System
Cached (Archive)
S3 - archive resource for Amazon S3
WOS - DDN Web Object Scaler
Universal MSS - script based access to generic archive storage
Compound Resources
Necessary for POSIX compliance - disk cache in front of Object, Tape, etc.
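For a Put: data is written to the cache, then replicated to the archive
For a Get: data is served from the cache, after being staged from the archive if the cache holds no replica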
Building a Compound Resource
Usage: iadmin mkresc resc_name resc_type [host:path] [context_string]
iadmin mkresc comp_resc compound
iadmin mkresc ufs_cache unixfilesystem `hostname`:/tmp/irods/ufs_cache
iadmin mkresc ufs_arch unixfilesystem `hostname`:/tmp/irods/ufs_arch
Usage: iadmin addchildtoresc parent_name child_name [context]
iadmin addchildtoresc comp_resc ufs_cache cache
iadmin addchildtoresc comp_resc ufs_arch archive
Review Compound Resource Configuration
irods@example:~$ ilsresc
comp_resc:compound
├── ufs_arch
└── ufs_cache
demoResc
irods@example:~$ ilsresc -l comp_resc
resource name: comp_resc
id: 10001
zone: tempZone
type: compound
class: cache
location: EMPTY_RESC_HOST
vault: EMPTY_RESC_PATH
free space:
free space time: : Never
status:
info:
comment:
create time: 01464739292: 2017-05-31.20:01:32
modify time: 01464739292: 2017-05-31.20:01:32
context:
parent:
parent context:
Review the child resources
irods@example:~$ ilsresc -l ufs_cache
resource name: ufs_cache
id: 10017
zone: tempZone
type: unixfilesystem
class: cache
location: example
vault: /tmp/irods/ufs_cache
free space:
free space time: : Never
status:
info:
comment:
create time: 01464739293: 2017-05-31.20:01:33
modify time: 01464739302: 2017-05-31.20:01:42
context:
parent: 10001
parent context: cache
irods@example:~$ ilsresc -l ufs_arch
resource name: ufs_arch
id: 10018
zone: tempZone
type: unixfilesystem
class: cache
location: example
vault: /tmp/irods/ufs_arch
free space:
free space time: : Never
status:
info:
comment:
create time: 01464739295: 2017-05-31.20:01:35
modify time: 01464739310: 2017-05-31.20:01:50
context:
parent: 10001
parent context: archive
Test Put
irods@example:~$ truncate --size 10M test_file
irods@example:~$ ls -l test_file
-rw-rw-r-- 1 irods irods 10485760 May 31 21:11 test_file
irods@example:~$ iput -R comp_resc test_file
irods@example:~$ ils -l
/tempZone/home/rods:
rods 0 comp_resc;ufs_cache 10485760 2017-05-31.21:11 & test_file
rods 1 comp_resc;ufs_arch 10485760 2017-05-31.21:11 & test_file
By default, the data object is replicated to the archive immediately after the cache replica is at rest and registered in the catalog
ufs_cache has replica number 0 (written first)
ufs_arch has replica number 1 (written second)
Delayed Replication to an Archive Resource
Set the compound resource context string to "auto_repl=off"
iadmin modresc comp_resc context "auto_repl=off"
Leverage the rule engine for replication via acPostProcForPut
acPostProcForPut() {
    if("ufs_cache" == $KVPairs.rescName) {
        delay("<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>") {
            *CacheRescName = "comp_resc;ufs_cache";
            msisync_to_archive("*CacheRescName", $filePath, $objPath);
        }
    }
}
Excellent examples of delayed replication and cache purging can be found here:
https://github.com/trel/irods-compound-resource/blob/master/rules/SaraRules.re
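The two put behaviors described so far (immediate replication by default, deferred replication with auto_repl=off) can be pictured with a toy model. This is only a sketch in Python, not iRODS code: the class CompoundSketch and its methods are hypothetical names, and the staging step in get() reflects the usual cache-in-front-of-archive behavior rather than anything shown on these slides.

```python
class CompoundSketch:
    """Toy model of a compound resource: a cache child in front of an archive child.

    Hypothetical illustration only; none of these names exist in iRODS.
    """

    def __init__(self, auto_repl=True):
        self.auto_repl = auto_repl
        self.cache = {}    # logical path -> bytes held by the cache child
        self.archive = {}  # logical path -> bytes held by the archive child

    def put(self, path, data):
        # A put always lands in the cache first.
        self.cache[path] = data
        # With auto_repl on (the default), the archive replica follows at once.
        if self.auto_repl:
            self.sync_to_archive(path)

    def sync_to_archive(self, path):
        # Stands in for what a delayed rule would trigger when auto_repl is off.
        self.archive[path] = self.cache[path]

    def get(self, path):
        # Assumed behavior: stage from the archive when the cache has no replica.
        if path not in self.cache:
            self.cache[path] = self.archive[path]
        return self.cache[path]

    def replicas(self, path):
        # Report which children hold a replica, cache first.
        names = []
        if path in self.cache:
            names.append("ufs_cache")
        if path in self.archive:
            names.append("ufs_arch")
        return names


immediate = CompoundSketch()
immediate.put("test_file", b"data")

delayed = CompoundSketch(auto_repl=False)
delayed.put("test_file2", b"data")
```

In the delayed case the archive replica appears only once sync_to_archive() runs, which is the role the delayed rule plays in the real system.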
Prepping for delayed replication
Add a custom rulebase to /etc/irods/server_config.json
"rule_engines": [
    {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
        "plugin_specific_configuration": {
            "re_data_variable_mapping_set": [
                "core"
            ],
            "re_function_name_mapping_set": [
                "core"
            ],
            "re_rulebase_set": [
                "training",
                "core"
            ],
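If the edit is scripted rather than made by hand, the change amounts to placing the new rulebase name ahead of "core" in re_rulebase_set. The following is a minimal sketch: prepend_rulebase is a hypothetical helper, it operates on an already-parsed "rule_engines" list, and it makes no assumption about where that list sits inside the full server_config.json.

```python
def prepend_rulebase(rule_engines, name,
                     plugin_name="irods_rule_engine_plugin-irods_rule_language"):
    """Put `name` at the front of re_rulebase_set for the matching plugin.

    Hypothetical helper: `rule_engines` is the parsed "rule_engines" list.
    Returns True if anything was changed.
    """
    changed = False
    for engine in rule_engines:
        if engine.get("plugin_name") != plugin_name:
            continue
        rulebases = engine["plugin_specific_configuration"]["re_rulebase_set"]
        # Earlier rulebases are consulted first, so the custom one goes in front.
        if name not in rulebases:
            rulebases.insert(0, name)
            changed = True
    return changed


# Example input mirroring the fragment shown on the slide, before the edit.
engines = [
    {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
        "plugin_specific_configuration": {
            "re_data_variable_mapping_set": ["core"],
            "re_function_name_mapping_set": ["core"],
            "re_rulebase_set": ["core"],
        },
    }
]
first_run = prepend_rulebase(engines, "training")
second_run = prepend_rulebase(engines, "training")  # already present: no change
```

The helper is idempotent, so running it twice leaves a single "training" entry.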
Create a new Rulebase
Edit /etc/irods/training.re and add our new Policy Enforcement Point
acPostProcForPut() {
    if("ufs_cache" == $KVPairs.rescName) {
        delay("<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>") {
            *CacheRescName = "comp_resc;ufs_cache";
            msisync_to_archive("*CacheRescName", $filePath, $objPath);
        }
    }
}
Test Put, Delayed
irods@example:~$ iput -R comp_resc test_file test_file2 ; ils -l
/tempZone/home/rods:
rods 0 comp_resc;ufs_cache 10485760 2017-05-31.22:03 & test_file
rods 1 comp_resc;ufs_arch 10485760 2017-05-31.22:03 & test_file
rods 0 comp_resc;ufs_cache 10485760 2017-05-31.22:05 & test_file2
Wait for it...
irods@example:~$ ils -l
/tempZone/home/rods:
rods 0 comp_resc;ufs_cache 10485760 2017-05-31.22:03 & test_file
rods 1 comp_resc;ufs_arch 10485760 2017-05-31.22:03 & test_file
rods 0 comp_resc;ufs_cache 10485760 2017-05-31.22:05 & test_file2
rods 1 comp_resc;ufs_arch 10485760 2017-05-31.22:05 & test_file2
Questions?
A Storage Balanced Resource Composition
Goal: Use a dynamic policy enforcement point to influence voting behavior
iadmin mkresc def_resc deferred
iadmin mkresc ufs1 unixfilesystem `hostname`:/tmp/ufs1
iadmin mkresc ufs2 unixfilesystem `hostname`:/tmp/ufs2
iadmin mkresc pt1 passthru
iadmin mkresc pt2 passthru
iadmin addchildtoresc def_resc pt1
iadmin addchildtoresc def_resc pt2
iadmin addchildtoresc pt1 ufs1
iadmin addchildtoresc pt2 ufs2
irods@example:~$ ilsresc
comp_resc:compound
├── ufs_arch
└── ufs_cache
def_resc:deferred
├── pt1:passthru
│   └── ufs1
└── pt2:passthru
    └── ufs2
Influencing Voting Behavior
Deferred node will simply pick the highest vote
Leverage a dynamic policy enforcement point to influence the voting behavior of the passthru node
The passthru node will honor weights set by the result string
These will override weights that may also be in the context string
Force all new Puts to the PT2 Resource
pep_resource_resolve_hierarchy_pre( *INST_NAME,*CTX,*OUT,*OP_TYPE,*HOST,*RESC_HIER,*VOTE){
    if( "CREATE" == *OP_TYPE ) {
        if( "pt1" == *INST_NAME) {
            *OUT = "read=1.0;write=0.5"
        }
        else if ( "pt2" == *INST_NAME ) {
            *OUT = "read=1.0;write=1.0"
        }
    }
}
Add to /etc/irods/training.re
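The effect of these weights on the deferred node's choice can be illustrated with a small model. This is a sketch under stated assumptions, not the plugin implementation: it assumes the passthru node scales its child's vote by the weight for the operation, it assumes both storage children vote 1.0, and all function names are hypothetical.

```python
def parse_weights(result_string):
    """Parse a string such as "read=1.0;write=0.5" into a dict of floats."""
    weights = {}
    for pair in result_string.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            weights[key.strip()] = float(value)
    return weights


def passthru_vote(child_vote, result_string, operation):
    # Assumption: the passthru node multiplies the child's vote by its weight.
    return child_vote * parse_weights(result_string).get(operation, 1.0)


def deferred_choice(votes):
    # The deferred node simply picks the child with the highest vote.
    return max(votes, key=votes.get)


# Weight strings taken from the rule above; the child votes of 1.0 are assumed.
votes = {
    "pt1": passthru_vote(1.0, "read=1.0;write=0.5", "write"),
    "pt2": passthru_vote(1.0, "read=1.0;write=1.0", "write"),
}
winner = deferred_choice(votes)
```

With these inputs pt2 carries the higher write vote, so every new put resolves to it.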
Test the forced vote weights
irods@example:~$ iput -R def_resc test_file weight_test1
irods@example:~$ iput -R def_resc test_file weight_test2
irods@example:~$ ils -l
/tempZone/home/rods:
rods 0 comp_resc;ufs_cache 10485760 2017-06-01.09:33 & test_file
rods 1 comp_resc;ufs_arch 10485760 2017-06-01.09:33 & test_file
rods 0 comp_resc;ufs_cache 10485760 2017-06-01.09:40 & test_file2
rods 1 comp_resc;ufs_arch 10485760 2017-06-01.09:40 & test_file2
rods 0 def_resc;pt2;ufs2 10485760 2017-06-01.09:59 & weight_test1
rods 0 def_resc;pt2;ufs2 10485760 2017-06-01.09:59 & weight_test2
Questions?
Building a Storage Balanced Resource
Edit /etc/irods/training.re
Overload pep_resource_resolve_hierarchy_pre by adding a new implementation ahead of the previous example (order matters)
Building a Storage Balanced Resource
pep_resource_resolve_hierarchy_pre( *INST_NAME,*CTX,*OUT,*OP_TYPE,*HOST,*RESC_HIER,*VOTE){
    # only influence CREATE operations
    if( "CREATE" == *OP_TYPE ) {
        foreach ( *ROW in SELECT RESC_TYPE_NAME WHERE RESC_NAME = '*INST_NAME' ) {
            *RESC_TYPE = *ROW.RESC_TYPE_NAME;
        }
        if( "passthru" == *RESC_TYPE ) {
            *HYP_BYTES_USED = double(*CTX.file_size);
            # add up bytes used by all of the resource's children
            foreach ( *ROW in SELECT RESC_ID WHERE RESC_NAME = '*INST_NAME' ) {
                *INST_ID = int(*ROW.RESC_ID);
            }
            foreach ( *ROW1 in SELECT RESC_NAME WHERE RESC_PARENT = '*INST_ID' ) {
                *STORAGE_RESC = *ROW1.RESC_NAME;
                foreach ( *ROW2 in SELECT sum(DATA_SIZE) WHERE RESC_NAME = '*STORAGE_RESC' ) {
                    *HYP_BYTES_USED = *HYP_BYTES_USED + double(*ROW2.DATA_SIZE);
                }
            }
            # if no max_bytes context string, assume infinite capacity
            # 0 is a do-not-write
            *MAX_BYTES = -1;
            foreach(*ROW in SELECT RESC_CONTEXT WHERE RESC_NAME = '*INST_NAME'){
                *CONTEXT_STRING = *ROW.RESC_CONTEXT;
            }
            foreach( *KVP_STRING in split( *CONTEXT_STRING, ";" ) ) {
                *KVP = split( *KVP_STRING, "=" );
                if( "max_bytes" == elem( *KVP, 0 )) {
                    *MAX_BYTES = double(elem(*KVP,1));
                }
            }
            # compute percent full
            if( -1 == *MAX_BYTES ) {
                *HYP_PERCENT_FULL = 0;
            }
            else if( 0 == *MAX_BYTES ) {
                *HYP_PERCENT_FULL = 1;
            }
            else {
                *HYP_PERCENT_FULL = *HYP_BYTES_USED / *MAX_BYTES;
            }
            *WRITE_WEIGHT = 1 - *HYP_PERCENT_FULL;
            *WEIGHT_STRING = "read=1.0;write=*WRITE_WEIGHT";
            *OUT = *WEIGHT_STRING;
        } # if( "passthru"
    } # if( "CREATE"
} # pep
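The arithmetic inside this rule is easier to check in isolation. Below is a Python mirror of just the weight computation, offered as a sketch: the function names are hypothetical, and the catalog queries are replaced by plain arguments (bytes already stored under the passthru node, size of the incoming file, and the node's context string).

```python
def max_bytes_from_context(context_string):
    """Return max_bytes from a context string, or -1.0 when it is not set."""
    max_bytes = -1.0
    for kvp in context_string.split(";"):
        key, sep, value = kvp.partition("=")
        if sep and key == "max_bytes":
            max_bytes = float(value)
    return max_bytes


def write_weight(bytes_used, file_size, context_string):
    """Mirror the rule: weight = 1 - (hypothetical bytes used / max_bytes)."""
    hyp_bytes_used = float(bytes_used) + float(file_size)
    max_bytes = max_bytes_from_context(context_string)
    if max_bytes == -1:
        percent_full = 0.0   # no limit configured: treat capacity as infinite
    elif max_bytes == 0:
        percent_full = 1.0   # zero capacity: do not write here
    else:
        percent_full = hyp_bytes_used / max_bytes
    return 1.0 - percent_full


empty_node = write_weight(0, 224, "max_bytes=20000000")
fuller_node = write_weight(224, 224, "max_bytes=20000000")
```

The emptier node always ends up with the larger write weight, which is what spreads new files across the passthru nodes. Like the rule, this sketch does not clamp the result, so a node past its max_bytes yields a negative weight.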
Testing the Storage Balancing Resource
Set the max_bytes for each passthru node:
iadmin modresc pt1 context "max_bytes=20000000"
iadmin modresc pt2 context "max_bytes=20000000"
Put some files and check the distribution:
irods@example:~$ irm -f weight_test1 weight_test2
irods@example:~$ iput -R def_resc VERSION.json f1
irods@example:~$ iput -R def_resc VERSION.json f2
irods@example:~$ iput -R def_resc VERSION.json f3
irods@example:~$ iput -R def_resc VERSION.json f4
irods@example:~$ ils -l
/tempZone/home/rods:
rods 0 def_resc;pt2;ufs2 224 2017-06-03.12:33 & f1
rods 0 def_resc;pt1;ufs1 224 2017-06-03.12:33 & f2
rods 0 def_resc;pt2;ufs2 224 2017-06-03.12:33 & f3
rods 0 def_resc;pt1;ufs1 224 2017-06-03.12:33 & f4
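For larger tests, eyeballing the listing becomes tedious. A short script can tally replicas per resource hierarchy from captured ils -l text. This is a sketch: count_by_hierarchy is a hypothetical helper, and it assumes the column layout seen in the listings in this deck (owner, replica number, hierarchy, size, timestamp, optional "&", name).

```python
from collections import Counter


def count_by_hierarchy(ils_output):
    """Count replica lines per resource hierarchy in `ils -l` text output."""
    counts = Counter()
    for line in ils_output.splitlines():
        fields = line.split()
        # Replica lines have at least six columns and a numeric replica number;
        # the collection header line ("/tempZone/home/rods:") is skipped.
        if len(fields) >= 6 and fields[1].isdigit():
            counts[fields[2]] += 1
    return counts


# Listing copied from the test above.
listing = """/tempZone/home/rods:
rods 0 def_resc;pt2;ufs2 224 2017-06-03.12:33 & f1
rods 0 def_resc;pt1;ufs1 224 2017-06-03.12:33 & f2
rods 0 def_resc;pt2;ufs2 224 2017-06-03.12:33 & f3
rods 0 def_resc;pt1;ufs1 224 2017-06-03.12:33 & f4
"""
distribution = count_by_hierarchy(listing)
```

An even split between the two hierarchies, as here, indicates the weights are alternating writes as intended.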
...
Questions?