Composable Resources
June 7-9, 2016
iRODS User Group Meeting 2016
Chapel Hill, NC
Jason M. Coposky
@jason_coposky
Interim Executive Director
Overview of Resource Composition
Uses a well-known Tree Metaphor - Branches and Leaves
Two types of nodes:
- Coordinating Resource - pure decision making
- Storage Resource - instance managing the hardware
By convention, Coordinating Resources do not have storage (this is not enforced)
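The tree metaphor can be sketched in a few lines of Python. This is only an illustrative model, not the iRODS API: the `Resource` class and `leaf_hierarchies` helper are hypothetical, and the resource names are borrowed from the compound example later in this deck. It shows how walking from a coordinating root down to each storage leaf yields the semicolon-separated hierarchy strings that `ils -l` prints.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    """One node in a resource tree (hypothetical model, not the iRODS API)."""
    name: str
    kind: str                      # e.g. "compound" or "unixfilesystem"
    vault: Optional[str] = None    # by convention only storage resources have one
    children: List["Resource"] = field(default_factory=list)

    def add_child(self, child: "Resource") -> "Resource":
        self.children.append(child)
        return child

    def is_leaf(self) -> bool:
        return not self.children


def leaf_hierarchies(node: Resource, prefix: str = "") -> List[str]:
    """Return one 'root;...;leaf' string per leaf found under node."""
    path = f"{prefix};{node.name}" if prefix else node.name
    if node.is_leaf():
        return [path]
    result: List[str] = []
    for child in node.children:
        result.extend(leaf_hierarchies(child, path))
    return result


# Branch (coordinating) node with two leaves (storage nodes).
root = Resource("comp_resc", "compound")
root.add_child(Resource("ufs_cache", "unixfilesystem", "/tmp/irods/ufs_cache"))
root.add_child(Resource("ufs_arch", "unixfilesystem", "/tmp/irods/ufs_arch"))
print(leaf_hierarchies(root))
```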
Coordinating Resources - Branches
Compound - provide POSIX interface to alternative storage
Deferred - defer to children regarding voting behavior
Load Balanced - use gathered load values to determine choices
Passthru - weight, then delegate operations to a child resource
Random - randomly choose a child for a write operation
Replication - ensure all data objects are consistent across children
Round Robin - delegate writes to each child in series
Storage Resources - Leaves
Non-Cached
Unix File System - generic file system storage
Ceph-RADOS - Ceph object storage
HPSS - access to IBM High Performance Storage System
Cached (Archive)
S3 - archive resource for Amazon S3
WOS - DDN Web Object Scaler
Universal MSS - script based access to generic archive storage
Compound Resources
Necessary for POSIX compliance - disk cache in front of Object, Tape, etc.
For a Put
- Data objects are delegated to the cache first and registered
- Then a copy (replica) is sent to the archive
For a Get
- Replica is staged to cache if required
- Read always happens from cache
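The Put and Get flow above can be modeled as a toy Python class. This is a sketch under stated assumptions: `CompoundSketch` and its methods are hypothetical names, two dictionaries stand in for the cache and archive children, and `purge_cache` exists only so the staging step can be demonstrated.

```python
class CompoundSketch:
    """Toy model of a compound resource: a cache in front of an archive."""

    def __init__(self) -> None:
        self.cache = {}    # logical path -> bytes held by the cache child
        self.archive = {}  # logical path -> bytes held by the archive child

    def put(self, path: str, data: bytes) -> None:
        # The data object goes to the cache first...
        self.cache[path] = data
        # ...then a replica is sent to the archive.
        self.archive[path] = self.cache[path]

    def purge_cache(self, path: str) -> None:
        # Hypothetical helper: drop the cache replica, keep the archive one.
        self.cache.pop(path, None)

    def get(self, path: str) -> bytes:
        # Stage from the archive only if the cache holds no replica.
        if path not in self.cache:
            self.cache[path] = self.archive[path]
        # The read is always served from the cache.
        return self.cache[path]


resc = CompoundSketch()
resc.put("/tempZone/home/rods/test_file", b"payload")
resc.purge_cache("/tempZone/home/rods/test_file")
print(resc.get("/tempZone/home/rods/test_file"))
```

After the purge, the `get` call repopulates the cache before returning, which mirrors the "staged to cache if required" step.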
Building a Compound Resource
iadmin mkresc comp_resc compound
iadmin mkresc ufs_cache unixfilesystem `hostname`:/tmp/irods/ufs_cache
iadmin mkresc ufs_arch unixfilesystem `hostname`:/tmp/irods/ufs_arch
iadmin mkresc resc_name resc_type context_string
iadmin addchildtoresc parent_name child_name context
iadmin addchildtoresc comp_resc ufs_cache cache
iadmin addchildtoresc comp_resc ufs_arch archive
Review Compound Resource Configuration
irods@example:~$ ilsresc
comp_resc:compound
├── ufs_arch
└── ufs_cache
demoResc
irods@example:~$ ilsresc -l comp_resc
resource name: comp_resc
id: 10001
zone: tempZone
type: compound
class: cache
location: EMPTY_RESC_HOST
vault: EMPTY_RESC_PATH
free space:
free space time: : Never
status:
info:
comment:
create time: 01464739292: 2016-05-31.20:01:32
modify time: 01464739292: 2016-05-31.20:01:32
context:
parent:
parent context:
Review the child resources
irods@example:~$ ilsresc -l ufs_cache
resource name: ufs_cache
id: 10017
zone: tempZone
type: unixfilesystem
class: cache
location: example
vault: /tmp/irods/ufs_cache
free space:
free space time: : Never
status:
info:
comment:
create time: 01464739293: 2016-05-31.20:01:33
modify time: 01464739302: 2016-05-31.20:01:42
context:
parent: 10001
parent context: cache
irods@example:~$ ilsresc -l ufs_arch
resource name: ufs_arch
id: 10018
zone: tempZone
type: unixfilesystem
class: cache
location: example
vault: /tmp/irods/ufs_arch
free space:
free space time: : Never
status:
info:
comment:
create time: 01464739295: 2016-05-31.20:01:35
modify time: 01464739310: 2016-05-31.20:01:50
context:
parent: 10001
parent context: archive
Test Put
irods@example:~$ truncate --size 10M test_file
irods@example:~$ ls -l test_file
-rw-rw-r-- 1 irods irods 10485760 May 31 21:11 test_file
irods@example:~$ iput -R comp_resc test_file
irods@example:~$ ils -l
/tempZone/home/rods:
rods 0 comp_resc;ufs_cache 10485760 2016-05-31.21:11 & test_file
rods 1 comp_resc;ufs_arch 10485760 2016-05-31.21:11 & test_file
By default, the data object is replicated to the archive immediately after the cache replica is at rest and registered in the catalog
ufs_cache has replica number 0 (written first)
ufs_arch has replica number 1 (written second)
Delayed Replication to an Archive Resource
Set the compound resource context string to "auto_repl=off"
iadmin modresc comp_resc context "auto_repl=off"
Leverage the rule engine for replication via acPostProcForPut
acPostProcForPut() {
    if("ufs_cache" == $rescName) {
        delay("<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>") {
            *CacheRescName = "comp_resc;ufs_cache";
            msisync_to_archive("*CacheRescName", $filePath, $objPath);
        }
    }
}
Excellent examples of delayed replication and cache purging can be found here:
https://github.com/trel/irods-compound-resource/blob/master/rules/SaraRules.re
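The delay hint in the rule above, `1h DOUBLE UNTIL SUCCESS OR 6 TIMES`, describes a retry interval that starts at one hour and doubles on each failure. The Python sketch below illustrates that arithmetic only. It reflects one reading of the hint: the helper names are hypothetical, the unit table is a deliberately small subset, and exactly how the server counts attempts is an assumption.

```python
import re

# Assumed subset of time units, for illustration only.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_interval(text: str) -> int:
    """Convert a string such as '1h' or '30s' into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if match is None:
        raise ValueError(f"unsupported interval: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def doubling_schedule(first_interval: str, max_times: int) -> list:
    """Seconds to wait before each retry if every attempt keeps failing."""
    seconds = parse_interval(first_interval)
    return [seconds * 2 ** attempt for attempt in range(max_times)]


print(doubling_schedule("1h", 6))
```

Under this reading the waits grow from one hour to 32 hours, so a persistently failing sync stops being retried after roughly two and a half days.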
Prepping for delayed replication
Add a custom rulebase to /etc/irods/server_config.json
{
    "instance_name": "re-irods-instance",
    "plugin_name": "re-irods",
    "plugin_specific_configuration": {
        "re_rulebase_set": [
            {
                "filename": "training"
            },
            {
                "filename": "core"
            }
        ]
    }
}
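The same edit can be scripted rather than made by hand. The sketch below works on the rule engine fragment shown here as a plain Python dictionary. `add_rulebase` is a hypothetical helper, and where this fragment sits inside the full server_config.json is not shown in the slides, so locating it in a real file is left out.

```python
import json


def add_rulebase(rule_engine: dict, name: str) -> dict:
    """Put `name` first in re_rulebase_set unless it is already listed."""
    rulebases = rule_engine["plugin_specific_configuration"]["re_rulebase_set"]
    if not any(entry["filename"] == name for entry in rulebases):
        # Insert ahead of "core" so the custom rules are consulted first.
        rulebases.insert(0, {"filename": name})
    return rule_engine


# Fragment as it might look before the custom rulebase is added.
rule_engine = {
    "instance_name": "re-irods-instance",
    "plugin_name": "re-irods",
    "plugin_specific_configuration": {
        "re_rulebase_set": [{"filename": "core"}]
    },
}

add_rulebase(rule_engine, "training")
print(json.dumps(rule_engine, indent=4))
```

Calling the helper a second time leaves the list unchanged, which makes it safe to rerun.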
Create a new Rulebase
Edit /etc/irods/training.re and add our new Policy Enforcement Point
acPostProcForPut() {
    if("ufs_cache" == $rescName) {
        delay("<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>") {
            *CacheRescName = "comp_resc;ufs_cache";
            msisync_to_archive("*CacheRescName", $filePath, $objPath);
        }
    }
}
Test Put, Delayed
irods@example:~$ iput -R comp_resc test_file test_file2 ; ils -l
/tempZone/home/rods:
rods 0 comp_resc;ufs_cache 10485760 2016-05-31.22:03 & test_file
rods 1 comp_resc;ufs_arch 10485760 2016-05-31.22:03 & test_file
rods 0 comp_resc;ufs_cache 10485760 2016-05-31.22:05 & test_file2
Wait for it...
irods@example:~$ ils -l
/tempZone/home/rods:
rods 0 comp_resc;ufs_cache 10485760 2016-05-31.22:03 & test_file
rods 1 comp_resc;ufs_arch 10485760 2016-05-31.22:03 & test_file
rods 0 comp_resc;ufs_cache 10485760 2016-05-31.22:05 & test_file2
rods 1 comp_resc;ufs_arch 10485760 2016-05-31.22:05 & test_file2
Questions?
A Storage Balanced Resource Composition
Goal: Use a dynamic policy enforcement point to influence voting behavior
iadmin mkresc def_resc deferred
iadmin mkresc ufs1 unixfilesystem `hostname`:/tmp/ufs1
iadmin mkresc ufs2 unixfilesystem `hostname`:/tmp/ufs2
iadmin mkresc pt1 passthru
iadmin mkresc pt2 passthru
iadmin addchildtoresc def_resc pt1
iadmin addchildtoresc def_resc pt2
iadmin addchildtoresc pt1 ufs1
iadmin addchildtoresc pt2 ufs2
irods@example:~$ ilsresc
comp_resc:compound
├── ufs_arch
└── ufs_cache
def_resc:deferred
├── pt1:passthru
│   └── ufs1
└── pt2:passthru
    └── ufs2
Influencing Voting Behavior
- Deferred node will simply pick the highest vote
- Leverage a dynamic policy enforcement point to influence the voting behavior of the passthru node
- The passthru node will honor weights set by the result string
- These will override weights that may also be in the context string
- Force all new Puts to the PT2 Resource
pep_resource_resolve_hierarchy_pre(*INST_NAME, *CTX, *OUT, *OP_TYPE, *HOST, *RESC_HIER, *VOTE) {
    if( "CREATE" == *OP_TYPE ) {
        if( "pt1" == *INST_NAME) {
            *OUT = "read=1.0;write=0.5"
        }
        else if ( "pt2" == *INST_NAME ) {
            *OUT = "read=1.0;write=1.0"
        }
    }
}
Add to /etc/irods/training.re
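The effect of those weight strings on the deferred node's choice can be illustrated with a short Python sketch. It assumes that each passthru node scales its child's vote by the weight for the operation and that the deferred node then takes the highest result. The function names and the raw votes of 1.0 are hypothetical, chosen only to show the arithmetic.

```python
def parse_weights(result_string: str) -> dict:
    """Turn 'read=1.0;write=0.5' into {'read': 1.0, 'write': 0.5}."""
    weights = {}
    for pair in result_string.split(";"):
        if not pair:
            continue
        key, value = pair.split("=", 1)
        weights[key.strip()] = float(value)
    return weights


def deferred_choice(child_votes: dict, child_weights: dict,
                    operation: str = "write") -> str:
    """Pick the child whose weighted vote is highest (assumed behavior)."""
    weighted = {
        child: vote * parse_weights(child_weights[child]).get(operation, 1.0)
        for child, vote in child_votes.items()
    }
    return max(weighted, key=weighted.get)


# Hypothetical raw votes from the storage leaves under each passthru node.
votes = {"pt1": 1.0, "pt2": 1.0}
# Weight strings as set by the rule above for CREATE operations.
weights = {"pt1": "read=1.0;write=0.5", "pt2": "read=1.0;write=1.0"}
print(deferred_choice(votes, weights))
```

With equal raw votes, pt1 ends up at 0.5 and pt2 at 1.0, so the write lands on pt2.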
Test the forced vote weights
irods@example:~$ iput -R def_resc test_file weight_test1
irods@example:~$ iput -R def_resc test_file weight_test2
irods@example:~$ ils -l
/tempZone/home/rods:
rods 0 comp_resc;ufs_cache 10485760 2016-06-01.09:33 & test_file
rods 1 comp_resc;ufs_arch 10485760 2016-06-01.09:33 & test_file
rods 0 comp_resc;ufs_cache 10485760 2016-06-01.09:40 & test_file2
rods 1 comp_resc;ufs_arch 10485760 2016-06-01.09:40 & test_file2
rods 0 def_resc;pt2;ufs2 10485760 2016-06-01.09:59 & weight_test1
rods 0 def_resc;pt2;ufs2 10485760 2016-06-01.09:59 & weight_test2
Questions?
Building a Storage Balanced Resource
Edit /etc/irods/training.re
Overload pep_resource_resolve_hierarchy_pre by adding a new implementation preceding the previous example (order matters)
Building a Storage Balanced Resource
pep_resource_resolve_hierarchy_pre(*INST_NAME, *CTX, *OUT, *OP_TYPE, *HOST, *RESC_HIER, *VOTE) {
    # only influence CREATE operations
    if( "CREATE" == *OP_TYPE ) {
        foreach ( *ROW in SELECT RESC_TYPE_NAME WHERE RESC_NAME = '*INST_NAME' ) {
            *RESC_TYPE = *ROW.RESC_TYPE_NAME;
        }
        if( "passthru" == *RESC_TYPE ) {
            *HYP_BYTES_USED = double(*CTX.file_size);

            # add up bytes used by all of the resource's children
            foreach ( *ROW in SELECT RESC_ID WHERE RESC_NAME = '*INST_NAME' ) {
                *INST_ID = int(*ROW.RESC_ID);
            }
            foreach ( *ROW1 in SELECT RESC_NAME WHERE RESC_PARENT = '*INST_ID' ) {
                *STORAGE_RESC = *ROW1.RESC_NAME;
                foreach ( *ROW2 in SELECT sum(DATA_SIZE) WHERE RESC_NAME = '*STORAGE_RESC' ) {
                    *HYP_BYTES_USED = *HYP_BYTES_USED + double(*ROW2.DATA_SIZE);
                }
            }

            # if no max_bytes context string, assume infinite capacity
            # 0 is a do-not-write
            *MAX_BYTES = -1;
            foreach(*ROW in SELECT RESC_CONTEXT WHERE RESC_NAME = '*INST_NAME'){
                *CONTEXT_STRING = *ROW.RESC_CONTEXT;
            }
            foreach( *KVP_STRING in split( *CONTEXT_STRING, ";" ) ) {
                *KVP = split( *KVP_STRING, "=" );
                if( "max_bytes" == elem( *KVP, 0 )) {
                    *MAX_BYTES = double(elem(*KVP,1));
                }
            }

            # compute percent full
            if( -1 == *MAX_BYTES ) {
                *HYP_PERCENT_FULL = 0;
            }
            else if( 0 == *MAX_BYTES ) {
                *HYP_PERCENT_FULL = 1;
            }
            else {
                *HYP_PERCENT_FULL = *HYP_BYTES_USED / *MAX_BYTES;
            }

            *WRITE_WEIGHT = 1 - *HYP_PERCENT_FULL;
            *WEIGHT_STRING = "read=1.0;write=*WRITE_WEIGHT";
            *OUT = *WEIGHT_STRING;
        } # if( "passthru"
    } # if( "CREATE"
} # pep
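The weight arithmetic in the rule can be restated in Python to make the three capacity cases easier to check. This is a sketch that mirrors the rule's logic, not a replacement for it: the function names are hypothetical, and the catalog queries are replaced by a `bytes_used` argument supplied by the caller.

```python
def max_bytes_from_context(context_string: str) -> float:
    """Return max_bytes from a 'k=v;k=v' context string, or -1 if absent."""
    max_bytes = -1.0  # no max_bytes entry: assume infinite capacity
    for kvp_string in context_string.split(";"):
        kvp = kvp_string.split("=")
        if kvp[0] == "max_bytes":
            max_bytes = float(kvp[1])
    return max_bytes


def write_weight(bytes_used: float, file_size: float,
                 context_string: str) -> float:
    """Write weight for a passthru node, following the rule's three cases."""
    # Hypothetical usage if the incoming file were stored here.
    hyp_bytes_used = float(file_size) + float(bytes_used)
    max_bytes = max_bytes_from_context(context_string)
    if max_bytes == -1:
        hyp_percent_full = 0.0   # unlimited capacity
    elif max_bytes == 0:
        hyp_percent_full = 1.0   # do-not-write
    else:
        hyp_percent_full = hyp_bytes_used / max_bytes
    return 1.0 - hyp_percent_full


print(write_weight(10_000_000, 0, "max_bytes=20000000"))
```

As in the rule, nothing clamps the result, so the weight goes negative once the hypothetical usage exceeds max_bytes.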
Testing the Storage Balancing Resource
Set the max_bytes for each passthru node:
iadmin modresc pt1 context "max_bytes=20000000"
iadmin modresc pt2 context "max_bytes=20000000"
Put some files and check the distribution:
irods@example:~$ irm -f weight_test1 weight_test2
irods@example:~$ iput -R def_resc VERSION.json f1
irods@example:~$ iput -R def_resc VERSION.json f2
irods@example:~$ iput -R def_resc VERSION.json f3
irods@example:~$ iput -R def_resc VERSION.json f4
irods@example:~$ ils -l
/tempZone/home/rods:
rods 0 def_resc;pt2;ufs2 290 2016-06-01.12:33 & f1
rods 0 def_resc;pt1;ufs1 290 2016-06-01.12:33 & f2
rods 0 def_resc;pt2;ufs2 290 2016-06-01.12:33 & f3
rods 0 def_resc;pt1;ufs1 290 2016-06-01.12:33 & f4
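The alternating placement in that listing follows from the weight formula, which a small simulation can show. The sketch below is illustrative only: `simulate_puts` is a hypothetical helper, both nodes start empty, and ties are broken in favor of the last child. That tie-break is an assumption chosen only so the sketch reproduces the listing; the slides do not say how iRODS resolves equal votes.

```python
def write_weight(bytes_used: int, file_size: int, max_bytes: int) -> float:
    """1 minus the fraction full if the file were stored on this node."""
    return 1.0 - (bytes_used + file_size) / max_bytes


def simulate_puts(file_sizes: list, max_bytes: dict) -> list:
    """Return the passthru node chosen for each put, in order."""
    used = {name: 0 for name in max_bytes}
    placements = []
    for size in file_sizes:
        weights = {
            name: write_weight(used[name], size, max_bytes[name])
            for name in used
        }
        best = max(weights.values())
        # Assumption: when weights tie, the last child listed wins.
        winner = [name for name in used if weights[name] == best][-1]
        used[winner] += size
        placements.append(winner)
    return placements


# Four 290-byte files, 20,000,000 bytes allowed on each passthru node.
print(simulate_puts([290] * 4, {"pt1": 20_000_000, "pt2": 20_000_000}))
```

Each put makes the chosen node slightly fuller than its sibling, so the next put goes to the other node.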
...
Questions?
UGM 2016 - Composable Resources
By iRODS Consortium