Composable Resources

June 7-9, 2016

iRODS User Group Meeting 2016

Chapel Hill, NC

Jason M. Coposky

@jason_coposky

Interim Executive Director

Overview of Resource Composition

Uses a well-known tree metaphor: branches and leaves

 

 

Two types of nodes:

  • Coordinating Resource - pure decision making

  • Storage Resource - instance managing the hardware

 

 

 

By convention, Coordinating Resources do not have storage (this is not enforced)
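The tree metaphor can be sketched in a few lines of Python. This is an illustrative model only, not an iRODS API: the `Resource` class and its `hierarchies` method are hypothetical names, and the example tree borrows the resource names used in the storage-balancing part of this deck. A node without children is treated as a leaf, which is a simplification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    """One node in a resource tree (illustrative model, not an iRODS API)."""
    name: str
    resc_type: str                   # e.g. "deferred", "unixfilesystem"
    vault: Optional[str] = None      # storage path; None for coordinating nodes
    children: List["Resource"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # Simplification: any node without children counts as a leaf.
        return not self.children

    def hierarchies(self, prefix: str = "") -> List[str]:
        """Return one 'root;...;leaf' string per leaf below this node."""
        path = f"{prefix};{self.name}" if prefix else self.name
        if self.is_leaf():
            return [path]
        result: List[str] = []
        for child in self.children:
            result.extend(child.hierarchies(path))
        return result


# Branches (coordinating) on top, leaves (storage) at the bottom.
example_tree = Resource("def_resc", "deferred", children=[
    Resource("pt1", "passthru",
             children=[Resource("ufs1", "unixfilesystem", "/tmp/ufs1")]),
    Resource("pt2", "passthru",
             children=[Resource("ufs2", "unixfilesystem", "/tmp/ufs2")]),
])

for hierarchy in example_tree.hierarchies():
    print(hierarchy)
```

The semicolon-separated strings mirror the hierarchy column that `ils -l` prints for each replica.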

Coordinating Resources - Branches

Compound - provide POSIX interface to alternative storage

Deferred - defer to children regarding voting behavior

Load Balanced - use gathered load values to determine choices

Passthru - weight, then delegate operations to a child resource

Random - randomly choose a child for a write operation

Replication - ensure all data objects are consistent across children

Round Robin - delegate writes to each child in series

Storage Resources - Leaves

Non-Cached

    Unix File System - generic file system storage

    Ceph-RADOS - Ceph object storage

    HPSS - access to IBM High Performance Storage System

 

Cached (Archive)

    S3 - archive resource for Amazon S3

    WOS - DDN Web Object Scaler

    Universal MSS - script based access to generic archive storage

Compound Resources

Necessary for POSIX compliance: a disk cache sits in front of object storage, tape, etc.

 

For a Put

  • Data objects are delegated to the cache first and registered
  • Then a copy (replica) is sent to the archive

 

For a Get

  • Replica is staged to cache if required
  • Read always happens from cache
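The Put and Get flow above can be sketched as a toy model. Everything here is hypothetical (the `CompoundSketch` class, its dictionaries, and the `purge_cache` helper); it only imitates the order of operations described on this slide and does not reflect how the compound plugin is implemented.

```python
class CompoundSketch:
    """Toy model of a compound resource: a cache in front of an archive."""

    def __init__(self):
        self.cache = {}     # object name -> bytes held on the cache child
        self.archive = {}   # object name -> bytes held on the archive child
        self.catalog = {}   # object name -> list of registered replica locations

    def put(self, name, data):
        # 1. The data object lands on the cache first and is registered.
        self.cache[name] = data
        self.catalog[name] = ["cache"]
        # 2. Then a copy (replica) is sent to the archive.
        self.archive[name] = data
        self.catalog[name].append("archive")

    def purge_cache(self, name):
        # Hypothetical helper: drop the cache replica to free disk space.
        del self.cache[name]
        self.catalog[name].remove("cache")

    def get(self, name):
        # Stage the replica back to the cache if it is not there.
        if name not in self.cache:
            self.cache[name] = self.archive[name]
            self.catalog[name].append("cache")
        # The read always happens from the cache.
        return self.cache[name]


resc = CompoundSketch()
resc.put("test_file", b"payload")
resc.purge_cache("test_file")
print(resc.get("test_file"), resc.catalog["test_file"])
```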

Building a Compound Resource

Syntax: iadmin mkresc resc_name resc_type [host:path] [context_string]

iadmin mkresc comp_resc compound
iadmin mkresc ufs_cache unixfilesystem `hostname`:/tmp/irods/ufs_cache
iadmin mkresc ufs_arch unixfilesystem `hostname`:/tmp/irods/ufs_arch

Syntax: iadmin addchildtoresc parent_name child_name context

iadmin addchildtoresc comp_resc ufs_cache cache
iadmin addchildtoresc comp_resc ufs_arch archive

Review Compound Resource Configuration

irods@example:~$ ilsresc
comp_resc:compound
├── ufs_arch
└── ufs_cache
demoResc
irods@example:~$ ilsresc -l comp_resc
resource name: comp_resc
id: 10001
zone: tempZone
type: compound
class: cache
location: EMPTY_RESC_HOST
vault: EMPTY_RESC_PATH
free space:
free space time: : Never
status:
info:
comment:
create time: 01464739292: 2016-05-31.20:01:32
modify time: 01464739292: 2016-05-31.20:01:32
context:
parent:
parent context:

Review the child resources

irods@example:~$ ilsresc -l ufs_cache
resource name: ufs_cache
id: 10017
zone: tempZone
type: unixfilesystem
class: cache
location: example
vault: /tmp/irods/ufs_cache
free space:
free space time: : Never
status:
info:
comment:
create time: 01464739293: 2016-05-31.20:01:33
modify time: 01464739302: 2016-05-31.20:01:42
context:
parent: 10001
parent context: cache
irods@example:~$ ilsresc -l ufs_arch
resource name: ufs_arch
id: 10018
zone: tempZone
type: unixfilesystem
class: cache
location: example
vault: /tmp/irods/ufs_arch
free space:
free space time: : Never
status:
info:
comment:
create time: 01464739295: 2016-05-31.20:01:35
modify time: 01464739310: 2016-05-31.20:01:50
context:
parent: 10001
parent context: archive

Test Put

irods@example:~$ truncate --size 10M test_file
irods@example:~$ ls -l test_file
-rw-rw-r-- 1 irods irods 10485760 May 31 21:11 test_file


irods@example:~$ iput -R comp_resc test_file
irods@example:~$ ils -l
/tempZone/home/rods:
  rods        0 comp_resc;ufs_cache   10485760 2016-05-31.21:11 & test_file
  rods        1 comp_resc;ufs_arch    10485760 2016-05-31.21:11 & test_file

By default, the data object is replicated to the archive immediately after the cache replica is at rest and registered in the catalog

 

ufs_cache has replica number 0 (written first)

ufs_arch has replica number 1 (written second)
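To check replica numbers and hierarchies in a script, the `ils -l` lines above can be split into fields. This is a minimal sketch that assumes the exact column layout shown in this deck (owner, replica number, hierarchy, size, timestamp, `&` marker, name); `Replica` and `parse_ils_line` are hypothetical names, and other iRODS versions may print different columns.

```python
from typing import NamedTuple


class Replica(NamedTuple):
    owner: str
    number: int
    hierarchy: str
    size: int
    modified: str
    good: bool      # True when the "&" marker is present
    name: str


def parse_ils_line(line: str) -> Replica:
    """Parse one replica line of `ils -l` output as shown in this deck."""
    parts = line.split()
    owner, number, hierarchy, size, modified = parts[:5]
    rest = parts[5:]
    good = bool(rest) and rest[0] == "&"
    # Names containing runs of whitespace are not preserved exactly.
    name = " ".join(rest[1:] if good else rest)
    return Replica(owner, int(number), hierarchy, int(size), modified, good, name)


listing = """\
  rods        0 comp_resc;ufs_cache   10485760 2016-05-31.21:11 & test_file
  rods        1 comp_resc;ufs_arch    10485760 2016-05-31.21:11 & test_file
"""

replicas = [parse_ils_line(line) for line in listing.splitlines() if line.strip()]
for replica in replicas:
    leaf = replica.hierarchy.split(";")[-1]
    print(replica.number, leaf, replica.size)
```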

Delayed Replication to an Archive Resource

Set the compound resource context string to "auto_repl=off"

iadmin modresc comp_resc context "auto_repl=off"

Leverage the rule engine for replication via acPostProcForPut

acPostProcForPut() {
    if("ufs_cache" == $rescName) {
        delay("<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>") {
            *CacheRescName = "comp_resc;ufs_cache";
            msisync_to_archive("*CacheRescName", $filePath, $objPath);
        }
    }
}

Excellent examples of delayed replication and cache purging can be found here:

https://github.com/trel/irods-compound-resource/blob/master/rules/SaraRules.re
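The delay hint in the rule above packs a schedule into one string. The sketch below shows one plausible reading of it, and the reading is an assumption: the first attempt runs after the `PLUSET` offset, each failed attempt is retried after an interval that starts at the `EF` value and doubles every time, and "6 TIMES" caps the total number of attempts. Whether the cap counts total executions or repeats is not stated in this deck. The function names and the supported units (s, m, h, d) are choices made for this sketch.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(text):
    """Convert a value such as '1s' or '1h' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def attempt_offsets(hint):
    """Return seconds-from-submission for each attempt, assuming all fail."""
    pluset = re.search(r"<PLUSET>(.*?)</PLUSET>", hint).group(1)
    ef = re.search(r"<EF>(\S+) DOUBLE UNTIL SUCCESS OR (\d+) TIMES</EF>", hint)
    interval = to_seconds(ef.group(1))
    max_attempts = int(ef.group(2))   # assumption: counts total attempts

    offsets = [to_seconds(pluset)]    # first attempt after the initial offset
    while len(offsets) < max_attempts:
        offsets.append(offsets[-1] + interval)
        interval *= 2                 # DOUBLE: the wait doubles each time
    return offsets


HINT = "<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>"
print(attempt_offsets(HINT))
```

In practice the schedule stops at the first successful attempt, so these offsets describe the worst case under the stated assumptions.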

Prepping for delayed replication

{
    "instance_name": "re-irods-instance",     
    "plugin_name": "re-irods",      
    "plugin_specific_configuration": {     
        "re_rulebase_set": [
            {                  
                "filename": "training"         
            },
            {
                "filename": "core"         
            }
        ]
    }
}

Add a custom rulebase to /etc/irods/server_config.json

Create a new Rulebase

Edit /etc/irods/training.re and add our new Policy Enforcement Point

acPostProcForPut() {
    if("ufs_cache" == $rescName) {
        delay("<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>") {
            *CacheRescName = "comp_resc;ufs_cache";
            msisync_to_archive("*CacheRescName", $filePath, $objPath);
        }
    }
}

Test Put, Delayed

irods@example:~$ iput -R comp_resc test_file test_file2 ; ils -l
/tempZone/home/rods:
  rods              0 comp_resc;ufs_cache     10485760 2016-05-31.22:03 & test_file
  rods              1 comp_resc;ufs_arch     10485760 2016-05-31.22:03 & test_file
  rods              0 comp_resc;ufs_cache     10485760 2016-05-31.22:05 & test_file2
irods@example:~$ ils -l
/tempZone/home/rods:
  rods              0 comp_resc;ufs_cache     10485760 2016-05-31.22:03 & test_file
  rods              1 comp_resc;ufs_arch     10485760 2016-05-31.22:03 & test_file
  rods              0 comp_resc;ufs_cache     10485760 2016-05-31.22:05 & test_file2
  rods              1 comp_resc;ufs_arch     10485760 2016-05-31.22:05 & test_file2

Wait for it... the second listing shows that the delayed rule has since replicated test_file2 to ufs_arch.

Questions?

A Storage Balanced Resource Composition

Goal: Use a dynamic policy enforcement point to influence voting behavior

iadmin mkresc def_resc deferred
iadmin mkresc ufs1 unixfilesystem `hostname`:/tmp/ufs1
iadmin mkresc ufs2 unixfilesystem `hostname`:/tmp/ufs2
iadmin mkresc pt1 passthru
iadmin mkresc pt2 passthru
iadmin addchildtoresc def_resc pt1
iadmin addchildtoresc def_resc pt2
iadmin addchildtoresc pt1 ufs1
iadmin addchildtoresc pt2 ufs2
irods@example:~$ ilsresc
comp_resc:compound
├── ufs_arch
└── ufs_cache
def_resc:deferred
├── pt1:passthru
│   └── ufs1
└── pt2:passthru
    └── ufs2

Influencing Voting Behavior

  • Deferred node will simply pick the highest vote

  • Leverage a dynamic policy enforcement point to influence the voting behavior of the passthru node

    • The passthru node will honor weights set by the result string

    • These will override weights that may also be in the context string
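The voting idea can be sketched as follows. The sketch assumes that each storage leaf votes 1.0, that a passthru node multiplies its child's vote by the weight matching the operation, and that the deferred node takes the maximum; these are simplifications for illustration, not a description of the plugin internals. The weight strings are the ones this deck uses for pt1 and pt2, and all function names are hypothetical.

```python
def parse_weights(weight_string):
    """Turn 'read=1.0;write=0.5' into {'read': 1.0, 'write': 0.5}."""
    weights = {}
    for pair in weight_string.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            weights[key.strip()] = float(value)
    return weights


def passthru_vote(child_vote, weight_string, op_type):
    """Scale the child's vote by the read or write weight (assumed behavior)."""
    weights = parse_weights(weight_string)
    key = "write" if op_type in ("CREATE", "WRITE") else "read"
    return child_vote * weights.get(key, 1.0)


def deferred_pick(votes):
    """The deferred node simply picks the child with the highest vote."""
    return max(votes, key=votes.get)


votes = {
    "pt1": passthru_vote(1.0, "read=1.0;write=0.5", "CREATE"),
    "pt2": passthru_vote(1.0, "read=1.0;write=1.0", "CREATE"),
}
winner = deferred_pick(votes)
print(votes, winner)
```

With these weights every new put resolves to pt2, while reads are left unaffected because both read weights are 1.0.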

Force all new Puts to the pt2 Resource

pep_resource_resolve_hierarchy_pre(
  *INST_NAME,*CTX,*OUT,*OP_TYPE,*HOST,*RESC_HIER,*VOTE){
    if( "CREATE" == *OP_TYPE ) {
        if( "pt1" == *INST_NAME) {
            *OUT = "read=1.0;write=0.5"
        }
        else if ( "pt2" == *INST_NAME ) {
            *OUT = "read=1.0;write=1.0"
        }
    }
}

Add to /etc/irods/training.re

Test the forced vote weights

irods@example:~$ iput -R def_resc test_file weight_test1
irods@example:~$ iput -R def_resc test_file weight_test2

irods@example:~$ ils -l
/tempZone/home/rods:
  rods              0 comp_resc;ufs_cache   10485760 2016-06-01.09:33 & test_file
  rods              1 comp_resc;ufs_arch    10485760 2016-06-01.09:33 & test_file
  rods              0 comp_resc;ufs_cache   10485760 2016-06-01.09:40 & test_file2
  rods              1 comp_resc;ufs_arch    10485760 2016-06-01.09:40 & test_file2
  rods              0 def_resc;pt2;ufs2     10485760 2016-06-01.09:59 & weight_test1
  rods              0 def_resc;pt2;ufs2     10485760 2016-06-01.09:59 & weight_test2

Questions?

Building a Storage Balanced Resource

Edit /etc/irods/training.re

 

Overload pep_resource_resolve_hierarchy_pre by adding a new implementation preceding the previous example (order matters)
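Why order matters can be sketched with a simple lookup. The model assumes that rulebases are searched in the order listed in the server configuration and that, within a file, the first definition with a matching name is the one that runs; fallback to later definitions when a rule fails is not modeled. The implementation labels are placeholders invented for this sketch.

```python
def first_definition(rulebases, rule_name):
    """Return (filename, implementation) of the first matching rule."""
    for filename, rules in rulebases:
        for name, implementation in rules:
            if name == rule_name:
                return filename, implementation
    return None


# Rulebases in configuration order; rules in file order.
rulebases = [
    ("training", [
        ("pep_resource_resolve_hierarchy_pre", "storage balanced"),  # added first
        ("pep_resource_resolve_hierarchy_pre", "force pt2"),         # earlier example
        ("acPostProcForPut", "delayed sync"),
    ]),
    ("core", [
        ("acPostProcForPut", "default"),
    ]),
]

print(first_definition(rulebases, "pep_resource_resolve_hierarchy_pre"))
print(first_definition(rulebases, "acPostProcForPut"))
```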

Building a Storage Balanced Resource

pep_resource_resolve_hierarchy_pre(
  *INST_NAME,*CTX,*OUT,*OP_TYPE,*HOST,*RESC_HIER,*VOTE){
    # only influence CREATE operations
    if( "CREATE" == *OP_TYPE ) {
        foreach ( *ROW in SELECT RESC_TYPE_NAME WHERE RESC_NAME = '*INST_NAME' ) {
            *RESC_TYPE = *ROW.RESC_TYPE_NAME;
        }
        if( "passthru" == *RESC_TYPE ) {
            *HYP_BYTES_USED = double(*CTX.file_size);

            # add up bytes used by all of the resource's children
            foreach ( *ROW in SELECT RESC_ID WHERE RESC_NAME = '*INST_NAME' ) {
                *INST_ID = int(*ROW.RESC_ID);
            }
            foreach ( *ROW1 in SELECT RESC_NAME WHERE RESC_PARENT = '*INST_ID' ) {
                *STORAGE_RESC = *ROW1.RESC_NAME;
                foreach ( *ROW2 in SELECT sum(DATA_SIZE) WHERE RESC_NAME = '*STORAGE_RESC' ) {
                    *HYP_BYTES_USED = *HYP_BYTES_USED + double(*ROW2.DATA_SIZE);
                }
            }

            # if no max_bytes context string, assume infinite capacity
            # 0 is a do-not-write
            *MAX_BYTES = -1;
            foreach(*ROW in SELECT RESC_CONTEXT WHERE RESC_NAME = '*INST_NAME'){
                *CONTEXT_STRING = *ROW.RESC_CONTEXT;
            }
            foreach( *KVP_STRING in split( *CONTEXT_STRING, ";" ) ) {
                *KVP = split( *KVP_STRING, "=" );
                if( "max_bytes" == elem( *KVP, 0 )) {
                    *MAX_BYTES = double(elem(*KVP,1));
                }
            }

            # compute percent full
            if( -1 == *MAX_BYTES ) {
                *HYP_PERCENT_FULL = 0;
            } else if( 0 == *MAX_BYTES ) {
                *HYP_PERCENT_FULL = 1;
            } else {
                *HYP_PERCENT_FULL = *HYP_BYTES_USED / *MAX_BYTES;
            }
            *WRITE_WEIGHT = 1 - *HYP_PERCENT_FULL;
            *WEIGHT_STRING = "read=1.0;write=*WRITE_WEIGHT";
            *OUT = *WEIGHT_STRING;

        } # if( "passthru"
    } # if( "CREATE"
} # pep
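The arithmetic in the rule above can be checked outside the rule engine. The Python sketch below mirrors the same steps: add the incoming file size to the bytes already stored, read `max_bytes` from the context string, and derive the write weight. The function names are hypothetical, and the catalog queries are replaced by plain arguments.

```python
def max_bytes_from_context(context_string):
    """Return max_bytes from 'k=v;k=v', or -1.0 when it is not set."""
    max_bytes = -1.0
    for kvp in context_string.split(";"):
        parts = kvp.split("=")
        if len(parts) == 2 and parts[0] == "max_bytes":
            max_bytes = float(parts[1])
    return max_bytes


def write_weight(bytes_used, file_size, context_string):
    """Mirror the rule: weight = 1 - hypothetical percent full."""
    hyp_bytes_used = float(bytes_used) + float(file_size)
    max_bytes = max_bytes_from_context(context_string)
    if max_bytes == -1:
        hyp_percent_full = 0.0      # no limit set: treat capacity as infinite
    elif max_bytes == 0:
        hyp_percent_full = 1.0      # 0 means do-not-write
    else:
        hyp_percent_full = hyp_bytes_used / max_bytes
    return 1.0 - hyp_percent_full


print(write_weight(0, 290, "max_bytes=20000000"))
print(write_weight(0, 290, ""))
print(write_weight(30000000, 0, "max_bytes=20000000"))
```

As in the rule, nothing clamps the result, so a resource that is already over its limit produces a negative weight.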

Testing the Storage Balancing Resource

Set the max_bytes for each passthru node, then put some files and check the distribution:

iadmin modresc pt1 context "max_bytes=20000000"
iadmin modresc pt2 context "max_bytes=20000000"
irods@example:~$ irm -f weight_test1 weight_test2
irods@example:~$ iput -R def_resc VERSION.json f1
irods@example:~$ iput -R def_resc VERSION.json f2
irods@example:~$ iput -R def_resc VERSION.json f3
irods@example:~$ iput -R def_resc VERSION.json f4
irods@example:~$ ils -l
/tempZone/home/rods:
  rods              0 def_resc;pt2;ufs2          290 2016-06-01.12:33 & f1
  rods              0 def_resc;pt1;ufs1          290 2016-06-01.12:33 & f2
  rods              0 def_resc;pt2;ufs2          290 2016-06-01.12:33 & f3
  rods              0 def_resc;pt1;ufs1          290 2016-06-01.12:33 & f4
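The alternating placement in the listing above can be reproduced with a small simulation. It assumes both leaves start empty, each leaf votes 1.0, the passthru vote equals the write weight computed by the rule, and ties go to the last child considered. The tie-break is an assumption chosen only so that the sketch matches the listing; this deck does not say how the deferred node resolves equal votes.

```python
MAX_BYTES = {"pt1": 20000000.0, "pt2": 20000000.0}
FILE_SIZE = 290   # size of VERSION.json in the listing above


def simulate_puts(names, children=("pt1", "pt2")):
    """Place each object on the child with the highest write weight."""
    used = {child: 0 for child in children}
    placement = {}
    for name in names:
        best, best_vote = None, None
        for child in children:
            # Same formula as the rule: 1 - (used + incoming) / max_bytes
            vote = 1.0 - (used[child] + FILE_SIZE) / MAX_BYTES[child]
            # ">=" lets the last child win ties (assumed tie-break).
            if best_vote is None or vote >= best_vote:
                best, best_vote = child, vote
        used[best] += FILE_SIZE
        placement[name] = best
    return placement


placement = simulate_puts(["f1", "f2", "f3", "f4"])
for name, child in placement.items():
    print(name, child)
```

Each put makes the chosen side slightly fuller, so the next put favors the other side and the objects alternate between pt2 and pt1.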

...

Questions?