Advanced Training:

Storage Tiering

May 28-31, 2024

iRODS User Group Meeting 2024

Amsterdam, Netherlands

Alan King, Senior Software Developer

Martin Flores, Software Developer

iRODS Consortium

iRODS Capabilities

  • Packaged and supported solutions
  • Require configuration not code
  • Derived from the majority of use cases observed in the user community

Storage Tiering Overview

Data Object Access Time

The default tiering policy is based on the last access time of each data object, which is recorded as metadata on that object.

Dynamic Policy Enforcement Points for the RPC API are used to apply the metadata (list is not exhaustive).

pep_api_data_obj_close_post
pep_api_data_obj_put_post
pep_api_data_obj_get_post
pep_api_phy_path_reg_post
...

irods::access_time <unix_timestamp>
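
The access time is stored as a Unix timestamp, which is hard to read at a glance. Here is a minimal sketch that converts such a value into a readable UTC date. The helper name access_time_to_utc is hypothetical, and the "-d @<seconds>" form assumes GNU date.

```shell
# Hypothetical helper: convert an irods::access_time value (Unix epoch seconds)
# into an ISO 8601 UTC timestamp. Assumes GNU date ("-d @<seconds>").
access_time_to_utc() {
    date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ
}

# Example: 0 is the start of the Unix epoch.
access_time_to_utc 0   # prints 1970-01-01T00:00:00Z
```

The value to convert would typically be read from the output of imeta ls -d for the data object.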

Configuring a Tier Group

imeta set -R <resc0> irods::storage_tiering::group example_group 0
imeta set -R <resc1> irods::storage_tiering::group example_group 1
imeta set -R <resc2> irods::storage_tiering::group example_group 2

Tier groups are entirely driven by metadata.

  • The attribute identifies the resource as a tiering group participant
  • The value defines the group name
  • The unit defines the position within the group
  • Tier position, or index, can be any value; only the relative order is honored
  • Configuration must be performed at the root of a resource composition
  • A resource may belong to many tiering groups
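
Because a tier group is nothing more than three pieces of metadata per resource, the commands can be generated mechanically. The following is a minimal dry-run sketch: it only prints the imeta commands and does not run them. The helper name and the resource names fast_resc, medium_resc and archive_resc are hypothetical.

```shell
# Hypothetical dry-run helper: print (do not execute) the imeta commands that
# place the given root resources into one tier group, in the order listed.
print_tier_group_commands() {
    group="$1"
    shift
    position=0
    for resc in "$@"; do
        echo "imeta set -R $resc irods::storage_tiering::group $group $position"
        position=$((position + 1))
    done
}

# Example: three hypothetical root resources, fastest tier first.
print_tier_group_commands example_group fast_resc medium_resc archive_resc
```

Reviewing the printed commands before running them makes it easier to confirm that each resource gets the intended position.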

Configuring Tiering Time Constraints

Tiering violation time is configured in seconds.

Configure a tier to hold data for 30 seconds.

imeta set -R <resource> irods::storage_tiering::time 30

Configure a tier to hold data for 30 days.

imeta set -R <resource> irods::storage_tiering::time 2592000

The final tier in a group does not have a storage tiering time.

  • It will hold data indefinitely
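
The 30-day value used here is simply the number of days converted to seconds. A minimal sketch of that arithmetic, using a hypothetical helper named days_to_seconds:

```shell
# Hypothetical helper: convert a number of days into the seconds value
# expected by irods::storage_tiering::time.
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 30   # prints 2592000
```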

Verification of Data Migration

When data is found to be in violation ...

  • Data object is replicated to the next tier
  • New replica integrity is verified (in one of three ways)
  • Source replica is trimmed

 

catalog is the default verification for all resources.

imeta set -R <resource> irods::storage_tiering::verification catalog

For verification, this setting will determine if the replica is properly registered within the catalog after replication.

imeta add -R <resource> irods::storage_tiering::verification filesystem

This option will stat the remote replica on disk and compare the file size with that of the catalog.

filesystem verification is more expensive as it involves a potentially remote filesystem stat.
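
The idea behind filesystem verification can be illustrated with a small local sketch: compare the size of a file on disk with an expected size, such as the one recorded in the catalog. This is not the plugin's implementation; the helper name size_matches and the throwaway file are assumptions for illustration only.

```shell
# Hypothetical analogue of filesystem verification: succeed only if the file's
# size on disk equals the expected size (for example, the catalog's value).
size_matches() {
    path="$1"
    expected_size="$2"
    actual_size=$(wc -c < "$path" | tr -d ' ')
    [ "$actual_size" -eq "$expected_size" ]
}

# Example with a throwaway five-byte file.
tmpfile=$(mktemp)
printf 'hello' > "$tmpfile"
if size_matches "$tmpfile" 5; then echo "size verified"; else echo "size mismatch"; fi
rm -f "$tmpfile"
```

A size comparison catches truncated transfers but not corrupted content of the same length, which is the gap that checksum verification closes.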

imeta add -R <resource> irods::storage_tiering::verification checksum

checksum verification is the most expensive as file sizes may be large.

This option will compute a checksum of the data once it is at rest, and compare it with the value in the catalog.

 

Should the source replica not have a checksum, one will be computed before the replication is performed.

Configuring the restage resource

imeta add -R <resource> irods::storage_tiering::minimum_restage_tier true

When data is in a tier other than the lowest tier, upon access the data is restaged back to the lowest tier.

 

This flag identifies the lowest tier for restage.

Users may not want data restaged back to the lowest tier, should that tier be very remote or not appropriate for analysis.

 

Consider a storage resource at the edge serving as a landing zone for instrument data.

Preserving Replicas

Some users may not wish to trim a replica from a tier when data is migrated, such as to allow data to be archived and also still be available on fast storage.

 

 

To preserve a replica on any given tier, attach the following metadata flag to the root resource.

imeta set -R <resource> irods::storage_tiering::preserve_replicas true

Custom Violating Query - GenQuery

Admins may specify a custom query which identifies violating data objects.

 

Here is an example demonstrating the use of GenQuery.

imeta set -R <resource> irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM WHERE META_DATA_ATTR_NAME = 'irods::access_time' AND META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING' AND DATA_RESC_ID IN ('10021', '10022')"

Any number of queries may be attached to a resource in order to provide a range of criteria by which violating data may be identified.

  • Could include user applied metadata
  • Could include externally harvested metadata

Custom Violating Query - Specific Query

More complex SQL may be required to identify violating objects. Users may configure Specific Queries and attach those to a given tier within a group.

Create a specific query in SQL with iadmin asq, then configure it on the resource with imeta, giving the query's alias as the value and specific as the unit.

iadmin asq "SELECT DISTINCT t0.data_name, t1.coll_name, pdu.user_name, pdu.zone_name, t0.data_repl_num FROM R_DATA_MAIN t0 INNER JOIN R_COLL_MAIN t1 ON t1.coll_id = t0.coll_id INNER JOIN R_RESC_MAIN t2 ON t0.resc_id = t2.resc_id INNER JOIN R_OBJT_ACCESS pdoa ON t0.data_id = pdoa.object_id INNER JOIN R_TOKN_MAIN pdt ON pdoa.access_type_id = pdt.token_id INNER JOIN R_USER_MAIN pdu ON pdoa.user_id = pdu.user_id INNER JOIN R_OBJT_METAMAP ommd ON t0.data_id = ommd.object_id INNER JOIN R_META_MAIN mmd ON ommd.meta_id = mmd.meta_id WHERE t2.resc_name IN ('st_ufs2', 'st_ufs3') AND mmd.meta_attr_name = 'archive_object' AND mmd.meta_attr_value = 'true' AND pdoa.access_type_id >= '1120';" archive_query
imeta set -R <resource> irods::storage_tiering::query archive_query specific

Limiting violating query results

When working with large sets of data, throttling the amount of data migrated at one time can be helpful.

 

To limit the number of results returned by the violating queries, attach the following metadata attribute with the query limit as its value.

imeta set -R <resource> irods::storage_tiering::object_limit <limit_value>

Logging data transfer

In order to record the transfer of data objects from one tier to the next, the storage tiering plugin on the catalog provider can be configured in the plugin_specific_configuration stanza.

 

In /etc/irods/server_config.json add the configuration to the storage tiering plugin instance.

{
    "instance_name": "irods_rule_engine_plugin-unified_storage_tiering-instance",
    "plugin_name": "irods_rule_engine_plugin-unified_storage_tiering",
    "plugin_specific_configuration": {
        "data_transfer_log_level" : "LOG_NOTICE"
    }
},

Storage Tiering Metadata Vocabulary

"plugin_specific_configuration": {
    "access_time_attribute": "irods::access_time",
    "group_attribute": "irods::storage_tiering::group",
    "time_attribute": "irods::storage_tiering::time",
    "query_attribute": "irods::storage_tiering::query",
    "verification_attribute": "irods::storage_tiering::verification",
    "data_movement_parameters_attribute": "irods::storage_tiering::restage_delay",
    "minimum_restage_tier": "irods::storage_tiering::minimum_restage_tier",
    "preserve_replicas": "irods::storage_tiering::preserve_replicas",
    "object_limit": "irods::storage_tiering::object_limit",
    "default_data_movement_parameters": "<EF>60s REPEAT UNTIL SUCCESS OR 5 TIMES</EF>",
    "minumum_delay_time": "irods::storage_tiering::minimum_delay_time_in_seconds",
    "maximum_delay_time": "irods::storage_tiering::maximum_delay_time_in_seconds",
    "time_check_string": "TIME_CHECK_STRING",
    "data_transfer_log_level": "LOG_DEBUG"
}

All default metadata attributes are configurable.

Should there be a preexisting vocabulary in your organization, it can be leveraged by redefining the metadata attributes used by the storage tiering framework.

Example Implementation

Getting Started

Installing Tiered Storage Plugin

As the ubuntu user ...

Install the package repository.

wget -qO - https://packages.irods.org/irods-signing-key.asc | sudo apt-key add -
echo "deb [arch=amd64] https://packages.irods.org/apt/ $(lsb_release -sc) main" | \
   sudo tee /etc/apt/sources.list.d/renci-irods.list
sudo apt-get update

Install the storage tiering package.

sudo apt-get install irods-rule-engine-plugin-unified-storage-tiering

And get some stickers.

wget https://raw.githubusercontent.com/irods/irods_training/main/stickers.jpg -P /tmp

Configuring the rule engine plugin

"rule_engines": [
    {
         "instance_name": "irods_rule_engine_plugin-unified_storage_tiering-instance",
         "plugin_name": "irods_rule_engine_plugin-unified_storage_tiering",
         "plugin_specific_configuration": {
         }
    },
    {   
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
        "plugin_specific_configuration": {  
            <snip>
        },
        "shared_memory_instance": "irods_rule_language_rule_engine"
    },
    ...
]

As the irods user, edit /etc/irods/server_config.json.

Note: Make sure the unified storage tiering plugin is the only rule engine plugin listed above the irods_rule_language plugin.
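
The ordering can be spot-checked from the command line. The sketch below is a line-based heuristic, not a JSON parser: it assumes each plugin_name key sits on its own line, as in the listing above, and it only confirms that the tiering plugin appears before the irods_rule_language plugin. It does not check that no other plugin sits between them. The helper name is hypothetical.

```shell
# Hypothetical helper: succeed if the unified storage tiering plugin_name line
# appears before the irods_rule_language plugin_name line in the given file.
# The trailing double quote in each pattern skips the "-instance" name lines.
tiering_above_rule_language() {
    config="$1"
    tiering_line=$(grep -n -F 'irods_rule_engine_plugin-unified_storage_tiering"' "$config" | head -n 1 | cut -d: -f1)
    rule_language_line=$(grep -n -F 'irods_rule_engine_plugin-irods_rule_language"' "$config" | head -n 1 | cut -d: -f1)
    [ -n "$tiering_line" ] && [ -n "$rule_language_line" ] && [ "$tiering_line" -lt "$rule_language_line" ]
}

# Usage (run where the server configuration is readable):
# tiering_above_rule_language /etc/irods/server_config.json && echo "order looks right"
```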

Example Implementation

Three Tier Group with Random Resources

Make some resources

iadmin mkresc rnd0 random
iadmin mkresc rnd1 random
iadmin mkresc rnd2 random
iadmin mkresc st_ufs0 unixfilesystem $(hostname):/tmp/irods/st_ufs0
iadmin mkresc st_ufs1 unixfilesystem $(hostname):/tmp/irods/st_ufs1
iadmin mkresc st_ufs2 unixfilesystem $(hostname):/tmp/irods/st_ufs2
iadmin mkresc st_ufs3 unixfilesystem $(hostname):/tmp/irods/st_ufs3
iadmin mkresc st_ufs4 unixfilesystem $(hostname):/tmp/irods/st_ufs4
iadmin mkresc st_ufs5 unixfilesystem $(hostname):/tmp/irods/st_ufs5
iadmin addchildtoresc rnd0 st_ufs0
iadmin addchildtoresc rnd0 st_ufs1
iadmin addchildtoresc rnd1 st_ufs2
iadmin addchildtoresc rnd1 st_ufs3
iadmin addchildtoresc rnd2 st_ufs4
iadmin addchildtoresc rnd2 st_ufs5

Run the commands above as the irods user.

$ ilsresc
demoResc:unixfilesystem
rnd0:random
├── st_ufs0:unixfilesystem
└── st_ufs1:unixfilesystem
rnd1:random
├── st_ufs2:unixfilesystem
└── st_ufs3:unixfilesystem
rnd2:random
├── st_ufs4:unixfilesystem
└── st_ufs5:unixfilesystem

Check the results.

Create a Tier Group

imeta set -R rnd0 irods::storage_tiering::group example_group 0
imeta set -R rnd1 irods::storage_tiering::group example_group 1
imeta set -R rnd2 irods::storage_tiering::group example_group 2

Create a tier group named example_group, adding the metadata to the root resources.

Set the Tiering Time Constraints

Configure tier 0 to hold data for 30 seconds.

imeta set -R rnd0 irods::storage_tiering::time 30

Configure tier 1 to hold data for 60 seconds.

imeta set -R rnd1 irods::storage_tiering::time 60

Tier 2 does not have a storage tiering time as it will hold data indefinitely.

Stage some data into storage tier 0.

iput -R rnd0 /tmp/stickers.jpg

Check the results.

$ ils -l
/tempZone/home/rods:
  rods              0 rnd0;st_ufs1      2157087 2024-05-12.15:15 & stickers.jpg
$ imeta ls -d stickers.jpg
AVUs defined for dataObj stickers.jpg:
attribute: irods::access_time
value: 1686582954
units:

Testing the tiering

Sample Tiering rule

{
   "rule-engine-instance-name": "irods_rule_engine_plugin-unified_storage_tiering-instance",
   "rule-engine-operation": "irods_policy_schedule_storage_tiering",
   "delay-parameters": "<INST_NAME>irods_rule_engine_plugin-unified_storage_tiering-instance</INST_NAME><PLUSET>1s</PLUSET><EF>60s REPEAT UNTIL SUCCESS OR 5 TIMES</EF>",
   "storage-tier-groups": [
       "example_group_g2",
       "example_group"
   ]
}
INPUT null
OUTPUT ruleExecOut

The plugin can be invoked by passing it JSON.

  • e.g. Run once until success or five failures

In production, this would be persistently on the delay queue.
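
The rule text shown above has to live in a file before irule can load it with -F. Here is a minimal sketch that writes it out, assuming the file is created in the current directory under the name used by the irule invocations in this example.

```shell
# Write the sample tiering rule to the file that irule -F will read.
# The quoted 'EOF' keeps the shell from expanding anything inside the rule.
cat > example_unified_tiering_invocation.r <<'EOF'
{
   "rule-engine-instance-name": "irods_rule_engine_plugin-unified_storage_tiering-instance",
   "rule-engine-operation": "irods_policy_schedule_storage_tiering",
   "delay-parameters": "<INST_NAME>irods_rule_engine_plugin-unified_storage_tiering-instance</INST_NAME><PLUSET>1s</PLUSET><EF>60s REPEAT UNTIL SUCCESS OR 5 TIMES</EF>",
   "storage-tier-groups": [
       "example_group_g2",
       "example_group"
   ]
}
INPUT null
OUTPUT ruleExecOut
EOF

# Submitting the rule requires a configured iRODS client session:
# irule -r irods_rule_engine_plugin-unified_storage_tiering-instance -F example_unified_tiering_invocation.r
```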

Launching the sample Tiering rule

Run the rule from the terminal.

$ irule -r irods_rule_engine_plugin-unified_storage_tiering-instance -F example_unified_tiering_invocation.r

Check the delay queue.

$ iqstat
id     name
10127 {"rule-engine-operation":"irods_policy_storage_tiering","storage-tier-groups":["example_group_g2","example_group"]}

Wait for the delay execution engine to fire...

Check the resource for stickers.jpg.

$ ils -l
/tempZone/home/rods:
  rods              1 rnd1;st_ufs3      2157087 2024-05-12.15:20 & stickers.jpg

Launching the Tiering rule once more

$ irule -r irods_rule_engine_plugin-unified_storage_tiering-instance -F example_unified_tiering_invocation.r

Check the delay queue.

$ iqstat
id     name
10129 {"rule-engine-operation":"irods_policy_storage_tiering","storage-tier-groups":["example_group_g2","example_group"]}

The time for violation is 60 seconds for rnd1.

Wait for the delay execution engine to fire ...

Check the resource for stickers.jpg.

$ ils -l
/tempZone/home/rods:
  rods              2 rnd2;st_ufs4      2157087 2024-05-12.15:23 & stickers.jpg

Restaging Data

iget -f stickers.jpg

Fetching data when it is not in the lowest tier will automatically trigger a restaging of the data.

The object will be replicated back to the lowest tier, honoring the verification policy.

$ ils -l
/tempZone/home/rods:
  rods              3 rnd0;st_ufs1      2157087 2024-05-12.15:26 & stickers.jpg
$ iqstat
id name
10035 {"rule-engine-operation":"apply_storage_tiering_policy","storage-tier-groups":["example_group_g2","example_group"]}

Setting a Minimum Restage Tier

imeta set -R rnd1 irods::storage_tiering::minimum_restage_tier true

In order to flag a resource as the lowest tier for restaging data, we add metadata.

The replica will be replicated to this tier instead of the lowest tier.

$ !irule
irule -r irods_rule_engine_plugin-unified_storage_tiering-instance -F example_unified_tiering_invocation.r
$ ils -l
/tempZone/home/rods:
  rods              5 rnd2;st_ufs5      2157087 2024-05-12.15:38 & stickers.jpg
$ iget -f stickers.jpg
$ iqstat
id     name
10044 {"destination-resource":"rnd1","object-path":"/tempZone/home/rods/stickers.jpg","preserve-replicas":false,"rule-engine-operation":"migrate_object_to_resource","source-resource":"rnd2","verification-type":"catalog"}
$ ils -l
/tempZone/home/rods:
  rods              6 rnd1;st_ufs3      2157087 2024-05-12.15:56 & stickers.jpg

Note: You may need to run !irule multiple times before the data object is considered to be in violation.

Preserving replicas on a given storage tier

imeta set -R rnd1 irods::storage_tiering::preserve_replicas true

If we want to preserve replicas on a tier we can set a metadata flag.

$ !irule
irule -r irods_rule_engine_plugin-unified_storage_tiering-instance -F example_unified_tiering_invocation.r 
$ ils -l
/tempZone/home/rods:
  rods              6 rnd1;st_ufs3      2157087 2024-05-12.15:56 & stickers.jpg
  rods              7 rnd2;st_ufs5      2157087 2024-05-12.15:58 & stickers.jpg

When the staging rule is invoked, the replica on the rnd1 tier will not be trimmed after replication.

A replica is preserved for analysis while another is safe in the archive tier.

Custom Violating Query - GenQuery

Craft a query that replicates the default behavior.

imeta set -R rnd0 irods::storage_tiering::query "SELECT DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM WHERE META_DATA_ATTR_NAME = 'irods::access_time' AND META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING' AND DATA_RESC_NAME IN ('st_ufs0', 'st_ufs1')" general

  • Compare data object access time against TIME_CHECK_STRING
  • TIME_CHECK_STRING is a macro which is replaced by: now - irods::storage_tiering::time
  • Check DATA_RESC_NAME against the list of child resource names
  • Columns DATA_NAME, COLL_NAME, USER_NAME, USER_ZONE, DATA_REPL_NUM must be queried in that order
  • Queries are of type general by default, so the general unit is optional
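
To make the TIME_CHECK_STRING substitution concrete, the sketch below computes the cutoff as now minus the tiering time, both in seconds. The helper name time_check_value is hypothetical, and the exact string formatting the plugin substitutes into the query (for example, any zero padding) is not shown in this material, so treat the output as the numeric value only.

```shell
# Hypothetical helper: the cutoff that TIME_CHECK_STRING stands for,
# computed as now - irods::storage_tiering::time (both in seconds).
time_check_value() {
    now="$1"
    tiering_time="$2"
    echo $(( now - tiering_time ))
}

# Objects on rnd0 whose irods::access_time is older than this cutoff are in
# violation of the 30 second tiering time set on rnd0 in this example.
time_check_value "$(date +%s)" 30
```

Passing the current time in as an argument keeps the arithmetic easy to check with fixed numbers, for example time_check_value 1000 30.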

Custom Violating Query - Specific Query

iadmin asq "SELECT DISTINCT t0.data_name, t1.coll_name, pdu.user_name, pdu.zone_name, t0.data_repl_num FROM R_DATA_MAIN t0 INNER JOIN R_COLL_MAIN t1 ON t1.coll_id = t0.coll_id INNER JOIN R_RESC_MAIN t2 ON t0.resc_id = t2.resc_id INNER JOIN R_OBJT_ACCESS pdoa ON t0.data_id = pdoa.object_id INNER JOIN R_TOKN_MAIN pdt ON pdoa.access_type_id = pdt.token_id INNER JOIN R_USER_MAIN pdu ON pdoa.user_id = pdu.user_id INNER JOIN R_OBJT_METAMAP ommd ON t0.data_id = ommd.object_id INNER JOIN R_META_MAIN mmd ON ommd.meta_id = mmd.meta_id WHERE t2.resc_name IN ('st_ufs2', 'st_ufs3') AND mmd.meta_attr_name = 'archive_object' AND mmd.meta_attr_value = 'true' AND pdoa.access_type_id >= '1120';" archive_query
imeta set -R rnd1 irods::storage_tiering::query archive_query specific

Craft a query that uses metadata to identify violating objects. The Specific Query added to iRODS above identifies objects satisfying the following.

  • Contains a replica in st_ufs2 or st_ufs3
  • The data object has the AVU ("archive_object", "true", "") attached to it

The imeta command attaches the query to the middle tier. The query must be labeled specific via the units field.

Testing the queries

Starting over with stickers.jpg ...

$ irm -f stickers.jpg
$ iput -R rnd0 /tmp/stickers.jpg
$ irule -r irods_rule_engine_plugin-unified_storage_tiering-instance -F example_unified_tiering_invocation.r
$ iqstat
id     name
10065 {"rule-engine-operation":"apply_storage_tiering_policy","storage-tier-groups":["example_group_g2","example_group"]}

Wait for it ...

$ ils -l
/tempZone/home/rods:
  rods              1 rnd1;st_ufs2      2157087 2024-05-12.16:10 & stickers.jpg

Testing the queries

$ ils -l
/tempZone/home/rods:
  rods              1 rnd1;st_ufs2      2157087 2024-05-12.16:10 & stickers.jpg

The file stopped at rnd1 as the time-based default query is now overridden.

Now set the metadata flag to archive the data object.

imeta set -d /tempZone/home/rods/stickers.jpg archive_object true

Testing the queries

With the metadata set, run the tiering rule.

$ irule -r irods_rule_engine_plugin-unified_storage_tiering-instance -F example_unified_tiering_invocation.r
$ iqstat
id     name
10065 {"rule-engine-operation":"apply_storage_tiering_policy","storage-tier-groups":["example_group_g2","example_group"]}
$ ils -l
/tempZone/home/rods:
  rods              1 rnd1;st_ufs2      2157087 2024-05-12.16:10 & stickers.jpg
  rods              2 rnd2;st_ufs5      2157087 2024-05-12.16:13 & stickers.jpg

The preservation flag is still set so we have two replicas.

Another example

Three Tier Groups with Common Archive

We will create a data flow from instrument to archive.

Create some more resources

iadmin mkresc tier2 unixfilesystem $(hostname):/tmp/irods/tier2
iadmin mkresc tier0_A unixfilesystem $(hostname):/tmp/irods/tier0_A
iadmin mkresc tier1_A unixfilesystem $(hostname):/tmp/irods/tier1_A

iadmin mkresc tier0_B unixfilesystem $(hostname):/tmp/irods/tier0_B
iadmin mkresc tier1_B unixfilesystem $(hostname):/tmp/irods/tier1_B

iadmin mkresc tier0_C unixfilesystem $(hostname):/tmp/irods/tier0_C
iadmin mkresc tier1_C unixfilesystem $(hostname):/tmp/irods/tier1_C

Create Tier Groups

Tier Group A

imeta set -R tier0_A irods::storage_tiering::group tier_group_A 0
imeta set -R tier1_A irods::storage_tiering::group tier_group_A 1
imeta add -R tier2   irods::storage_tiering::group tier_group_A 2

Tier Group B

imeta set -R tier0_B irods::storage_tiering::group tier_group_B 0
imeta set -R tier1_B irods::storage_tiering::group tier_group_B 1
imeta add -R tier2   irods::storage_tiering::group tier_group_B 2

Tier Group C

imeta set -R tier0_C irods::storage_tiering::group tier_group_C 0
imeta set -R tier1_C irods::storage_tiering::group tier_group_C 1
imeta add -R tier2   irods::storage_tiering::group tier_group_C 2

The tier2 resource is shared by all three groups, so its group metadata is attached with imeta add rather than imeta set, which would replace the value attached for a previous group.

Set Tier Time Constraints

Tier 0

imeta set -R tier0_A irods::storage_tiering::time 30
imeta set -R tier0_B irods::storage_tiering::time 45
imeta set -R tier0_C irods::storage_tiering::time 15

Tier 1

imeta set -R tier1_A irods::storage_tiering::time 60
imeta set -R tier1_B irods::storage_tiering::time 120
imeta set -R tier1_C irods::storage_tiering::time 180

Tier 2 has no time constraints.

Creating an automated periodic rule

{
   "rule-engine-instance-name": "irods_rule_engine_plugin-unified_storage_tiering-instance",
   "rule-engine-operation": "irods_policy_schedule_storage_tiering",
   "delay-parameters": "<INST_NAME>irods_rule_engine_plugin-unified_storage_tiering-instance</INST_NAME><PLUSET>1s</PLUSET><EF>REPEAT FOR EVER</EF>",
   "storage-tier-groups": [
       "tier_group_A",
       "tier_group_B",
       "tier_group_C"
   ]
}
INPUT null
OUTPUT ruleExecOut

Save the rule above in a new rule file, foo.r, to periodically apply the storage tiering policy.

Launch the new rule.

$ irule -r irods_rule_engine_plugin-unified_storage_tiering-instance -F foo.r
$ iqstat
id     name
10096 {"rule-engine-operation":"apply_storage_tiering_policy","storage-tier-groups":["tier_group_A","tier_group_B","tier_group_C"]}

Stage data into the first tier of all three tier groups and watch

iput -R tier0_A /tmp/stickers.jpg stickers_A.jpg
iput -R tier0_B /tmp/stickers.jpg stickers_B.jpg
iput -R tier0_C /tmp/stickers.jpg stickers_C.jpg
$ ils -l
/tempZone/home/rods:
  rods              0 tier0_A      2157087 2024-05-12.17:13 & stickers_A.jpg
  rods              0 tier0_B      2157087 2024-05-12.17:13 & stickers_B.jpg
  rods              0 tier0_C      2157087 2024-05-12.17:13 & stickers_C.jpg
  rods              1 rnd1;st_ufs2      2157087 2024-05-12.16:10 & stickers.jpg
  rods              2 rnd2;st_ufs5      2157087 2024-05-12.16:13 & stickers.jpg

Stage data into the first tier of all three tier groups and watch (continued)

$ ils -l
/tempZone/home/rods:
  rods              2 tier2      2157087 2024-05-12.17:16 & stickers_A.jpg
  rods              1 tier1_B      2157087 2024-05-12.17:15 & stickers_B.jpg
  rods              0 tier0_C      2157087 2024-05-12.17:13 & stickers_C.jpg
  rods              1 rnd1;st_ufs2      2157087 2024-05-12.16:10 & stickers.jpg
  rods              2 rnd2;st_ufs5      2157087 2024-05-12.16:13 & stickers.jpg

And then again...

$ ils -l
/tempZone/home/rods:
  rods              2 tier2      2157087 2024-05-12.17:16 & stickers_A.jpg
  rods              2 tier2      2157087 2024-05-12.17:17 & stickers_B.jpg
  rods              2 tier2      2157087 2024-05-12.17:23 & stickers_C.jpg
  rods              1 rnd1;st_ufs2      2157087 2024-05-12.16:10 & stickers.jpg
  rods              2 rnd2;st_ufs5      2157087 2024-05-12.16:13 & stickers.jpg

After waiting for the remaining tiering times to elapse, all newly ingested stickers_*.jpg objects now reside in tier2.

Questions?