iRODS Capabilities
Storage Tiering
February 21, 2018
Renaissance Computing Institute
UNC-Chapel Hill
Jason M. Coposky
@jason_coposky
Executive Director, iRODS Consortium
Storage Tiering Overview
Installing Tiered Storage Plugin
wget -qO - https://packages.irods.org/irods-signing-key.asc | sudo apt-key add -
echo "deb [arch=amd64] https://packages.irods.org/apt/ $(lsb_release -sc) main" | \
sudo tee /etc/apt/sources.list.d/renci-irods.list
sudo apt-get update
Install the package repository
sudo apt-get install irods-rule-engine-plugin-tiered-storage
Install the storage tiering package
Make some resources
iadmin mkresc rnd0 random
iadmin mkresc rnd1 random
iadmin mkresc rnd2 random
iadmin mkresc ufs0 unixfilesystem `hostname`:/tmp/irods/ufs0
iadmin mkresc ufs1 unixfilesystem `hostname`:/tmp/irods/ufs1
iadmin mkresc ufs2 unixfilesystem `hostname`:/tmp/irods/ufs2
iadmin mkresc ufs3 unixfilesystem `hostname`:/tmp/irods/ufs3
iadmin mkresc ufs4 unixfilesystem `hostname`:/tmp/irods/ufs4
iadmin mkresc ufs5 unixfilesystem `hostname`:/tmp/irods/ufs5
iadmin addchildtoresc rnd0 ufs0
iadmin addchildtoresc rnd0 ufs1
iadmin addchildtoresc rnd1 ufs2
iadmin addchildtoresc rnd1 ufs3
iadmin addchildtoresc rnd2 ufs4
iadmin addchildtoresc rnd2 ufs5
As the irods service account
Configuring the rule engine plugin
"rule_engines": [
{
"instance_name": "irods_rule_engine_plugin-tiered_storage-instance",
"plugin_name": "irods_rule_engine_plugin-tiered_storage",
"plugin_specific_configuration": {
}
},
{
"instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
"plugin_name": "irods_rule_engine_plugin-irods_rule_language",
"plugin_specific_configuration": {
<snip>
},
"shared_memory_instance": "irods_rule_language_rule_engine"
},
]
/etc/irods/server_config.json
Metadata-driven Storage Tiering
"plugin_specific_configuration": {
"access_time_attribute" : "irods::access_time",
"storage_tiering_group_attribute" : "irods::storage_tier_group",
"storage_tiering_time_attribute" : "irods::storage_tier_time",
"storage_tiering_query_attribute" : "irods::storage_tier_query",
"storage_tiering_verification_attribute" : "irods::storage_tier_verification",
"storage_tiering_restage_delay_attribute" : "irods::storage_tier_restage_delay",
"default_restage_delay_parameters" : "<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>",
"time_check_string" : "TIME_CHECK_STRING"
}
All of the default metadata attributes are configurable
If a site already has an existing metadata vocabulary, the plugin can be configured to use it
Configuring a Tier Group
imeta add -R rnd0 irods::storage_tier_group example_group 0
imeta add -R rnd1 irods::storage_tier_group example_group 1
imeta add -R rnd2 irods::storage_tier_group example_group 2
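The integer at the end of each AVU is the resource's position within the group: position 0 is the hottest tier, and data ages downward through increasing positions. A minimal Python sketch of that ordering, using the resources and values from the commands above (`tier_group_order` is an illustrative helper name, not part of the plugin's API):

```python
# Order the resources in a tier group by the integer position stored
# with the irods::storage_tier_group AVU. Position 0 is the hottest tier.
# (tier_group_order is an illustrative name, not an iRODS API.)

def tier_group_order(avus):
    """avus: list of (resource, group, position) tuples -> ordered resource names."""
    return [resc for resc, _group, pos in sorted(avus, key=lambda t: int(t[2]))]

avus = [
    ("rnd1", "example_group", "1"),
    ("rnd2", "example_group", "2"),
    ("rnd0", "example_group", "0"),
]
print(tier_group_order(avus))  # ['rnd0', 'rnd1', 'rnd2']
```

Because the ordering lives entirely in metadata, reordering a group or inserting a new tier is a matter of `imeta` commands, with no plugin reconfiguration.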
Tier groups are entirely driven by metadata
Configuring Tiering Time Constraints
Configure rnd0 to hold data for only 30 seconds
imeta add -R rnd0 irods::storage_tier_time 30
Then configure rnd1 to hold data for 2 minutes
imeta add -R rnd1 irods::storage_tier_time 120
rnd2 does not have a storage tier time and holds data indefinitely
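The tier time is compared against each object's recorded `irods::access_time`: once an object has been idle longer than the tier time, it becomes a candidate for migration to the next tier. The plugin does this via a catalog query, but the check itself reduces to a simple comparison, sketched below (`violates_tier_time` is an illustrative name):

```python
import time

def violates_tier_time(access_time_epoch, tier_time_seconds, now=None):
    """True when an object has been idle longer than the tier's time limit."""
    if now is None:
        now = time.time()
    return (now - access_time_epoch) > tier_time_seconds

# An object last accessed 45 seconds ago violates rnd0's 30-second limit
# but not rnd1's 120-second limit.
now = 1_000_000
print(violates_tier_time(now - 45, 30, now))   # True
print(violates_tier_time(now - 45, 120, now))  # False
```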
Verification of Data Migration
In order of increasing cost, verification can be performed against the catalog, the file system, or a checksum
imeta add -R rnd0 irods::storage_tier_verification catalog
The default configuration
imeta add -R rnd0 irods::storage_tier_verification filesystem
Stat the destination replica and compare its size against the catalog entry
Verification of Data Migration
imeta add -R rnd1 irods::storage_tier_verification checksum
Compute a checksum of the data at rest and compare
Should the source replica not have a checksum, one will be computed before the replication is performed
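Checksum verification streams both replicas and compares digests, which is why it is the most expensive option. A self-contained sketch of the idea, using SHA-256 (illustrative only; `file_checksum` is not the plugin's API, and iRODS stores its checksums in its own catalog format):

```python
import hashlib
import os
import tempfile

def file_checksum(path):
    """Stream a file in chunks and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# Two replicas with identical bytes verify; any difference fails.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "src")
    dst = os.path.join(d, "dst")
    with open(src, "wb") as f:
        f.write(b"replica bytes")
    with open(dst, "wb") as f:
        f.write(b"replica bytes")
    print(file_checksum(src) == file_checksum(dst))  # True
```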
Custom Violation Query
Admins may specify a custom query which identifies violating data objects
imeta set -R rnd1 irods::storage_tier_query "SELECT DATA_NAME, COLL_NAME WHERE META_DATA_ATTR_NAME = 'irods::access_time' AND META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING' AND DATA_RESC_ID IN ('10021', '10022')"
Add additional custom metadata to the query to customize it for a project, a user, or any other attribute
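Note the `TIME_CHECK_STRING` token in the query: at execution time the plugin substitutes it with the migration cutoff, so the query returns only objects whose access time predates that cutoff. A sketch of the substitution (`render_violation_query` is an illustrative name, and the cutoff arithmetic here is an assumption about how the token is resolved):

```python
# Substitute the TIME_CHECK_STRING token in a custom violation query.
# (render_violation_query is illustrative, not the plugin's API.)

QUERY = ("SELECT DATA_NAME, COLL_NAME WHERE "
         "META_DATA_ATTR_NAME = 'irods::access_time' AND "
         "META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING'")

def render_violation_query(template, now_epoch, tier_time_seconds):
    """Replace the token with (now - tier time) as an epoch string."""
    cutoff = int(now_epoch) - int(tier_time_seconds)
    return template.replace("TIME_CHECK_STRING", str(cutoff))

print(render_violation_query(QUERY, 1_000_000, 120))
```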
Launching the sample Tiering rule
{ "rule-engine-instance-name": "irods_rule_engine_plugin-tiered_storage-instance", "rule-engine-operation": "apply_storage_tiering_policy", "delay-parameters": "<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>", "storage-tier-groups": [ "example_group_g2", "example_group" ] } INPUT null OUTPUT ruleExecOut
irule -r irods_rule_engine_plugin-tiered_storage-instance -F example_tiering_invocation.r
JSON ingested by the Tiering plugin: the delayed rule is retried until success or six failures
Questions?
Testing Tiered Storage - MungeFS
Mung or munge is computer jargon for a series of potentially destructive or irrevocable changes to a piece of data or a file.
Raymond, Eric S. "The Jargon File, version 4.4.8". catb.org. Archived from the original on June 15, 2015. Retrieved 15 June 2015.
A FUSE filesystem overlay on an underlying file system, controlled by Avro/ZeroMQ messages that instruct it to misbehave.
Motivation
We needed a way to produce a variety of reliable failure modes for testing
Mounted as a FUSE volume, then exposed as an iRODS resource
Initial Use Case
Testing verification for Tiered Storage
Configuring MungeFS
mungefs /tmp/irods/mnt -omodules=subdir,subdir=/tmp/irods/target
Creating the mount point
fusermount -u /tmp/irods/mnt
Decommissioning the mount point
Configuring MungeFS
mungefsctl --operations "getattr" --corrupt_size
Report invalid file size
mungefsctl --operations "read" --corrupt_data
Corrupt data on a read
mungefsctl --operations "write" --corrupt_data
Corrupt data on a write
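These corruption modes exercise exactly the verification policies described earlier: a read corrupted by MungeFS yields a digest that no longer matches the checksum recorded in the catalog, so checksum verification flags the migration as failed. A small simulation of that effect (illustrative only; `corrupt_read` mimics what `--corrupt_data` on reads does, it is not MungeFS code):

```python
import hashlib

def checksum(data: bytes) -> str:
    """SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

def corrupt_read(data: bytes) -> bytes:
    """Simulate mungefsctl --operations 'read' --corrupt_data:
    flip the first byte so the bytes returned differ from what is on disk."""
    return bytes([data[0] ^ 0xFF]) + data[1:]

original = b"important replica contents"
recorded = checksum(original)       # digest recorded in the catalog
seen = corrupt_read(original)       # what a read through MungeFS returns
print(checksum(seen) == recorded)   # False: verification catches the corruption
```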
Questions?