November 12-15, 2018
Applications Engineer, iRODS Consortium
- Packaged and supported solutions
- Require configuration, not code
- Derived from the majority of use cases observed in the user community
Storage Tiering Overview
Getting a Docker Image with iRODS
Get the Docker image with iRODS 4.2.
Run the instance
Connect a terminal to the instance
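These three steps might look like the following; the image name, tag, and container name are illustrative, not an official published image.

```shell
# Pull an image with iRODS 4.2 preinstalled (image name is illustrative)
docker pull myrepo/irods-provider:4.2

# Run the instance in the background, naming it for easy reference
docker run -d --name irods-demo myrepo/irods-provider:4.2

# Connect an interactive terminal to the running instance
docker exec -it irods-demo /bin/bash
```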
Installing Tiered Storage Plugin
Install the package repository
Install the storage tiering package
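On an Ubuntu-based image, the repository and plugin installation might look like the following; adjust for your distribution's package manager.

```shell
# Add the iRODS signing key and package repository (Ubuntu/apt shown)
wget -qO - https://packages.irods.org/irods-signing-key.asc | sudo apt-key add -
echo "deb [arch=amd64] https://packages.irods.org/apt/ $(lsb_release -sc) main" | \
    sudo tee /etc/apt/sources.list.d/renci-irods.list
sudo apt-get update

# Install the storage tiering rule engine plugin
sudo apt-get install -y irods-rule-engine-plugin-storage-tiering
```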
Configuring Tiered Storage Plugin
Plugin is configured in /etc/irods/server_config.json
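The plugin is added as an entry in the `rule_engines` array of server_config.json, ahead of the native rule language instance. A minimal entry, following the plugin's naming convention, looks roughly like:

```json
{
    "instance_name": "irods_rule_engine_plugin-storage_tiering-instance",
    "plugin_name": "irods_rule_engine_plugin-storage_tiering",
    "plugin_specific_configuration": {}
}
```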
Three Tier Groups with Common Archive
We will demonstrate data flow from instrument to archive
Storage Tiering Demo
We will illustrate storage tiering with three storage groups (A, B, and C) each with three storage tiers (0, 1, and 2).
In this example, a common resource is used for the archive tier (2).
Creating the resources:
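A sketch of the resource creation, assuming two unixfilesystem tiers per group plus one shared archive resource; resource names, hostnames, and vault paths are illustrative.

```shell
# Tiers 0 and 1 for each of groups A, B, and C
iadmin mkresc rescA0 unixfilesystem "$(hostname)":/tmp/irods/rescA0
iadmin mkresc rescA1 unixfilesystem "$(hostname)":/tmp/irods/rescA1
iadmin mkresc rescB0 unixfilesystem "$(hostname)":/tmp/irods/rescB0
iadmin mkresc rescB1 unixfilesystem "$(hostname)":/tmp/irods/rescB1
iadmin mkresc rescC0 unixfilesystem "$(hostname)":/tmp/irods/rescC0
iadmin mkresc rescC1 unixfilesystem "$(hostname)":/tmp/irods/rescC1

# A single archive resource shared as tier 2 by all three groups
iadmin mkresc rescArchive unixfilesystem "$(hostname)":/tmp/irods/rescArchive
```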
Configuring the Tiering Policy
The tiering policy can be configured simply by adding metadata to our resources.
- irods::storage_tiering::group - Used to assign a resource to a tier group
- irods::storage_tiering::time - Used to set the time violation policy for objects within the resource
Assigning resources to a storage tiering group
Tier Group A
Tier Group B
Tier Group C
Notice that the archive tier (tier 2) belongs to all three tier groups.
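The group assignments above can be expressed with `imeta`; the attribute's value is the group name and its unit is the tier position. Group and resource names are illustrative.

```shell
# Group A: tier 0, tier 1, and the shared archive at tier 2
imeta set -R rescA0 irods::storage_tiering::group example_group_A 0
imeta set -R rescA1 irods::storage_tiering::group example_group_A 1
imeta set -R rescArchive irods::storage_tiering::group example_group_A 2

# Group B
imeta set -R rescB0 irods::storage_tiering::group example_group_B 0
imeta set -R rescB1 irods::storage_tiering::group example_group_B 1
imeta set -R rescArchive irods::storage_tiering::group example_group_B 2

# Group C: the archive resource carries one group AVU per group,
# which places it at position 2 in all three groups
imeta set -R rescC0 irods::storage_tiering::group example_group_C 0
imeta set -R rescC1 irods::storage_tiering::group example_group_C 1
imeta set -R rescArchive irods::storage_tiering::group example_group_C 2
```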
Set Tier Time Constraints
Tier Group A
Tier Group B
The archive tier has no time constraints.
Tier Group C
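A sketch of the time constraints; the 60- and 180-second values are assumptions chosen to match the one- and three-minute observations in the demo.

```shell
# Group A: tier 0 holds data for 60 seconds, tier 1 for 180 seconds
imeta set -R rescA0 irods::storage_tiering::time 60
imeta set -R rescA1 irods::storage_tiering::time 180

# Groups B and C use the same constraints
imeta set -R rescB0 irods::storage_tiering::time 60
imeta set -R rescB1 irods::storage_tiering::time 180
imeta set -R rescC0 irods::storage_tiering::time 60
imeta set -R rescC1 irods::storage_tiering::time 180

# The archive tier (tier 2) gets no time AVU and holds data indefinitely
```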
Running a Storage Tiering Rule
Now we will create a rule file (/var/lib/irods/foo.r) which describes our storage tiering policy for tier groups A, B, and C.
Executing this rule with the irule command will put the storage tiering policy on the delay execution queue.
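A sketch of the rule file and its invocation; the operation name and delay parameters follow the plugin's published example for iRODS 4.2, and should be confirmed against your installed version.

```shell
# Write the tiering invocation rule
cat > /var/lib/irods/foo.r <<'EOF'
{
    "rule-engine-instance-name": "irods_rule_engine_plugin-storage_tiering-instance",
    "rule-engine-operation": "irods_policy_schedule_storage_tiering_groups",
    "delay-parameters": "<INST_NAME>irods_rule_engine_plugin-storage_tiering-instance</INST_NAME><PLUSET>1s</PLUSET><EF>60s</EF>",
    "storage-tier-groups": [
        "example_group_A",
        "example_group_B",
        "example_group_C"
    ]
}
INPUT null
OUTPUT ruleExecOut
EOF

# Queue the policy on the delay execution queue
irule -r irods_rule_engine_plugin-storage_tiering-instance -F /var/lib/irods/foo.r
```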
Reminder of What We Have Configured
Stage data into all three groups and watch
All newly ingested files reside in tier 0.
Create a test file
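A 10 MB test file, matching the 10MfileA name used later in the demo, can be created with dd:

```shell
# Create a 10 MB file of zeros
dd if=/dev/zero of=10MfileA bs=1M count=10
```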
Put the test file at tier 0 in tier groups A, B, and C.
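One copy goes into the tier 0 resource of each group; the object names for groups B and C are illustrative.

```shell
iput -R rescA0 10MfileA
iput -R rescB0 10MfileA 10MfileB
iput -R rescC0 10MfileA 10MfileC
```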
Wait for it...
After one minute...
All files have migrated to tier 1.
After three minutes...
All files now reside in tier 2.
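The migration can be observed by listing replicas with their resource information:

```shell
# The -l flag shows the resource holding each replica
ils -l 10MfileA
```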
Restage to Lowest Tier
Now retrieve 10MfileA again
Now let's say we want tier 1 to be the minimum restage tier for group A
Now retrieve 10MfileA
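A sketch of the restage sequence; the minimum_restage_tier attribute name follows the plugin's documentation.

```shell
# Retrieving the object triggers a restage back to the lowest tier (tier 0)
iget -f 10MfileA

# Make tier 1 the minimum restage tier for group A
imeta set -R rescA1 irods::storage_tiering::minimum_restage_tier true

# Subsequent retrieval now restages the object only as far as tier 1
iget -f 10MfileA
```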
- We set up three storage tiering groups with three tiers for each group.
- We assigned resources to the groups and used a single archive resource as the final tier for each group.
- We demonstrated the flow of files through the various tiers.
- All of this was done by setting metadata on the storage resources.
Configuring Storage Tiering
Let's now discuss all the various ways that the storage tiering framework can be configured and customized...
Data Object Access Time
The default tiering policy is based on the last time of access of a given data object, which is applied as metadata on the object.
Dynamic policy enforcement points (PEPs) for the RPC API are used to apply this metadata.
Configuring a Tier Group
Tier groups are entirely driven by metadata
- The attribute identifies the resource as a tiering group participant
- The value defines the group name
- The unit defines the position within the group
- Tier position, or index, can be any value - order will be honored
- Configuration must be performed at the root of a resource composition
- A resource may belong to many tiering groups
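The bullets above map onto a single `imeta` call per tier; the resource and group names here are illustrative.

```shell
# attribute identifies participation, value is the group name,
# unit is the tier position within the group
imeta set -R fast_resc    irods::storage_tiering::group example_group 0
imeta set -R medium_resc  irods::storage_tiering::group example_group 1
imeta set -R archive_resc irods::storage_tiering::group example_group 2
```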
Configuring Tiering Time Constraints
Tiering violation time is configured in seconds
The final tier in a group does not have a storage tiering time
- it will hold data indefinitely
Configure a tier to hold data for 30 seconds
Configure a tier to hold data for 30 days
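The two examples above, using illustrative resource names (30 days is 2592000 seconds):

```shell
# Hold data for 30 seconds
imeta set -R fast_resc irods::storage_tiering::time 30

# Hold data for 30 days (2592000 seconds)
imeta set -R medium_resc irods::storage_tiering::time 2592000
```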
Verification of Data Migration
When data is found to be in violation:
- Data object is replicated to the next tier
- New replica integrity is verified (in one of three ways)
- Source replica is trimmed
'catalog' is the default verification for all resources
Other verification settings:
- filesystem - performs a stat of the file, more expensive
- checksum - verifies by computing the checksum on the file at rest, most expensive
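A non-default verification mode is selected per resource with metadata; the resource name is illustrative.

```shell
# Request the most thorough (and most expensive) verification for this tier
imeta set -R medium_resc irods::storage_tiering::verification checksum
```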
Configuring the restage resource
When data is in a tier other than the lowest tier, upon access the data is restaged back to the lowest tier.
This flag identifies the tier for restage:
Users may not want data restaged back to the lowest tier, should that tier be very remote or not appropriate for analysis.
Consider a storage resource at the edge serving as a landing zone for instrument data.
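For such a landing-zone scenario, the flag can be set on an intermediate tier so restaged data lands there instead of at the edge; the attribute name follows the plugin's documentation, and the resource name is illustrative.

```shell
# Tier 1 becomes the lowest tier to which data is restaged
imeta set -R medium_resc irods::storage_tiering::minimum_restage_tier true
```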
Some users may not wish to trim a replica from a tier when data is migrated, such as to allow data to be archived and also still available on fast storage.
To preserve a replica on any given tier, attach the following metadata flag to the root resource.
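A sketch, using an illustrative resource name; with this flag set, migration to the next tier replicates without trimming the source replica.

```shell
# Keep replicas on this tier even after data migrates onward
imeta set -R fast_resc irods::storage_tiering::preserve_replicas true
```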
Custom Violation Query
Admins may specify a custom query which identifies violating data objects
In this case, 'TIME_CHECK_STRING' is a macro which is replaced at execution time by the value of (now - irods::storage_tiering::time).
Any number of queries may be attached to a resource in order to provide a range of criteria by which violating data may be identified
- could include user applied metadata
- could include externally harvested metadata
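A time-based custom query might look like the following; the column list mirrors the plugin's documented default query, and the resource ID in the IN clause is illustrative.

```shell
imeta set -R fast_resc irods::storage_tiering::query \
"SELECT DATA_NAME, COLL_NAME, USER_NAME, DATA_REPL_NUM WHERE META_DATA_ATTR_NAME = 'irods::access_time' AND META_DATA_ATTR_VALUE < 'TIME_CHECK_STRING' AND DATA_RESC_ID IN ('10021')"
```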
Storage Tiering Metadata Vocabulary
All default metadata attributes are configurable
Should there be a preexisting vocabulary in your organization,
it can be leveraged by redefining the metadata attributes used by the storage tiering framework.
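The remapping lives in the plugin's `plugin_specific_configuration` block in server_config.json. The key names below follow the plugin's README for this era of the plugin and should be verified against your installed version; the values show the defaults being overridden with a hypothetical organizational vocabulary.

```json
"plugin_specific_configuration": {
    "access_time_attribute": "myorg::last_access",
    "group_attribute": "myorg::tier_group",
    "time_attribute": "myorg::tier_time",
    "query_attribute": "myorg::tier_query",
    "verification_attribute": "myorg::tier_verification"
}
```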
Custom violation queries
Let's say that instead of a time-based violation policy, we want to flag an object as in violation only if it carries metadata of the form archive_object=true. The following custom query could be assigned as the tiering query for the resource.
Attach the query to the resource.
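A sketch of attaching such a metadata-driven query; the resource name and the resource ID in the IN clause are illustrative.

```shell
imeta set -R fast_resc irods::storage_tiering::query \
"SELECT DATA_NAME, COLL_NAME, USER_NAME, DATA_REPL_NUM WHERE META_DATA_ATTR_NAME = 'archive_object' AND META_DATA_ATTR_VALUE = 'true' AND DATA_RESC_ID IN ('10021')"
```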
SC18 - Storage Tiering
By iRODS Consortium