November 9-19, 2020
Supercomputing 2020
Virtual
Terrell Russell, Ph.D.
@terrellrussell
Chief Technologist, iRODS Consortium
Policy Composition:
Configuration, Not Code
Our Membership
Consortium
Member
iRODS as the Integration Layer
Institutional repositories
As data matures and reaches a broader community, data management policy must also evolve to meet these additional requirements.
Policy Composition: Configuration, Not Code
With a twenty-five-year history, iRODS open source technology has been used to automate data management across many scientific and business domains. The scale and value of data across these domains drive the need for automation. This variety also demands flexibility in data management policies over time. Organizations have satisfied their own needs by investing in the development of their own policy, but this has led to monolithic, specific policy sets tailored to a particular organization.
As a community, we have observed common themes and duplication of effort across these organizations and have now worked to provide generalized implementations that can be deployed across multiple domains. This new approach allows multiple policies to be configured together, or composed, without the need for custom development. For example, our Storage Tiering capability is a composition of several basic policies: Replication, Verification, Retention, and Violating Object Discovery.
What is Data Management
A Definition of Data Management
"The development, execution and supervision of plans, policies, programs, and practices that control, protect, deliver, and enhance the value of data and information assets."
Organizations need a future-proof solution to managing data and its surrounding infrastructure
What is a Policy
A Definition of Policy
A set of ideas or a plan of what to do in particular situations that has been agreed to officially by a group of people...
So how does iRODS do this?
iRODS Policies
The reflection of real-world data management decisions in computer-actionable code.
(a plan of what to do in particular situations)
Possible Policies
The iRODS Data Management Model
Core Competencies
Policy
Capabilities
Patterns
Policy Composition
Consider Policy as building blocks towards Capabilities
Follow proven software engineering principles:
Favor composition over monolithic implementations
Rules and Dynamic Policy Enforcement Points can be overloaded and fall through
Implement or configure several rule bases or rule engine plugins to achieve complex use cases
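The fall-through idea can be pictured with a small sketch. This is not the iRODS rule engine: the CONTINUE sentinel, the handler signature, and all function names below are assumptions made purely for illustration. Several handlers are registered for the same enforcement point; each one does its work and either lets the next handler run or stops the chain.

```python
from typing import Callable, List, Optional

# Hypothetical sentinel: a handler returns it to let the next handler run.
CONTINUE = "continue"

Handler = Callable[[str, List[str]], Optional[str]]


def run_enforcement_point(handlers: List[Handler], logical_path: str) -> List[str]:
    """Call each handler in order until one does not ask to fall through."""
    log: List[str] = []
    for handler in handlers:
        outcome = handler(logical_path, log)
        if outcome != CONTINUE:
            break  # this handler claimed the event; later handlers are skipped
    return log


def record_access_time(path: str, log: List[str]) -> Optional[str]:
    log.append(f"access_time:{path}")
    return CONTINUE  # fall through to the next handler


def replicate(path: str, log: List[str]) -> Optional[str]:
    log.append(f"replicate:{path}")
    return None  # handled; stop here


def not_reached(path: str, log: List[str]) -> Optional[str]:
    log.append(f"not_reached:{path}")
    return None


TRACE = run_enforcement_point([record_access_time, replicate, not_reached], "/tempZone/a")
```

Small handlers that fall through compose into larger behavior without any one of them knowing about the others.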
Policy Composition
Consider Storage Tiering as a collection of policies:
The New Approach
Continue to separate the concerns:
Write simple policy implementations
Each policy may now be reused in a generic fashion, favoring configuration over code.
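A rough sketch of what "configuration over code" means in practice: generic policies are written once, and a list of entries decides which ones run for which event. The registry, the callables, and the invoke function are hypothetical and stand in for the server's real dispatch; only the key names ("events", "active_policy_clauses", "policy", "configuration") are taken from the examples in this deck.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry of generic policy implementations, keyed by policy name.
POLICIES: Dict[str, Callable[[str, Dict[str, Any]], str]] = {}


def register(name: str):
    """Decorator that adds a callable to the hypothetical registry."""
    def wrap(fn):
        POLICIES[name] = fn
        return fn
    return wrap


@register("irods_policy_access_time")
def access_time(logical_path: str, configuration: Dict[str, Any]) -> str:
    return f"access_time updated for {logical_path}"


@register("irods_policy_data_replication")
def data_replication(logical_path: str, configuration: Dict[str, Any]) -> str:
    destination = configuration.get("destination_resource", "<unset>")
    return f"replicated {logical_path} to {destination}"


def invoke(policies_to_invoke: List[Dict[str, Any]], event: str,
           clause: str, logical_path: str) -> List[str]:
    """Run every configured policy whose events and clauses match."""
    results = []
    for entry in policies_to_invoke:
        if event not in entry["events"]:
            continue
        if clause not in entry["active_policy_clauses"]:
            continue
        policy = POLICIES[entry["policy"]]
        results.append(policy(logical_path, entry.get("configuration", {})))
    return results


CONFIG = [
    {"active_policy_clauses": ["post"], "events": ["put", "get"],
     "policy": "irods_policy_access_time", "configuration": {}},
    {"active_policy_clauses": ["post"], "events": ["put"],
     "policy": "irods_policy_data_replication",
     "configuration": {"destination_resource": "destination_resource_3"}},
]
```

Changing behavior then means editing CONFIG, not the policy functions.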
Examples
Synchronous Access Time
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"active_policy_clauses" : ["post"],
"events" : ["put", "get", "create", "read", "write", "rename",
"register", "unregister", "replication", "checksum",
"copy", "seek", "truncate"],
"policy" : "irods_policy_access_time",
"configuration" : {
}
}
]
}
}
Synchronous Replication
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{ "active_policy_clauses" : ["post"],
"events" : ["create", "write", "registration"],
"policy" : "irods_policy_data_replication",
"configuration" : {
"source_to_destination_map" : {
"source_resource_0" : ["destination_resource_0a", "destination_resource_0b"],
"source_resource_1" : ["destination_resource_1a"]
}
}
},
{ "active_policy_clauses" : ["post"],
"events" : ["create", "write", "registration"],
"policy" : "irods_policy_data_replication",
"configuration" : {
"destination_resource" : "destination_resource_3"
}
}
]
}
}
Asynchronous Replication
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"active_policy_clauses" : ["post"],
"events" : ["create", "write", "registration"],
"policy" : "irods_policy_enqueue_rule",
"delay_conditions" : "<ET>PLUSET 1</ET>",
"payload" : {
"policy" : "irods_policy_execute_rule",
"payload" : {
"policy_to_invoke" : "irods_policy_data_replication",
"configuration" : {
"source_to_destination_map" : {
"source_resource_0" : ["destination_resource_0a", "destination_resource_0b"],
"source_resource_1" : ["destination_resource_1a"]
}
}
}
}
}
]
}
}
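Comparing the synchronous and asynchronous replication listings, the asynchronous form looks like the synchronous entry wrapped in an enqueue step and an execute step. The helper below is a hypothetical sketch of that wrapping in Python, using only key names that appear in the listings above; make_asynchronous is not part of iRODS.

```python
import json
from typing import Any, Dict


def make_asynchronous(entry: Dict[str, Any],
                      delay_conditions: str = "<ET>PLUSET 1</ET>") -> Dict[str, Any]:
    """Wrap a synchronous policy entry in the enqueue/execute nesting shown above."""
    return {
        "active_policy_clauses": entry["active_policy_clauses"],
        "events": entry["events"],
        "policy": "irods_policy_enqueue_rule",
        "delay_conditions": delay_conditions,
        "payload": {
            "policy": "irods_policy_execute_rule",
            "payload": {
                # The original policy becomes the deferred policy to invoke.
                "policy_to_invoke": entry["policy"],
                "configuration": entry.get("configuration", {}),
            },
        },
    }


SYNCHRONOUS = {
    "active_policy_clauses": ["post"],
    "events": ["create", "write", "registration"],
    "policy": "irods_policy_data_replication",
    "configuration": {
        "source_to_destination_map": {
            "source_resource_1": ["destination_resource_1a"],
        }
    },
}

ASYNCHRONOUS = make_asynchronous(SYNCHRONOUS)

if __name__ == "__main__":
    print(json.dumps(ASYNCHRONOUS, indent=4))
```

The policy being deferred is untouched; only the envelope around it changes.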
Synchronous Retention
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"active_policy_clauses" : ["post"],
"events" : ["replication"],
"policy" : "irods_policy_data_retention",
"configuration" : {
"mode" : "trim_single_replica",
"source_resource_list" : ["source_resource_1", "source_resource_2"]
}
}
]
}
}
Query Driven Asynchronous Retention
{
"policy" : "irods_policy_enqueue_rule",
"delay_conditions" : "<EF>REPEAT FOR EVER</EF>",
"payload" : {
"policy" : "irods_policy_execute_rule",
"payload" : {
"policy_to_invoke" : "irods_policy_query_processor",
"parameters" : {
"query_string" : "SELECT USER_NAME, COLL_NAME, DATA_NAME, RESC_NAME WHERE COLL_NAME like '/tempZone/home/rods%' AND RESC_NAME IN ('source_resource_1', 'source_resource_2')",
"query_limit" : 10,
"query_type" : "general",
"number_of_threads" : 4,
"policy_to_invoke" : "irods_policy_data_retention",
"configuration" : {
"mode" : "trim_single_replica",
"source_resource_list" : ["source_resource_1", "source_resource_2"]
}
}
}
}
}
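One way to read the query processor entry: run a query, then hand each result row to the configured policy on a small pool of threads. The sketch below models that reading with in-memory rows instead of a catalog query. The function names, the row layout, and the treatment of a query_limit of 0 as "no limit" are all assumptions, not documented iRODS behavior.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional

Row = List[str]  # assumed layout: USER_NAME, COLL_NAME, DATA_NAME, RESC_NAME


def retention(row: Row, configuration: Dict[str, Any]) -> Optional[str]:
    """Hypothetical retention policy: report a trim for replicas on listed resources."""
    _user, coll, name, resc = row
    if resc in configuration["source_resource_list"]:
        return f"trim {coll}/{name} from {resc}"
    return None


def process_query(rows: List[Row],
                  policy: Callable[[Row, Dict[str, Any]], Optional[str]],
                  configuration: Dict[str, Any],
                  number_of_threads: int = 4,
                  query_limit: int = 0) -> List[Optional[str]]:
    """Apply the policy to each row; results come back in row order."""
    if query_limit > 0:  # assumption: 0 means unlimited
        rows = rows[:query_limit]
    with ThreadPoolExecutor(max_workers=number_of_threads) as pool:
        return list(pool.map(lambda row: policy(row, configuration), rows))


ROWS = [
    ["rods", "/tempZone/home/rods", "a.dat", "source_resource_1"],
    ["rods", "/tempZone/home/rods", "b.dat", "other_resource"],
    ["rods", "/tempZone/home/rods", "c.dat", "source_resource_2"],
]
CONFIGURATION = {
    "mode": "trim_single_replica",
    "source_resource_list": ["source_resource_1", "source_resource_2"],
}
RESULTS = process_query(ROWS, retention, CONFIGURATION, number_of_threads=4, query_limit=10)
```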
Synchronous Verification
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{ "active_policy_clauses" : ["post"],
"events" : ["create", "write", "registration"],
"policy" : "irods_policy_data_verification",
"configuration" : {
"attribute" : "irods::verification::type"
}
}
]
}
}
The type of verification to perform is stored as metadata on the resource
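As an illustration of metadata-driven behavior, the sketch below looks up the verification type under the configured attribute and chooses a check accordingly. The type names "catalog", "filesystem", and "checksum", the default, and the in-memory byte comparison are assumptions for this sketch; a real server would consult the catalog and storage.

```python
import hashlib
from typing import Dict, Optional


def verify(resource_metadata: Dict[str, str],
           source: bytes,
           replica: Optional[bytes],
           attribute: str = "irods::verification::type") -> bool:
    """Pick a verification strategy from resource metadata (hypothetical model)."""
    kind = resource_metadata.get(attribute, "catalog")  # assumed default
    if replica is None:
        return False  # no replica to verify
    if kind == "catalog":
        return True  # stand-in for "the replica is registered"
    if kind == "filesystem":
        return len(replica) == len(source)  # stand-in for a size comparison
    if kind == "checksum":
        return hashlib.sha256(replica).hexdigest() == hashlib.sha256(source).hexdigest()
    raise ValueError(f"unknown verification type: {kind}")
```

Because the type lives in metadata, switching a resource from a size check to a checksum is a metadata update rather than a code change.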
Asynchronous Verification
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{ "active_policy_clauses" : ["post"],
"events" : ["create", "write", "registration"],
"policy" : "irods_policy_enqueue_rule",
"delay_conditions" : "<ET>PLUSET 1</ET>",
"payload" : {
"policy" : "irods_policy_execute_rule",
"payload" : {
"policy_to_invoke" : "irods_policy_data_verification",
"configuration" : {
"attribute" : "irods::verification::type"
}
}
}
}
]
}
}
The type of verification to perform is stored as metadata on the resource
Policy Composed Capabilities
Storage Tiering Overview
Policy Composed Storage Tiering
{
"policy" : "irods_policy_execute_rule",
"payload" : {
"policy_to_invoke" : "irods_policy_query_processor",
"configuration" : {
"query_string" : "SELECT META_RESC_ATTR_VALUE WHERE META_RESC_ATTR_NAME = 'irods::storage_tiering::group'",
"query_limit" : 0,
"query_type" : "general",
"number_of_threads" : 8,
"policy_to_invoke" : "irods_policy_event_generator_resource_metadata",
"configuration" : {
"conditional" : {
"metadata" : {
"attribute" : "irods::storage_tiering::group",
"value" : "{0}"
}
},
"policies_to_invoke" : [
{
"policy" : "irods_policy_query_processor",
"configuration" : {
"query_string" : "SELECT META_RESC_ATTR_VALUE WHERE META_RESC_ATTR_NAME = 'irods::storage_tiering::query' AND RESC_NAME = 'IRODS_TOKEN_SOURCE_RESOURCE_END_TOKEN'",
"default_results_when_no_rows_found" : ["SELECT USER_NAME, COLL_NAME, DATA_NAME, RESC_NAME WHERE META_DATA_ATTR_NAME = 'irods::access_time' AND META_DATA_ATTR_VALUE < 'IRODS_TOKEN_LIFETIME_END_TOKEN' AND META_DATA_ATTR_UNITS <> 'irods::storage_tiering::inflight' AND DATA_RESC_ID IN (IRODS_TOKEN_SOURCE_RESOURCE_LEAF_BUNDLE_END_TOKEN)"],
"query_limit" : 0,
"query_type" : "general",
"number_of_threads" : 8,
"policy_to_invoke" : "irods_policy_query_processor",
"configuration" : {
"lifetime" : "IRODS_TOKEN_QUERY_SUBSTITUTION_END_TOKEN(SELECT META_RESC_ATTR_VALUE WHERE META_RESC_ATTR_NAME = 'irods::storage_tiering::time' AND RESC_NAME = 'IRODS_TOKEN_SOURCE_RESOURCE_END_TOKEN')",
"query_string" : "{0}",
"query_limit" : 0,
"query_type" : "general",
"number_of_threads" : 8,
"policy_to_invoke" : "irods_policy_data_replication",
"configuration" : {
"comment" : "source_resource, and destination_resource supplied by the resource metadata event generator"
}
}
}
}
]
}
}
}
}
INPUT null
OUTPUT ruleExecOut
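The "{0}" placeholders in the rule above appear to be filled from the columns of a query result before the nested policy runs. A minimal sketch of that kind of substitution follows; the function is hypothetical, and it deliberately ignores scoping, so it does not model how nested placeholders are bound to the results of inner queries.

```python
from typing import Any, List


def substitute(node: Any, row: List[str]) -> Any:
    """Replace "{0}", "{1}", ... in every string of a nested structure with row values."""
    if isinstance(node, str):
        for index, value in enumerate(row):
            node = node.replace("{%d}" % index, value)
        return node
    if isinstance(node, dict):
        return {key: substitute(value, row) for key, value in node.items()}
    if isinstance(node, list):
        return [substitute(value, row) for value in node]
    return node  # numbers, booleans, and None pass through unchanged


CONDITIONAL = {
    "conditional": {
        "metadata": {
            "attribute": "irods::storage_tiering::group",
            "value": "{0}",
        }
    }
}

# "example_group" is a made-up tier group name standing in for one query result row.
BOUND = substitute(CONDITIONAL, ["example_group"])
```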
Policy Composed Capability
Asynchronous Discovery and Replication
Policy Composed Capability
Synchronous Configuration for Storage Tiering
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"active_policy_clauses" : ["post"],
"events" : ["put", "get", "create", "read", "write", "rename", "register", "unregister", "replication", "checksum", "copy", "seek", "truncate"],
"policy" : "irods_policy_access_time",
"configuration" : {
"log_errors" : "true"
}
},
{
"active_policy_clauses" : ["post"],
"events" : ["read", "write", "get"],
"policy" : "irods_policy_data_restage",
"configuration" : {
}
},
{
"active_policy_clauses" : ["post"],
"events" : ["replication"],
"policy" : "irods_policy_tier_group_metadata",
"configuration" : {
}
},
{
"active_policy_clauses" : ["post"],
"events" : ["replication"],
"policy" : "irods_policy_data_verification",
"configuration" : {
}
},
{
"active_policy_clauses" : ["post"],
"events" : ["replication"],
"policy" : "irods_policy_data_retention",
"configuration" : {
"mode" : "trim_single_replica",
"log_errors" : "true"
}
}
]
}
}
Policy Composed Capability
Possible Metadata Driven Restage for Storage Tiering
{
"instance_name": "irods_rule_engine_plugin-event_handler-metadata_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-metadata_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"active_policy_clauses" : ["post"],
"events" : ["set", "add"],
"attribute" : "irods::storage_tiering::restage",
"value" : "*.*",
"policy" : "irods_policy_data_restage",
"configuration" : {
}
}
]
}
}
Data Transfer Nodes Pattern
Policy Composed Data Transfer Node
{
"policy" : "irods_policy_enqueue_rule",
"delay_conditions" : "<EF>REPEAT FOR EVER</EF>",
"payload" : {
"policy" : "irods_policy_execute_rule",
"payload" : {
"policy_to_invoke" : "irods_policy_query_processor",
"parameters" : {
"query_string" : "SELECT USER_NAME, COLL_NAME, DATA_NAME, RESC_NAME WHERE COLL_NAME like '/tempZone/home/rods%' AND RESC_NAME IN ('edge_resource_1', 'edge_resource_2')",
"query_limit" : 10,
"query_type" : "general",
"number_of_threads" : 4,
"policy_to_invoke" : "irods_policy_data_retention",
"configuration" : {
"mode" : "trim_single_replica",
"source_resource_list" : ["edge_resource_1", "edge_resource_2"]
}
}
}
}
}
Policy Composed Capability
Asynchronous Retention on Edge Resources
Synchronous Replication
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"conditional" : {
"logical_path" : "\/tempZone.*"
},
"active_policy_clauses" : ["post"],
"events" : ["create", "write", "registration"],
"policy" : "irods_policy_data_replication",
"configuration" : {
"source_to_destination_map" : {
"edge_resource_0" : ["long_term_resource_0"],
"edge_resource_1" : ["long_term_resource_1"]
}
}
},
{
"conditional" : {
"logical_path" : "\/tempZone.*"
},
"active_policy_clauses" : ["pre"],
"events" : ["get"],
"policy" : "irods_policy_data_replication",
"configuration" : {
"source_to_destination_map" : {
"long_term_resource_0" : ["edge_resource_0"],
"long_term_resource_1" : ["edge_resource_1"]
}
}
}
]
}
}
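The "conditional" block restricts a policy to matching logical paths. The sketch below shows one plausible evaluation in Python. Treating the value as a regular expression that must match the whole path is an assumption, and conditional_matches is a hypothetical helper, not an iRODS function.

```python
import json
import re
from typing import Any, Dict

# Parse the conditional exactly as written in the listing; JSON's "\/" decodes to "/".
ENTRY: Dict[str, Any] = json.loads(r'{"conditional": {"logical_path": "\/tempZone.*"}}')


def conditional_matches(entry: Dict[str, Any], logical_path: str) -> bool:
    """Return True when the entry has no path conditional or the path matches it."""
    pattern = entry.get("conditional", {}).get("logical_path")
    if pattern is None:
        return True  # unconditional entries always apply
    # Assumption: the pattern must match the entire logical path.
    return re.fullmatch(pattern, logical_path) is not None
```

With this reading, the entry applies to every object under /tempZone and to nothing in another zone.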
Policy Composed Capability
Indexing Capability
Policy Composed Indexing
Policy implemented as separate policy engine plugins
Indexing Policies
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"active_policy_clauses" : ["post"],
"events" : ["put", "write"],
"policy" : "irods_policy_event_delegate_collection_metadata",
"configuration" : {
"policies_to_invoke" : [
{
"conditional" : {
"metadata" : {
"attribute" : "irods::indexing::index",
"entity_type" : "data_object"
}
},
"policy" : "irods_policy_indexing_full_text_index_elasticsearch",
"configuration" : {
"hosts" : ["http://localhost:9200/"],
"bulk_count" : 100,
"read_size" : 1024
}
}
]
}
}
...
Synchronously configured full text indexing
Indexing Policies
{
"active_policy_clauses" : ["pre"],
"events" : ["unlink", "unregister"],
"policy" : "irods_policy_event_delegate_collection_metadata",
"configuration" : {
"policies_to_invoke" : [
{
"conditional" : {
"metadata" : {
"attribute" : "irods::indexing::index",
"entity_type" : "data_object"
}
},
"policy" : "irods_policy_indexing_full_text_purge_elasticsearch",
"configuration" : {
"hosts" : ["http://localhost:9200/"],
"bulk_count" : 100,
"read_size" : 1024
}
}
]
}
}
]
}
Synchronously configured full text purge
Capabilities become easily configured recipes.
A Policy GUI is now a possibility with simple manipulation of server-side JSON.
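To suggest how little a GUI back end would need to do, here is a sketch that loads a plugin configuration, appends one policy entry, and writes the JSON back out. The add_policy helper and the sample document are hypothetical; the key names follow the listings in this deck.

```python
import json
from typing import Any, Dict

PLUGIN_JSON = """
{
    "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
    "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
    "plugin_specific_configuration": {
        "policies_to_invoke": []
    }
}
"""


def add_policy(plugin_json: str, entry: Dict[str, Any]) -> str:
    """Return the plugin JSON with the entry appended, skipping exact duplicates."""
    plugin = json.loads(plugin_json)
    policies = plugin["plugin_specific_configuration"]["policies_to_invoke"]
    if entry not in policies:
        policies.append(entry)
    return json.dumps(plugin, indent=4)


ACCESS_TIME = {
    "active_policy_clauses": ["post"],
    "events": ["put", "get"],
    "policy": "irods_policy_access_time",
    "configuration": {},
}

UPDATED = add_policy(PLUGIN_JSON, ACCESS_TIME)
```

Strict parsing with json.loads also rejects trailing commas and single-quoted keys, so a tool built this way catches malformed configuration before it reaches the server.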
Summary - Configuration, Not Code
Data management should be data-centric and metadata-driven.
Future-proof automated data management requires open formats and open source.