November 9-19, 2020

Supercomputing 2020

Virtual

Terrell Russell, Ph.D.

@terrellrussell

Chief Technologist, iRODS Consortium

Policy Composition:

Configuration, Not Code

Our Membership

(Slide: iRODS Consortium member logos)

Policy Composition: Configuration, Not Code

With a twenty-five-year history, iRODS open-source technology has been used to automate data management across many scientific and business domains.  The scale and value of data across these domains drive the need for automation, and their variety demands flexibility in data management policies over time.  Organizations have met their needs by investing in the development of their own policies, but this has led to monolithic policy sets tailored to a single organization.

 

As a community, we have observed common themes and duplication of effort across these organizations, and we have worked to provide generalized implementations that can be deployed across multiple domains.  This new approach allows multiple policies to be configured together, or composed, without the need for custom development.  For example, our Storage Tiering capability is a composition of several basic policies: Replication, Verification, Retention, and Violating Object Discovery.

What is Data Management?

A Definition of Data Management

"The development, execution and supervision of plans, policies, programs, and practices that control, protect, deliver, and enhance the value of data and information assets."

Organizations need a future-proof solution for managing data and its surrounding infrastructure.

What is a Policy?

A Definition of Policy

A set of ideas or a plan of what to do in particular situations that has been agreed to officially by a group of people...

So how does iRODS do this?

iRODS Policies

The reflection of real-world data management decisions in computer-actionable code.

(a plan of what to do in particular situations)

Possible Policies

  • Data Movement
  • Data Verification
  • Data Retention
  • Data Replication
  • Data Placement
  • Checksum Validation
  • Metadata Extraction
  • Metadata Application
  • Metadata Conformance
  • Replica Verification
  • Vault to Catalog Verification
  • Catalog to Vault Verification
  • ...

The iRODS Data Management Model

(Diagram: Core Competencies, Policy, Capabilities, Patterns)

Some Questions

  • How can we help new users get started?
  • How can we make policy reusable?
  • How can we simplify policy development?
  • How can we provide a cookbook of deployments?
  • How do we get from Policy to Capabilities?

Policy Composition

Consider Policy as building blocks towards Capabilities

Follow proven software engineering principles:

    Favor composition over monolithic implementations

Rules and Dynamic Policy Enforcement Points can be overloaded and fall through

Implement or configure several rule bases or rule engine plugins to achieve complex use cases

The Original Approach

Assuming there was even a provided policy enforcement point for the desired event...

acPostProcForPut() { 
  if($rescName == "demoResc") {
      # extract and apply metadata
  }
  else if($rescName == "cacheResc") {
      # async replication to archive
  }
  else if($objPath like "/tempZone/home/alice/*" &&
          $rescName == "indexResc") {
      # launch an indexing job
  }
  else if(xyz) {
      # compute checksums ...        
  }
  
  # and so on ...
}

In /etc/irods/core.re ...

Our Second Approach

For example: pep_api_data_obj_put_post(...)

  • Metadata extraction and application
  • Asynchronous Replication
  • Initiate Indexing
  • Apply access time metadata
  • Asynchronous checksum computation

Rather than one monolithic implementation, separate the implementations into individual rule bases or plugins, and allow the rules to fall through.

Expanding policy implementation across rule bases

Expanding policy across rule bases

Separate the implementation into several rule bases:

pep_api_data_obj_put_post(*INSTANCE_NAME, *COMM, *DATAOBJINP, *BUFFER, *PORTAL_OPR_OUT) {
  # metadata extraction and application code
  
  RULE_ENGINE_CONTINUE
}

/etc/irods/metadata.re

pep_api_data_obj_put_post(*INSTANCE_NAME, *COMM, *DATAOBJINP, *BUFFER, *PORTAL_OPR_OUT) {
  # checksum code
  
  RULE_ENGINE_CONTINUE
}

/etc/irods/checksum.re

pep_api_data_obj_put_post(*INSTANCE_NAME, *COMM, *DATAOBJINP, *BUFFER, *PORTAL_OPR_OUT) {
  # access time application code
  
  RULE_ENGINE_CONTINUE
}

/etc/irods/access_time.re

Expanding policy across rule bases

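Within the Rule Engine Plugin Framework, order matters: rule bases are consulted in the order listed in "re_rulebase_set", and rule engine plugins in the order listed in "rule_engines".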

        "rule_engines": [
            {
                "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
                "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
                "plugin_specific_configuration": {
                        ...
                        "re_rulebase_set": [
                            "metadata",
                            "checksum",
                            "access_time",
                            "core"
                        ],
                        ...
                },
                "shared_memory_instance" : "irods_rule_language_rule_engine"
            },
            {
                "instance_name": "irods_rule_engine_plugin-cpp_default_policy-instance",
                "plugin_name": "irods_rule_engine_plugin-cpp_default_policy",
                "plugin_specific_configuration": {
                }
            }
       ]
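The fall-through behavior can be modeled in a few lines of Python. This is a simplified sketch, not the framework's code: each rule base is a dictionary from PEP name to handler, `RULE_ENGINE_CONTINUE` is a stand-in sentinel rather than the real error code, and `fire_pep` and `make_rule` are hypothetical helpers. It shows why the order of "re_rulebase_set" determines the order in which the metadata, checksum, and access time rules run.

```python
from typing import Callable, Dict, List

# Stand-in for the framework's RULE_ENGINE_CONTINUE return code (model only).
RULE_ENGINE_CONTINUE = object()

Handler = Callable[[List[str]], object]


def fire_pep(pep: str, rulebases: List[Dict[str, Handler]], log: List[str]) -> object:
    """Run the PEP's rules in rule-base order until one does not continue."""
    result: object = None
    for rulebase in rulebases:
        handler = rulebase.get(pep)
        if handler is None:
            continue  # this rule base does not implement the PEP
        result = handler(log)
        if result is not RULE_ENGINE_CONTINUE:
            break  # any other return value ends the fall-through
    return result


def make_rule(name: str, fall_through: bool = True) -> Handler:
    """Build a rule that records its name and optionally falls through."""
    def rule(log: List[str]) -> object:
        log.append(name)
        return RULE_ENGINE_CONTINUE if fall_through else 0
    return rule


PEP = "pep_api_data_obj_put_post"

# Mirrors "re_rulebase_set": ["metadata", "checksum", "access_time", "core"];
# core is assumed here not to implement this PEP.
rulebase_set = [
    {PEP: make_rule("metadata")},
    {PEP: make_rule("checksum")},
    {PEP: make_rule("access_time")},
    {},
]

calls: List[str] = []
fire_pep(PEP, rulebase_set, calls)
print(calls)  # ['metadata', 'checksum', 'access_time']
```

If the metadata rule stopped returning the continue code, the checksum and access time rules would never run, which is the failure mode to watch for when composing rule bases.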

Initial work with Policy Composition

Consider Storage Tiering as a collection of policies:

  • Data Access Time
  • Identifying Violating Objects
  • Data Replication
  • Data Verification
  • Data Retention

The First Implementation

Policies composed by a monolithic framework plugin

Policy delegated by naming convention:

  • irods_policy_access_time
  • irods_policy_data_movement
  • irods_policy_data_replication
  • irods_policy_data_verification
  • irods_policy_data_retention

Each policy may be overridden by another rule engine or rule base to adapt to future use cases or technologies.

The New Approach

Continue to separate the concerns:

  • When : Which policy enforcement points
  • What  : The policy to be invoked
  • Why   : What are the conditions necessary for invocation
  • How   : Synchronous or Asynchronous

Write simple policy implementations

Each policy may now be reused in a generic fashion, favoring configuration over code.

  • Not tied to a Policy Enforcement Point
  • Do one thing well
  • How it is invoked is of no concern

The When

When - Event Handlers

When - The Event Handler

A Rule Engine Plugin for a specific class of events

The events are specific to the class of the handler

The handler then invokes policy based on its configuration

  • Data Object
  • Collection
  • Metadata
  • User
  • Resource

When - event_handler-data_object_modified

A Rule Engine Plugin for data creation and modification events

Policy invocation is configured as an array of JSON objects for any given combination of events

Unifies the POSIX and Object behaviors into a single place to configure policy

  • Create
  • Read
  • Replication
  • Unlink
  • Rename
  • Register

When - event_handler-data_object_modified

Example : Synchronous Invocation

        {
            "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
            "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
            "plugin_specific_configuration": {
                "policies_to_invoke" : [
                    {
                        "active_policy_clauses" : ["post"],
                        "events" : ["create", "write", "registration"],
                        "policy"    : "irods_policy_access_time",
                        "configuration" : {
                        }
                    },
                    {
                        "active_policy_clauses" : ["pre"],
                        "events" : ["replication"],
                        "policy"    : "irods_policy_example_policy",
                        "configuration" : {
                        }
                    }                  
                ]
            }
        }

Note that order still matters: if more than one policy is configured for a given event, the policies are invoked in the order in which they appear in "policies_to_invoke".
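The selection logic of the event handler can be sketched as a filter over "policies_to_invoke". This Python model is illustrative only: `dispatch` and `log_invocation` are hypothetical names, and the case-insensitive event comparison is our assumption, based on the serialized event object reporting "CREATE" while the configuration lists "create".

```python
from typing import Any, Callable, Dict, List

Invoke = Callable[[str, Dict[str, Any], Dict[str, Any]], None]


def dispatch(plugin_config: Dict[str, Any], clause: str, event: str,
             parameters: Dict[str, Any], invoke: Invoke) -> List[str]:
    """Invoke, in configuration order, each policy whose clause and event match."""
    invoked = []
    for entry in plugin_config.get("policies_to_invoke", []):
        if clause not in entry.get("active_policy_clauses", []):
            continue
        # Assumed: events are compared case-insensitively ("CREATE" vs "create").
        if event.lower() not in entry.get("events", []):
            continue
        invoke(entry["policy"], parameters, entry.get("configuration", {}))
        invoked.append(entry["policy"])
    return invoked


# Same policies as the synchronous invocation example.
config = {
    "policies_to_invoke": [
        {"active_policy_clauses": ["post"], "events": ["create", "write", "registration"],
         "policy": "irods_policy_access_time", "configuration": {}},
        {"active_policy_clauses": ["pre"], "events": ["replication"],
         "policy": "irods_policy_example_policy", "configuration": {}},
    ]
}


def log_invocation(policy: str, parameters: Dict[str, Any], configuration: Dict[str, Any]) -> None:
    print("invoking", policy, "for", parameters.get("obj_path"))


print(dispatch(config, "post", "CREATE", {"obj_path": "/tempZone/home/rods/file0.txt"}, log_invocation))
```

Because the loop walks the array from top to bottom, moving an entry changes when its policy runs relative to the others.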

The What

What - Simple policy implementations

Basic policies that are leveraged across many deployments and capabilities:

  • irods_policy_access_time
  • irods_policy_query_processor
  • irods_policy_data_movement
  • irods_policy_data_replication
  • irods_policy_data_verification
  • irods_policy_data_retention

The library will continue to grow, with a cookbook of usages.

What - Simple policy implementations

Standardized JSON interface: parameters and configuration

irods_policy_example_policy_implementation(*parameters, *configuration) {
  
}

iRODS Rule Language

def irods_policy_example_policy_implementation(rule_args, callback, rei):
    # Parameters    rule_args[1]
    # Configuration rule_args[2]
    pass

Python Rule Language
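To make the Python stub concrete, here is a hedged sketch of a complete implementation. It follows the indices given on the slide (parameters in rule_args[1], configuration in rule_args[2]) and assumes both arrive as JSON strings, possibly empty. The "attribute" default is borrowed from the access time configuration example purely for illustration, and `StubCallback` is a hypothetical local test double; `callback.writeLine("serverLog", ...)` calls the writeLine microservice.

```python
import json


def irods_policy_example_policy_implementation(rule_args, callback, rei):
    # Per the slide: parameters in rule_args[1], configuration in rule_args[2],
    # each assumed to be a JSON string (empty string means "nothing supplied").
    parameters = json.loads(rule_args[1]) if rule_args[1] else {}
    configuration = json.loads(rule_args[2]) if rule_args[2] else {}

    object_path = parameters.get("object_path", "<no object_path>")
    # Illustrative configuration key with an illustrative default.
    attribute = configuration.get("attribute", "irods::access_time")

    # Only logs; a real policy would act on the object here.
    callback.writeLine("serverLog", "example policy: %s (%s)" % (object_path, attribute))


class StubCallback:
    """Hypothetical stand-in for the rule engine callback, for local testing."""

    def __init__(self):
        self.lines = []

    def writeLine(self, target, message):
        self.lines.append((target, message))


stub = StubCallback()
irods_policy_example_policy_implementation(
    ["", '{"object_path": "/tempZone/home/rods/file0.txt"}', ""], stub, None)
print(stub.lines)
```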

Policies can also be implemented as fast, lightweight C++ rule engine plugins.

What - Simple policy implementations

Policy may be invoked using one of three different conventions, each of which defines its interface by contract:

  • Direct Invocation: a JSON object
  • Query Processor: a JSON array of parameters
  • Event Handler: a JSON object
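Because each convention fixes its own contract, a policy meant to work under all three has to look in a different place for the same information. A hedged Python sketch: the keys `object_path`, `obj_path`, and `query_results` come from the examples in this section, while the assumption that `query_results` holds a single row in the column order of the example query (USER_NAME, COLL_NAME, DATA_NAME, RESC_NAME), and the helper name itself, are ours.

```python
from typing import Any, Dict, Optional


def object_path_from_parameters(parameters: Dict[str, Any]) -> Optional[str]:
    """Find the logical path regardless of which convention invoked the policy."""
    # Direct invocation, e.g. {"object_path": "/tempZone/home/rods/file0.txt"}
    if "object_path" in parameters:
        return parameters["object_path"]
    # Event handler: the serialized dataObjInp_t carries "obj_path"
    if "obj_path" in parameters:
        return parameters["obj_path"]
    # Query processor (assumed shape): one row of USER_NAME, COLL_NAME, DATA_NAME, RESC_NAME
    row = parameters.get("query_results")
    if row and len(row) >= 3:
        return row[1].rstrip("/") + "/" + row[2]
    return None


print(object_path_from_parameters(
    {"query_results": ["rods", "/tempZone/home/rods", "file0.txt", "demoResc"]}))
```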

What - Simple policy implementations

Direct Invocation : Parameters passed as a JSON object

my_rule() {
        irods_policy_access_time( "{\"object_path\" : \"/tempZone/home/rods/file0.txt\"}", "");
}
{
    "policy" : "irods_policy_execute_rule",
    "payload" : {
        "policy_to_invoke" : "irods_policy_storage_tiering",
        "parameters" : {
            "object_path" : "/tempZone/home/rods/file0.txt"
         },
         "configuration" : {
         }
    }
}

Parameters may also be configured statically.
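Hand-escaping JSON inside a rule language string, as in `my_rule` above, is error-prone. When requests are generated from a script, a JSON serializer can produce both the parameter string and the `irods_policy_execute_rule` wrapper. A minimal Python sketch; `execute_rule_request` is a hypothetical helper that only reproduces the shape of the JSON example.

```python
import json
from typing import Any, Dict, Optional


def execute_rule_request(policy_to_invoke: str, parameters: Dict[str, Any],
                         configuration: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build an irods_policy_execute_rule object with the shape used in this deck."""
    return {
        "policy": "irods_policy_execute_rule",
        "payload": {
            "policy_to_invoke": policy_to_invoke,
            "parameters": parameters,
            "configuration": configuration or {},
        },
    }


parameters = {"object_path": "/tempZone/home/rods/file0.txt"}

# The serialized parameter object, without any manual escaping.
print(json.dumps(parameters))
print(json.dumps(execute_rule_request("irods_policy_storage_tiering", parameters), indent=4))
```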

What - Simple policy implementations

Query Processor Invocation

{
        "policy" : "irods_policy_enqueue_rule",
        "delay_conditions" : "<PLUSET>1s</PLUSET>",
        "payload" : {
            "policy" : "irods_policy_execute_rule",
            "payload" : {
                "policy_to_invoke" : "irods_policy_query_processor",
                "parameters" : {
                    "query_string" : "SELECT USER_NAME, COLL_NAME, DATA_NAME, RESC_NAME WHERE COLL_NAME like '/tempZone/home/rods%'",
                    "query_limit" : 10,
                    "query_type" : "general",
                    "number_of_threads" : 4,
                    "policy_to_invoke" : "irods_policy_engine_example"
                 }
             }
        }
}

For example, the invoked policy would receive a row:

["rods", "/tempZone/home/rods", "file0.txt", "demoResc"]

The results are serialized to a JSON array and passed to the policy via the parameter object as "query_results".
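The fan-out performed by the query processor can be modeled with a thread pool. This Python sketch is not the plugin's implementation: it takes rows that have already been fetched (the real processor runs the GenQuery itself), it assumes a `query_limit` of 0 means "no limit" (as the storage tiering example seems to use it), and `process_query_results` and `data_name_policy` are hypothetical names.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List

Row = List[str]


def process_query_results(rows: List[Row], query_limit: int, number_of_threads: int,
                          policy: Callable[[str], Any]) -> List[Any]:
    """Model of the query processor's fan-out: one policy invocation per row."""
    if query_limit > 0:  # assumption: 0 means "no limit"
        rows = rows[:query_limit]

    def invoke(row: Row) -> Any:
        # Each row becomes a JSON array in the parameter object as "query_results".
        return policy(json.dumps({"query_results": row}))

    with ThreadPoolExecutor(max_workers=max(1, number_of_threads)) as pool:
        return list(pool.map(invoke, rows))  # map preserves row order


def data_name_policy(parameters: str) -> str:
    """Hypothetical policy that returns the DATA_NAME column of its row."""
    return json.loads(parameters)["query_results"][2]


rows = [["rods", "/tempZone/home/rods", "file%d.txt" % i, "demoResc"] for i in range(3)]
print(process_query_results(rows, query_limit=2, number_of_threads=4, policy=data_name_policy))
```

Since each row is handled independently, the invoked policy must be safe to run concurrently on different objects.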

What - Simple policy implementations

Event Handler Invocation

The event handler serializes dataObjInp_t and rsComm_t to a JSON object:

{
"comm":{
    "auth_scheme":"native","client_addr":"152.54.8.141","proxy_auth_info_auth_flag":"5","proxy_auth_info_auth_scheme":"",
    "proxy_auth_info_auth_str":"","proxy_auth_info_flag":"0","proxy_auth_info_host":"","proxy_auth_info_ppid":"0",
    "proxy_rods_zone":"tempZone","proxy_sys_uid":"0","proxy_user_name":"rods","proxy_user_other_info_user_comments":"",
    "proxy_user_other_info_user_create":"","proxy_user_other_info_user_info":"","proxy_user_other_info_user_modify":"",
    "proxy_user_type":"","user_auth_info_auth_flag":"5","user_auth_info_auth_scheme":"","user_auth_info_auth_str":"",
    "user_auth_info_flag":"0","user_auth_info_host":"","user_auth_info_ppid":"0","user_rods_zone":"tempZone",
    "user_sys_uid":"0","user_user_name":"rods","user_user_other_info_user_comments":"","user_user_other_info_user_create":"",
    "user_user_other_info_user_info":"","user_user_other_info_user_modify":"","user_user_type":""
    },
"cond_input":{
    "dataIncluded":"","dataType":"generic","destRescName":"ufs0","noOpenFlag":"","openType":"1",
    "recursiveOpr":"1", "resc_hier":"ufs0","selObjType":"dataObj","translatedPath":""
    },
"create_mode":"33204","data_size":"1","event":"CREATE","num_threads":"0",
"obj_path":"/tempZone/home/rods/test_put_gt_max_sql_rows/junk0083",
"offset":"0","open_flags":"2","opr_type":"1",
"policy_enforcement_point":"pep_api_data_obj_put_post"
}

This object is passed to the policy as the parameter object.
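A policy usually needs only a handful of these fields. The Python sketch below parses an abridged copy of the event object shown above and extracts the acting user, event, path, resource hierarchy, and size; `summarize_event` is a hypothetical helper. Note that every value arrives as a string, so numeric fields such as "data_size" must be converted.

```python
import json
from typing import Any, Dict

# Abridged copy of the serialized event above (only the fields used below).
event_json = """{
  "comm": {"user_user_name": "rods", "user_rods_zone": "tempZone", "proxy_user_name": "rods"},
  "cond_input": {"destRescName": "ufs0", "resc_hier": "ufs0"},
  "data_size": "1", "event": "CREATE",
  "obj_path": "/tempZone/home/rods/test_put_gt_max_sql_rows/junk0083",
  "policy_enforcement_point": "pep_api_data_obj_put_post"
}"""


def summarize_event(parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Pull out the fields a typical policy needs from the event parameter object."""
    comm = parameters.get("comm", {})
    cond_input = parameters.get("cond_input", {})
    return {
        # Qualified iRODS user names are written as user#zone
        "user": "%s#%s" % (comm.get("user_user_name", ""), comm.get("user_rods_zone", "")),
        "event": parameters.get("event", "").lower(),
        "path": parameters.get("obj_path", ""),
        "resource_hierarchy": cond_input.get("resc_hier", ""),
        # Every value is serialized as a string, so convert numbers explicitly
        "data_size": int(parameters.get("data_size", "0")),
    }


print(summarize_event(json.loads(event_json)))
```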

What - Simple policy implementations

Configuration

{
    "policy" : "irods_policy_access_time",
    "configuration" : {
        "attribute" : "irods::access_time"
    }
}

Any additional, statically set context passed into the policy

This may be the "plugin_specific_configuration" of a rule engine plugin or the "configuration" from within the event framework.

The Why

Why - Policy Conditionals

Each invoked policy may set a conditional on any noun within the system; the conditional gates the invocation:

  • Data Object
  • Collection
  • Metadata
  • User
  • Resource

Leverages boost::regex to match any combination of logical_path, metadata, resource name, etc.
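How a conditional gates invocation can be sketched in Python with the standard `re` module in place of boost::regex. Whether the plugin requires a whole-string match (like boost::regex_match) or a partial one (like boost::regex_search) is not stated here; this sketch assumes whole-string matching via `re.fullmatch`, ignores keys such as "entity_type", and models metadata as a list of (attribute, value) pairs. `conditional_matches` is a hypothetical name.

```python
import json
import re
from typing import Any, Dict, List, Tuple


def conditional_matches(conditional: Dict[str, Any], context: Dict[str, Any]) -> bool:
    """True only if every entry in the conditional matches the context."""
    for key, expected in conditional.items():
        if key == "metadata":
            # Require at least one (attribute, value) pair matching the pattern(s);
            # other keys such as "entity_type" are ignored in this sketch.
            avus: List[Tuple[str, str]] = context.get("metadata", [])
            value_pattern = expected.get("value", ".*")
            if not any(re.fullmatch(expected["attribute"], attribute)
                       and re.fullmatch(value_pattern, value)
                       for attribute, value in avus):
                return False
        else:
            actual = context.get(key)
            # fullmatch: the whole string must match, as with boost::regex_match
            if actual is None or re.fullmatch(expected, actual) is None:
                return False
    return True


# "\/" in the JSON configuration is a JSON escape that decodes to "/".
path_conditional = json.loads(r'{"logical_path": "\/tempZone.*"}')
print(conditional_matches(path_conditional, {"logical_path": "/tempZone/home/rods/file0.txt"}))
```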

Why - Policy Conditionals

{
    "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
    "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
    "plugin_specific_configuration": {
        "policies_to_invoke" : [
        {
            "conditional" : {
                "logical_path" : "\/tempZone.*"
            },
            "active_policy_clauses" : ["post"],
            "events" : ["put"],
            "policy"    : "irods_policy_data_replication",
            "configuration" : {
                "source_to_destination_map" : {
                    "demoResc" : ["AnotherResc"]
                }
            }
        },            
        ...
        ]
        ...
    }
}

Matching a logical path for replication policy invocation

Why - Policy Conditionals

{
    "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
    "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
    "plugin_specific_configuration": {
        "policies_to_invoke" : [
            {
                "active_policy_clauses" : ["post"],
                "events" : ["put", "write"],
                "policy"    : "irods_policy_event_delegate_collection_metadata",
                "configuration" : {
                    "policies_to_invoke" : [
                        {
                            "conditional" : {
                                "metadata" : {
                                    "attribute" : "irods::indexing::index",
                                    "entity_type" : "data_object"
                                }
                            },
                            "policy"    : "irods_policy_indexing_full_text_index_elasticsearch",
                            "configuration" : {
                                "hosts" : ["http://localhost:9200/"],
                                "bulk_count" : 100,
                                "read_size" : 1024
                            }
                        }
                    ]
                }
            }
        ]
    }
}

Matching metadata for indexing policy invocation

The How

How - Asynchronous Execution

{
    "policy" : "irods_policy_enqueue_rule",
    "delay_conditions" : "<EF>REPEAT FOR EVER</EF>",
    "payload" : {
        "policy" : "irods_policy_example",
        "configuration" : {
        }
    }
}
INPUT null
OUTPUT ruleExecOut

The cpp_default_policy rule engine plugin now supports two new policies:

  • irods_policy_enqueue_rule
  • irods_policy_execute_rule

The enqueue rule policy will push a job onto the delayed execution queue.  The "payload" object holds the rule which is to be executed.
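Because asynchronous execution is one more layer of JSON, the wrapping can be generated rather than written by hand. A minimal Python sketch, with the hypothetical helper `enqueue_request`, that produces the enqueue/execute nesting used in this section and prints the rule text followed by the INPUT/OUTPUT lines that accompany these examples.

```python
import json
from typing import Any, Dict, Optional


def enqueue_request(policy_to_invoke: str, delay_conditions: str,
                    parameters: Optional[Dict[str, Any]] = None,
                    configuration: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Wrap a policy in irods_policy_execute_rule, then in irods_policy_enqueue_rule."""
    return {
        "policy": "irods_policy_enqueue_rule",
        "delay_conditions": delay_conditions,
        "payload": {
            "policy": "irods_policy_execute_rule",
            "payload": {
                "policy_to_invoke": policy_to_invoke,
                "parameters": parameters or {},
                "configuration": configuration or {},
            },
        },
    }


rule = enqueue_request("irods_policy_example", "<EF>REPEAT FOR EVER</EF>")
print(json.dumps(rule, indent=4))
print("INPUT null")
print("OUTPUT ruleExecOut")
```

The outer object decides when the work runs; the inner object decides what runs, so the same policy can be moved between synchronous and asynchronous execution without touching its implementation.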

How - Asynchronous Execution

The execute rule policy invokes a policy engine either from the delayed execution queue or as a direct invocation.

{
    "policy" : "irods_policy_execute_rule",
    "payload" : {
        "policy_to_invoke" : "irods_policy_example",
        "parameters" : {
        },
        "configuration" : {
        }
    }
}
INPUT null
OUTPUT ruleExecOut

How - Asynchronous Execution

Sample Delayed Rule for Asynchronous Execution

{
        "policy" : "irods_policy_enqueue_rule",
        "delay_conditions" : "<EF>REPEAT FOR EVER</EF>",
        "payload" : {
            "policy" : "irods_policy_execute_rule",
            "payload" : {
                "policy_to_invoke" : "irods_policy_example",
                "parameters" : {
              
                 },
                 "configuration" : {
                   
                 }
             }
        }
}
INPUT null
OUTPUT ruleExecOut

The New Approach

  • When : Which policy enforcement points
  • What  : The policy to be invoked
  • Why   : What are the conditions necessary for invocation
  • How   : Synchronous or Asynchronous

Examples

Synchronous Access Time

 {
    "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
    "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
    "plugin_specific_configuration": {
        "policies_to_invoke" : [
            {
                "active_policy_clauses" :  ["post"],
                "events" : ["put", "get", "create", "read", "write", "rename",
                            "register", "unregister", "replication", "checksum",
                            "copy", "seek", "truncate"],
                "policy"    : "irods_policy_access_time",
                "configuration" : {
                }
            }
        ]
    }
}

Synchronous Replication

{
    "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
    "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
    "plugin_specific_configuration": {
        "policies_to_invoke" : [
            {   "active_policy_clauses" : ["post"],
                "events" : ["create", "write", "registration"],
                "policy"    : "irods_policy_data_replication",
                "configuration" : {
                    "source_to_destination_map" : {
                        "source_resource_0" : ["destination_resource_0a", "destination_resource_0b"],
                        "source_resource_1" : ["destination_resource_1a"]
                    }
                }
            },
            {   "active_policy_clauses" : ["post"],
                "events" : ["create", "write", "registration"],
                "policy"    : "irods_policy_data_replication",
                "configuration" : {
                    "destination_resource" : "destination_resource_3"
                }
            }
        ]
    }
}
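The two replication entries use two configuration forms: a "source_to_destination_map" and a single "destination_resource". A hedged Python sketch of how a replication policy might resolve its targets from either form; `destinations_for` is a hypothetical helper, and skipping a replica that already sits on the single destination is our assumption.

```python
from typing import Any, Dict, List


def destinations_for(source_resource: str, configuration: Dict[str, Any]) -> List[str]:
    """Resolve replication targets for a replica that landed on source_resource."""
    mapping = configuration.get("source_to_destination_map")
    if mapping is not None:
        # Map form: only sources listed in the map are replicated.
        return list(mapping.get(source_resource, []))
    destination = configuration.get("destination_resource")
    # Single-destination form (assumed): everything goes to one resource,
    # except replicas that are already there.
    if destination and destination != source_resource:
        return [destination]
    return []


map_config = {"source_to_destination_map": {
    "source_resource_0": ["destination_resource_0a", "destination_resource_0b"],
    "source_resource_1": ["destination_resource_1a"]}}
single_config = {"destination_resource": "destination_resource_3"}

print(destinations_for("source_resource_0", map_config))
print(destinations_for("source_resource_0", single_config))
```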

Asynchronous Replication

{
    "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
    "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
    "plugin_specific_configuration": {
        "policies_to_invoke" : [
            {
                "active_policy_clauses" : ["post"],
                "events" : ["create", "write", "registration"],
                "policy" : "irods_policy_enqueue_rule",
                "delay_conditions" : "<ET>PLUSET 1</ET>",
                "payload" : {
                    "policy" : "irods_policy_execute_rule",
                    "payload" : {
                        "policy_to_invoke" : "irods_policy_data_replication",
                        "configuration" : {
                            "source_to_destination_map" : {
                                "source_resource_0" : ["destination_resource_0a", "destination_resource_0b"],
                                "source_resource_1" : ["destination_resource_1a"]
                            }
                        }
                    }
                }
            }
        ]
    }
}

Synchronous Retention

{
    "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
    "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
    "plugin_specific_configuration": {
        "policies_to_invoke" : [
            {
                "active_policy_clauses" : ["post"],
                "events" : ["replication"],
                "policy" : "irods_policy_data_retention",
                "configuration" : {
                    "mode" : "trim_single_replica",
                    "source_resource_list" : ["source_resource_1", "source_resource_2"]
                }
            }                  
        ]
    }
}
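The "trim_single_replica" mode can be illustrated with a small selection function. This Python sketch encodes our reading of the mode name, not the policy's documented behavior: trim one replica that sits on a resource in "source_resource_list", and never remove the last remaining replica. `replica_to_trim` and the resource "archive_resource" are hypothetical.

```python
from typing import List, Optional


def replica_to_trim(replica_resources: List[str], source_resource_list: List[str]) -> Optional[str]:
    """Choose one replica to trim under an assumed reading of trim_single_replica."""
    # Assumption: retention never removes the last remaining replica.
    if len(replica_resources) < 2:
        return None
    for resource in replica_resources:
        if resource in source_resource_list:
            return resource  # trim a single replica, from a listed source
    return None


print(replica_to_trim(["source_resource_1", "archive_resource"],
                      ["source_resource_1", "source_resource_2"]))
```

Triggering this on the post clause of "replication" means the trim only runs after a new replica exists elsewhere.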

Query Driven Asynchronous Retention

        {
            "policy" : "irods_policy_enqueue_rule",
            "delay_conditions" : "<EF>REPEAT FOR EVER</EF>",
            "payload" : {
                "policy" : "irods_policy_execute_rule",
                "payload" : {
                    "policy_to_invoke" : "irods_policy_query_processor",
                    "parameters" : {
                        "query_string" : "SELECT USER_NAME, COLL_NAME, DATA_NAME, RESC_NAME WHERE COLL_NAME like '/tempZone/home/rods%' AND RESC_NAME IN ('source_resource_1', 'source_resource_2')",
                        "query_limit" : 10,
                        "query_type" : "general",
                        "number_of_threads" : 4,
                        "policy_to_invoke" : "irods_policy_data_retention",
                        "configuration" : {
                            "mode" : "trim_single_replica",
                            "source_resource_list" : ["source_resource_1", "source_resource_2"]
                        }
                    }
                }
            }
        }

Synchronous Verification

        {
            "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
            "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
            "plugin_specific_configuration": {
                "policies_to_invoke" : [
                    {   "active_policy_clauses" : ["post"],
                        "events" : ["create", "write", "registration"],
                        "policy" : "irods_policy_data_verification",
                        "configuration" : {
                            "attribute" : "irods::verification::type"
                        }
                    }                  
                ]
            }
        }

The type of verification to perform is stored as metadata on the resource:

  • catalog
  • filesystem
  • checksum
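A hedged Python sketch of what the three verification types could check for one replica. The semantics are our interpretation of the type names, not the policy's code: "catalog" only requires a catalog entry, "filesystem" also compares the size on disk, and "checksum" also compares a SHA-256 checksum in the iRODS "sha2:" plus base64-digest format. Reading the `irods::verification::type` value and the catalog size and checksum is assumed to have happened already; `verify_replica` and `irods_sha2_checksum` are hypothetical helpers.

```python
import base64
import hashlib
import os
import tempfile


def irods_sha2_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 checksum in the iRODS "sha2:<base64 digest>" format."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return "sha2:" + base64.b64encode(digest.digest()).decode("ascii")


def verify_replica(verification_type: str, physical_path: str, in_catalog: bool,
                   catalog_size: int, catalog_checksum: str) -> bool:
    """Hypothetical reading of the catalog / filesystem / checksum types."""
    if verification_type == "catalog":
        return in_catalog
    if verification_type == "filesystem":
        return (in_catalog and os.path.isfile(physical_path)
                and os.path.getsize(physical_path) == catalog_size)
    if verification_type == "checksum":
        return (verify_replica("filesystem", physical_path, in_catalog,
                               catalog_size, catalog_checksum)
                and irods_sha2_checksum(physical_path) == catalog_checksum)
    raise ValueError("unknown verification type: " + verification_type)


# Demonstration on a throwaway file standing in for a replica in a vault.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
try:
    recorded = irods_sha2_checksum(tmp.name)  # pretend the catalog recorded this
    print(verify_replica("checksum", tmp.name, True, 5, recorded))  # True
finally:
    os.remove(tmp.name)
```

The types form a ladder of cost: each one does the cheaper checks first, so the expensive full read only happens when size and catalog state already agree.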

Asynchronous Verification

        {
            "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
            "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
            "plugin_specific_configuration": {
                "policies_to_invoke" : [
                    {   "active_policy_clauses" : ["post"],
                        "events" : ["create", "write", "registration"],
                        "policy" : "irods_policy_enqueue_rule",
                        "delay_conditions" : "<ET>PLUSET 1</ET>",
                        "payload" : {
                            "policy" : "irods_policy_execute_rule",
                            "payload" : {
                                "policy_to_invoke" : "irods_policy_data_verification",
                                "configuration" : {
                                    "attribute" : "irods::verification::type"
                                }
                            }
                        }
                    }
                ]
            }
        }

The type of verification to perform is stored as metadata on the resource:

  • catalog
  • filesystem
  • checksum

Policy Composed Capabilities

Storage Tiering Overview

Policy Composed Storage Tiering

  • Asynchronous Discovery
  • Asynchronous Replication
  • Synchronous Retention
  • Resource associated metadata
  • Identified by 'tiering groups'
{
    "policy" : "irods_policy_execute_rule",
    "payload" : {
         "policy_to_invoke" : "irods_policy_query_processor",
         "configuration" : {
             "query_string" : "SELECT META_RESC_ATTR_VALUE WHERE META_RESC_ATTR_NAME = 'irods::storage_tiering::group'",
             "query_limit" : 0,
             "query_type" : "general",
             "number_of_threads" : 8,
             "policy_to_invoke" : "irods_policy_event_generator_resource_metadata",
             "configuration" : {
                 "conditional" : {
                     "metadata" : {
                         "attribute" : "irods::storage_tiering::group",
                         "value" : "{0}"
                     }
                 },
                 "policies_to_invoke" : [
                     {
                         "policy" : "irods_policy_query_processor",
                         "configuration" : {
                             "query_string" : "SELECT META_RESC_ATTR_VALUE WHERE META_RESC_ATTR_NAME = 'irods::storage_tiering::query' AND RESC_NAME = 'IRODS_TOKEN_SOURCE_RESOURCE_END_TOKEN'",
                             "default_results_when_no_rows_found" : ["SELECT USER_NAME, COLL_NAME, DATA_NAME, RESC_NAME WHERE META_DATA_ATTR_NAME = 'irods::access_time' AND META_DATA_ATTR_VALUE < 'IRODS_TOKEN_LIFETIME_END_TOKEN' AND META_DATA_ATTR_UNITS <> 'irods::storage_tiering::inflight' AND DATA_RESC_ID IN (IRODS_TOKEN_SOURCE_RESOURCE_LEAF_BUNDLE_END_TOKEN)"],
                             "query_limit" : 0,
                             "query_type" : "general",
                             "number_of_threads" : 8,
                             "policy_to_invoke" : "irods_policy_query_processor",
                             "configuration" : {
                                 "lifetime" : "IRODS_TOKEN_QUERY_SUBSTITUTION_END_TOKEN(SELECT META_RESC_ATTR_VALUE WHERE META_RESC_ATTR_NAME = 'irods::storage_tiering::time' AND RESC_NAME = 'IRODS_TOKEN_SOURCE_RESOURCE_END_TOKEN')",
                                 "query_string" : "{0}",
                                 "query_limit" : 0,
                                 "query_type" : "general",
                                 "number_of_threads" : 8,
                                 "policy_to_invoke" : "irods_policy_data_replication",
                                 "configuration" : {
                                     "comment" : "source_resource, and destination_resource supplied by the resource metadata event generator"
                                 }
                             }
                         }
                     }
                ]
             }
         }
    }
}
INPUT null
OUTPUT ruleExecOut

Policy Composed Capability

Asynchronous Discovery and Replication
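The nested configuration relies on placeholder substitution between stages: IRODS_TOKEN_..._END_TOKEN markers and a positional "{0}" filled from the previous query's row. The substitution rules are not spelled out on the slide, so this Python sketch assumes plain textual replacement. Its reading of LIFETIME (current epoch time minus the resource's irods::storage_tiering::time value, with access times stored as epoch seconds) is also our assumption, and `substitute_tokens` and `lifetime_cutoff` are hypothetical helpers.

```python
import time
from typing import Dict, List, Optional


def substitute_tokens(template: str, tokens: Dict[str, str],
                      row: Optional[List[str]] = None) -> str:
    """Replace IRODS_TOKEN_<NAME>_END_TOKEN markers, then positional {0}, {1}, ..."""
    out = template
    for name, value in tokens.items():
        # The _END_TOKEN suffix keeps SOURCE_RESOURCE from matching inside
        # SOURCE_RESOURCE_LEAF_BUNDLE.
        out = out.replace("IRODS_TOKEN_%s_END_TOKEN" % name, value)
    for index, value in enumerate(row or []):
        out = out.replace("{%d}" % index, value)
    return out


def lifetime_cutoff(tier_time_seconds: str, now: Optional[int] = None) -> str:
    """Assumed meaning of LIFETIME: objects accessed before this epoch time are violating."""
    current = int(time.time()) if now is None else now
    return str(current - int(tier_time_seconds))


query = ("SELECT META_RESC_ATTR_VALUE WHERE META_RESC_ATTR_NAME = 'irods::storage_tiering::query' "
         "AND RESC_NAME = 'IRODS_TOKEN_SOURCE_RESOURCE_END_TOKEN'")
print(substitute_tokens(query, {"SOURCE_RESOURCE": "fast_resc"}))
print(lifetime_cutoff("3600", now=1600000000))  # 1599996400
```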

Policy Composed Capability

Synchronous Configuration for Storage Tiering

{
    "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
    "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
    "plugin_specific_configuration": {
        "policies_to_invoke" : [
            {
                "active_policy_clauses" : ["post"],
                "events" : ["put", "get", "create", "read", "write", "rename", "register", "unregister", "replication", "checksum", "copy", "seek", "truncate"],
                 "policy" : "irods_policy_access_time",
                 "configuration" : {
                    "log_errors" : "true"
                 }
            },
            {
                "active_policy_clauses" : ["post"],
                "events" : ["read", "write", "get"],
                "policy"    : "irods_policy_data_restage",
                "configuration" : {
                }
            },
            {
                "active_policy_clauses" : ["post"],
                "events" : ["replication"],
                    "policy"    : "irods_policy_tier_group_metadata",
                    "configuration" : {
                    }

            },
            {
                "active_policy_clauses" : ["post"],
                "events" : ["replication"],
                    "policy"    : "irods_policy_data_verification",
                    "configuration" : {
                    }

            },
            {
                "active_policy_clauses" : ["post"],
                "events" : ["replication"],
                    "policy"    : "irods_policy_data_retention",
                    "configuration" : {
                        "mode" : "trim_single_replica",
                        "log_errors" : "true"
                    }

            }
        ]
    }
}

Policy Composed Capability

Possible Metadata Driven Restage for Storage Tiering

{
    "instance_name": "irods_rule_engine_plugin-event_handler-metadata_modified-instance",
    "plugin_name": "irods_rule_engine_plugin-event_handler-metadata_modified",
    "plugin_specific_configuration": {
        "policies_to_invoke" : [
            {
                "active_policy_clauses" : ["post"],
                "events" : ["set", "add"],
                "attribute" : "irods::storage_tiering::restage",
                "value"     : "*.*",
                "policy"    : "irods_policy_data_restage",
                "configuration" : {
                }
            }
        ]
    }
}

Data Transfer Nodes Pattern

Policy Composed Data Transfer Node

  • Asynchronous Discovery
  • Asynchronous Retention
  • Synchronous Replication
  • Resource associated metadata
  • Identified by 'replication groups'
{
    "policy" : "irods_policy_enqueue_rule",
    "delay_conditions" : "<EF>REPEAT FOR EVER</EF>",
    "payload" : {
        "policy" : "irods_policy_execute_rule",
        "payload" : {
            "policy_to_invoke" : "irods_policy_query_processor",
            "parameters" : {
                "query_string" : "SELECT USER_NAME, COLL_NAME, DATA_NAME, RESC_NAME WHERE COLL_NAME like '/tempZone/home/rods%' AND RESC_NAME IN ('edge_resource_1', 'edge_resource_2')",
                "query_limit" : 10,
                "query_type" : "general",
                "number_of_threads" : 4,
                "policy_to_invoke" : "irods_policy_data_retention",
                "configuration" : {
                    "mode" : "trim_single_replica",
                    "source_resource_list" : ["edge_resource_1", "edge_resource_2"]
                }
            }
        }
    }
}

Policy Composed Capability

Asynchronous Retention on Edge Resources

Synchronous Replication

{
    "instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
    "plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
    "plugin_specific_configuration": {
        "policies_to_invoke" : [
            {
                "conditional" : {
                    "logical_path" : "\/tempZone.*"
                },
                "active_policy_clauses" : ["post"],
                "events" : ["create", "write", "registration"],
                "policy" : "irods_policy_data_replication",
                "configuration" : {
                    "source_to_destination_map" : {
                        "edge_resource_0" : ["long_term_resource_0"],
                        "edge_resource_1" : ["long_term_resource_1"]
                    }
                }
            },
            {
                "conditional" : {
                    "logical_path" : "\/tempZone.*"
                },
                "active_policy_clauses" : ["pre"],
                "events" : ["get"],
                "policy" : "irods_policy_data_replication",
                "configuration" : {
                    "source_to_destination_map" : {
                        "long_term_resource_0" : ["edge_resource_0"],
                        "long_term_resource_1" : ["edge_resource_1"]                      
                    }
                }
            }
        ]
    }
}

Policy Composed Capability

Indexing Capability

Policy Composed Indexing

  • irods_policy_indexing_full_text_index_elasticsearch
  • irods_policy_indexing_full_text_purge_elasticsearch
  • irods_policy_indexing_metadata_index_elasticsearch
  • irods_policy_indexing_metadata_purge_elasticsearch

Policies implemented as separate policy engine plugins

Indexing Policies

"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
    "policies_to_invoke" : [
        {
            "active_policy_clauses" : ["post"],
            "events" : ["put", "write"],
            "policy"    : "irods_policy_event_delegate_collection_metadata",
            "configuration" : {
                "policies_to_invoke" : [
                    {
                        "conditional" : {
                            "metadata" : {
                                "attribute" : "irods::indexing::index",
                                "entity_type" : "data_object"
                            }
                        },
                        "policy"    : "irods_policy_indexing_full_text_index_elasticsearch",
                        "configuration" : {
                            "hosts" : ["http://localhost:9200/"],
                            "bulk_count" : 100,
                            "read_size" : 1024
                        }
                    }
                ]
            }
        }
        ...

Synchronously configured full text indexing

Indexing Policies

        {
            "active_policy_clauses" : ["pre"],
            "events" : ["unlink", "unregister"],
            "policy"    : "irods_policy_event_delegate_collection_metadata",
            "configuration" : {
                "policies_to_invoke" : [
                    {
                        "conditional" : {
                            "metadata" : {
                                "attribute" : "irods::indexing::index",
                                "entity_type" : "data_object"
                            }
                        },
                        "policy"    : "irods_policy_indexing_full_text_purge_elasticsearch",
                        "configuration" : {
                            "hosts" : ["http://localhost:9200/"],
                            "bulk_count" : 100,
                            "read_size" : 1024
                        }
                    }
                ]
            }
        }
    ]
}

Synchronously configured full text purge

 

Capabilities become easily configured recipes.

A policy GUI is now possible through simple manipulation of server-side JSON.

 

Summary - Configuration, Not Code

Data management should be data-centric and metadata-driven.

Future-proof automated data management requires open formats and open source.

SC20 - iRODS Policy Composition: Configuration, Not Code

By iRODS Consortium

Exhibitor Forum