Policy Training
Policy Composition
Jason M. Coposky
@jason_coposky
Executive Director, iRODS Consortium
August 3-6, 2020
KU Leuven Training
Webinar Presentation
What is Data Management
A Definition of Data Management
"The development, execution and supervision of plans, policies, programs, and practices that control, protect, deliver, and enhance the value of data and information assets."
Organizations need a future-proof solution for managing data and its surrounding infrastructure
What is a Policy
A Definition of Policy
A set of ideas or a plan of what to do in particular situations that has been agreed to officially by a group of people...
So how does iRODS do this?
iRODS Policies
The reflection of real world data management decisions in computer actionable code.
(a plan of what to do in particular situations)
Possible Policies
The iRODS Data Management Model
Core Competencies
Policy
Capabilities
Patterns
Some Questions
Policy Composition
Consider Policy as building blocks towards Capabilities
Follow proven software engineering principles:
Favor composition over monolithic implementations
Rules and Dynamic Policy Enforcement Points can be overloaded and fall through
Implement or configure several rule bases or rule engine plugins to achieve complex use cases
The Original Approach
Assuming there was even a provided policy enforcement point for the desired event...
acPostProcForPut() {
if($rescName == "demoResc") {
# extract and apply metadata
}
else if($rescName == "cacheResc") {
# async replication to archive
}
else if($objPath like "/tempZone/home/alice/*" &&
$rescName == "indexResc") {
# launch an indexing job
}
else if(xyz) {
# compute checksums ...
}
# and so on ...
}
In /etc/irods/core.re ...
Our second approach
For example: pep_data_obj_put_post(...)
Rather than one monolithic implementation, separate the implementations into individual rule bases, or plugins, and allow the rule(s) to fall through
Expanding policy implementation across rule bases
Expanding policy across rule bases
Separate the implementation into several rule bases:
pep_api_data_obj_put_post(*INSTANCE_NAME, *COMM, *DATAOBJINP, *BUFFER, *PORTAL_OPR_OUT) {
# metadata extraction and application code
RULE_ENGINE_CONTINUE
}
/etc/irods/metadata.re
pep_api_data_obj_put_post(*INSTANCE_NAME, *COMM, *DATAOBJINP, *BUFFER, *PORTAL_OPR_OUT) {
# checksum code
RULE_ENGINE_CONTINUE
}
/etc/irods/checksum.re
pep_api_data_obj_put_post(*INSTANCE_NAME, *COMM, *DATAOBJINP, *BUFFER, *PORTAL_OPR_OUT) {
# access time application code
RULE_ENGINE_CONTINUE
}
/etc/irods/access_time.re
Expanding policy across rule bases
Within the Rule Engine Plugin Framework, order matters
"rule_engines": [
{
"instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
"plugin_name": "irods_rule_engine_plugin-irods_rule_language",
"plugin_specific_configuration": {
...
"re_rulebase_set": [
"metadata",
"checksum",
"access_time",
"core"
],
...
},
"shared_memory_instance" : "irods_rule_language_rule_engine"
},
{
"instance_name": "irods_rule_engine_plugin-cpp_default_policy-instance",
"plugin_name": "irods_rule_engine_plugin-cpp_default_policy",
"plugin_specific_configuration": {
}
}
]
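The fall-through behaviour can be pictured with a small model. This is only a sketch in plain Python, not iRODS code: the handler functions, the `fire` helper, and the `HANDLED` value are hypothetical names, and `RULE_ENGINE_CONTINUE` is modelled here as a simple string constant. It assumes each rule base is consulted in the order listed in `re_rulebase_set` and that a rule returning the continue signal lets the next one run.

```python
# Illustrative model only: none of these names are part of the iRODS API.
RULE_ENGINE_CONTINUE = "continue"  # stand-in for the fall-through signal
HANDLED = "handled"                # hypothetical "stop here" result


def metadata_rule(ctx):
    ctx["applied"].append("metadata")
    return RULE_ENGINE_CONTINUE


def checksum_rule(ctx):
    ctx["applied"].append("checksum")
    return RULE_ENGINE_CONTINUE


def access_time_rule(ctx):
    ctx["applied"].append("access_time")
    return RULE_ENGINE_CONTINUE


def core_rule(ctx):
    ctx["applied"].append("core")
    return HANDLED  # does not fall through, so the chain ends here


def fire(rule_bases, ctx):
    """Call each rule base in configured order until one stops the chain."""
    for rule in rule_bases:
        if rule(ctx) != RULE_ENGINE_CONTINUE:
            break
    return ctx["applied"]


# Same order as the rule base list: metadata, checksum, access_time, core.
order = [metadata_rule, checksum_rule, access_time_rule, core_rule]
applied = fire(order, {"applied": []})
```

Reordering the list changes which implementations run: a rule that does not continue hides everything configured after it.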
Initial work with Policy Composition
Consider Storage Tiering as a collection of policies:
The First Implementation
Policies composed by a monolithic framework plugin
Policy delegated by naming convention:
Each policy may be overridden by another rule engine, or rule base to customize to future use cases or technologies
The New Approach
Continue to separate the concerns:
Write simple policy implementations
Each policy may now be reused in a generic fashion, favoring configuration over code.
The When
When - Event Handlers
When - The Event Handler
A Rule Engine Plugin for a specific Class of events
The Events are specific to the class of the handler
The handler then invokes policy based on its configuration
When - event_handler-data_object_modified
A Rule Engine Plugin for data creation and modification events
Policy invocation is configured as an array of JSON objects for any given combination of events
Unifies the POSIX and Object behaviors into a single place to configure policy
When - event_handler-data_object_modified
Example : Synchronous Invocation
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{ "active_policy_clauses" : ["post"],
"events" : ["create", "write", "registration"],
"policy" : "irods_policy_access_time",
"configuration" : {
}
},
{ "active_policy_clauses" : ["pre"],
"events" : ["replication"],
"policy" : "irods_policy_example_policy",
"configuration" : {
}
}
]
}
}
Note that order still matters if more than one policy needs to be invoked for a given event
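As a rough illustration of how such a configuration could be read, the sketch below selects the policies whose "events" and "active_policy_clauses" both match, preserving the configured order. The `policies_for` helper is hypothetical and the matching logic is an assumption about the handler's behaviour, not its actual implementation.

```python
import json

# The same two entries as in the example configuration above.
config = json.loads("""
{
  "policies_to_invoke": [
    {"active_policy_clauses": ["post"],
     "events": ["create", "write", "registration"],
     "policy": "irods_policy_access_time",
     "configuration": {}},
    {"active_policy_clauses": ["pre"],
     "events": ["replication"],
     "policy": "irods_policy_example_policy",
     "configuration": {}}
  ]
}
""")


def policies_for(config, event, clause):
    """Return policy names matching the event and clause, in configured order."""
    return [
        entry["policy"]
        for entry in config["policies_to_invoke"]
        if event in entry["events"] and clause in entry["active_policy_clauses"]
    ]


post_write = policies_for(config, "write", "post")
pre_replication = policies_for(config, "replication", "pre")
```

Because the result is built by walking the array front to back, the position of an entry in "policies_to_invoke" decides when it runs relative to its neighbours.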
The What
What - Simple policy implementations
The library will continue to grow, with a cookbook of usages
Basic policies that are leveraged across many deployments and capabilities:
What - Simple policy implementations
Standardized JSON interface: parameters and configuration
irods_policy_example_policy_implementation(*parameters, *configuration) {
}
iRODS Rule Language
def irods_policy_example_policy_implementation(rule_args, callback, rei):
    parameters = rule_args[1]     # Parameters
    configuration = rule_args[2]  # Configuration
Python Rule Language
Policy can also be implemented as fast and light C++ rule engine plugins
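A policy body would typically start by decoding its two arguments. The sketch below assumes both arrive as JSON strings and that an empty string means "no values"; the `parse_policy_args` helper is a hypothetical name used only for illustration.

```python
import json


def parse_policy_args(parameters, configuration):
    """Decode the two JSON string arguments; an empty string yields {}."""
    params = json.loads(parameters) if parameters else {}
    config = json.loads(configuration) if configuration else {}
    return params, config


# Example call with a parameter object and no configuration.
params, config = parse_policy_args(
    '{"object_path": "/tempZone/home/rods/file0.txt"}', "")
```

Keeping the decoding in one place means every policy sees the same two dictionaries regardless of how it was invoked.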
What - Simple policy implementations
Policy may be invoked using one of three different conventions: direct invocation, query processor invocation, and event handler invocation
Each invocation convention defines its interface by contract
What - Simple policy implementations
Direct Invocation : Parameters passed as a JSON object
my_rule() {
irods_policy_access_time( "{\"object_path\" : \"/tempZone/home/rods/file0.txt\"}", "");
}
{
"policy" : "irods_policy_execute_rule",
"payload" : {
"policy_to_invoke" : "irods_policy_storage_tiering",
"parameters" : {
"object_path" : "/tempZone/home/rods/file0.txt"
},
"configuration" : {
}
}
}
Parameters may also be configured statically
What - Simple policy implementations
Query Processor Invocation
{
"policy" : "irods_policy_enqueue_rule",
"delay_conditions" : "<PLUSET>1s</PLUSET>",
"payload" : {
"policy" : "irods_policy_execute_rule",
"payload" : {
"policy_to_invoke" : "irods_policy_query_processor",
"parameters" : {
"query_string" : "SELECT USER_NAME, COLL_NAME, DATA_NAME, RESC_NAME WHERE COLL_NAME like '/tempZone/home/rods%'",
"query_limit" : 10,
"query_type" : "general",
"number_of_threads" : 4,
"policy_to_invoke" : "irods_policy_engine_example"
}
}
}
}
For example, the invoked policy would receive a row:
['rods', '/tempZone/home/rods/', 'file0.txt', 'demoResc']
Results are serialized to a JSON array and passed to the policy via the parameter object as "query_results"
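To make the "query_results" contract concrete, here is a small sketch of both sides: wrapping a row into a parameter object and unpacking it again inside a policy. The helper names are hypothetical, and the column order is assumed to follow the SELECT list of the example query (USER_NAME, COLL_NAME, DATA_NAME, RESC_NAME).

```python
import json


def parameters_for_row(row):
    """Wrap one result row in a parameter object under "query_results"."""
    return json.dumps({"query_results": list(row)})


def object_path(parameters):
    """Rebuild the logical path from the COLL_NAME and DATA_NAME columns."""
    user, coll, name, resc = json.loads(parameters)["query_results"]
    return coll.rstrip("/") + "/" + name


row = ["rods", "/tempZone/home/rods/", "file0.txt", "demoResc"]
parameters = parameters_for_row(row)
path = object_path(parameters)
```

A policy written this way depends on the shape of the query, so the query string and the policy it invokes need to be kept in step.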
What - Simple policy implementations
Event Handler Invocation
Serializes dataObjInp_t and rsComm_t to a JSON object
{
"comm":{
"auth_scheme":"native","client_addr":"152.54.8.141","proxy_auth_info_auth_flag":"5","proxy_auth_info_auth_scheme":"",
"proxy_auth_info_auth_str":"","proxy_auth_info_flag":"0","proxy_auth_info_host":"","proxy_auth_info_ppid":"0",
"proxy_rods_zone":"tempZone","proxy_sys_uid":"0","proxy_user_name":"rods","proxy_user_other_info_user_comments":"",
"proxy_user_other_info_user_create":"","proxy_user_other_info_user_info":"","proxy_user_other_info_user_modify":"",
"proxy_user_type":"","user_auth_info_auth_flag":"5","user_auth_info_auth_scheme":"","user_auth_info_auth_str":"",
"user_auth_info_flag":"0","user_auth_info_host":"","user_auth_info_ppid":"0","user_rods_zone":"tempZone",
"user_sys_uid":"0","user_user_name":"rods","user_user_other_info_user_comments":"","user_user_other_info_user_create":"",
"user_user_other_info_user_info":"","user_user_other_info_user_modify":"","user_user_type":""
},
"cond_input":{
"dataIncluded":"","dataType":"generic","destRescName":"ufs0","noOpenFlag":"","openType":"1",
"recursiveOpr":"1", "resc_hier":"ufs0","selObjType":"dataObj","translatedPath":""
},
"create_mode":"33204","data_size":"1","event":"CREATE","num_threads":"0",
"obj_path":"/tempZone/home/rods/test_put_gt_max_sql_rows/junk0083",
"offset":"0","open_flags":"2","opr_type":"1",
"policy_enforcement_point":"pep_api_data_obj_put_post"
}
Which is also passed in as the parameter object
What - Simple policy implementations
Configuration
{
"policy" : "irods_policy_access_time",
"configuration" : {
"attribute" : "irods::access_time"
}
}
Any additional statically set context passed into the policy
May be "plugin_specific_configuration" from a rule engine plugin or "configuration" from within the event framework
The Why
Why - Policy Conditionals
Each invoked policy may set a conditional around each noun within the system which gates the invocation
Leverages boost::regex to match any combination of logical_path, metadata, resource name, etc.
Why - Policy Conditionals
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"conditional" : {
"logical_path" : "\/tempZone.*"
},
"active_policy_clauses" : ["post"],
"events" : ["put"],
"policy" : "irods_policy_data_replication",
"configuration" : {
"source_to_destination_map" : {
"demoResc" : ["AnotherResc"]
}
}
},
...
]
...
}
}
Matching a logical path for replication policy invocation
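The gating idea can be sketched in a few lines. The plugin uses boost::regex; this illustration substitutes Python's `re` module and assumes a full match against each named noun, which may differ from the real matching rules. `conditional_matches` is a hypothetical helper and handles only flat string patterns such as "logical_path", not the nested "metadata" form.

```python
import json
import re


def conditional_matches(conditional, nouns):
    """True when every pattern in the conditional fully matches its noun."""
    for noun, pattern in conditional.items():
        value = nouns.get(noun)
        if value is None or re.fullmatch(pattern, value) is None:
            return False
    return True


# In JSON, the escape "\/" decodes to a plain "/".
conditional = json.loads(r'{"logical_path": "\/tempZone.*"}')

in_zone = conditional_matches(
    conditional, {"logical_path": "/tempZone/home/rods/file0.txt"})
out_of_zone = conditional_matches(
    conditional, {"logical_path": "/otherZone/home/rods/file0.txt"})
```

An empty conditional matches everything in this sketch, which mirrors the examples that omit "conditional" entirely.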
Why - Policy Conditionals
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"active_policy_clauses" : ["post"],
"events" : ["put", "write"],
"policy" : "irods_policy_event_delegate_collection_metadata",
"configuration" : {
"policies_to_invoke" : [
{
"conditional" : {
"metadata" : {
"attribute" : "irods::indexing::index",
"entity_type" : "data_object"
}
},
"policy" : "irods_policy_indexing_full_text_index_elasticsearch",
"configuration" : {
"hosts" : ["http://localhost:9200/"],
"bulk_count" : 100,
"read_size" : 1024
}
}
]
}
}
]
}
Matching metadata for indexing policy invocation
The How
How - Asynchronous Execution
{
"policy" : "irods_policy_enqueue_rule",
"delay_conditions" : "<EF>REPEAT FOR EVER</EF>",
"payload" : {
"policy" : "irods_policy_example",
"configuration" : {
}
}
}
INPUT null
OUTPUT ruleExecOut
The cpp_default rule engine plugin in 4.2.8 will now support two new policies: irods_policy_enqueue_rule and irods_policy_execute_rule
The enqueue rule policy will push a job onto the delayed execution queue. The "payload" object holds the rule which is to be executed.
How - Asynchronous Execution
The execute rule policy will invoke a policy engine either from the delayed execution queue or as a direct invocation
{
"policy" : "irods_policy_execute_rule",
"payload" : {
"policy_to_invoke" : "irods_policy_example",
"parameters" : {
},
"configuration" : {
}
}
}
INPUT null
OUTPUT ruleExecOut
How - Asynchronous Execution
Sample Delayed Rule for Asynchronous Execution
{
"policy" : "irods_policy_enqueue_rule",
"delay_conditions" : "<EF>REPEAT FOR EVER</EF>",
"payload" : {
"policy" : "irods_policy_execute_rule",
"payload" : {
"policy_to_invoke" : "irods_policy_example",
"parameters" : {
},
"configuration" : {
}
}
}
}
INPUT null
OUTPUT ruleExecOut
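Since the delayed form is just an execute rule wrapped in an enqueue rule, the nesting can be generated rather than typed by hand. The two builder functions below are hypothetical helpers; only the JSON keys and policy names are taken from the examples above.

```python
import json


def execute_rule(policy_to_invoke, parameters=None, configuration=None):
    """Build the irods_policy_execute_rule wrapper around a policy name."""
    return {
        "policy": "irods_policy_execute_rule",
        "payload": {
            "policy_to_invoke": policy_to_invoke,
            "parameters": parameters or {},
            "configuration": configuration or {},
        },
    }


def enqueue_rule(payload, delay_conditions):
    """Wrap a payload so it is pushed onto the delayed execution queue."""
    return {
        "policy": "irods_policy_enqueue_rule",
        "delay_conditions": delay_conditions,
        "payload": payload,
    }


rule = enqueue_rule(execute_rule("irods_policy_example"),
                    "<EF>REPEAT FOR EVER</EF>")
rule_text = json.dumps(rule, indent=4)
```

The resulting `rule_text` has the same structure as the sample delayed rule; the INPUT and OUTPUT lines would still need to be appended when submitting it as a rule file.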
Examples
Synchronous Access Time
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{ "active_policy_clauses" : ["post"],
"events" : ["put", "get", "create", "read", "write", "rename",
"register", "unregister", "replication", "checksum",
"copy", "seek", "truncate"],
"policy" : "irods_policy_access_time",
"configuration" : {
}
}
]
}
}
Synchronous Replication
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{ "active_policy_clauses" : ["post"],
"events" : ["create", "write", "registration"],
"policy" : "irods_policy_data_replication",
"configuration" : {
"source_to_destination_map" : {
"source_resource_0" : ["destination_resource_0a", "destination_resource_0b"],
"source_resource_1" : ["destination_resource_1a"]
}
}
},
{ "active_policy_clauses" : ["post"],
"events" : ["create", "write", "registration"],
"policy" : "irods_policy_data_replication",
"configuration" : {
"destination_resource" : "destination_resource_3"
}
}
]
}
}
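The two configuration shapes shown for replication can be read with one small lookup. This is a sketch under stated assumptions: that "source_to_destination_map" is keyed by the resource holding the new replica, and that "destination_resource" applies regardless of source. `destinations_for` is a hypothetical helper, not part of the replication policy.

```python
def destinations_for(configuration, source_resource):
    """Return the resources a replica on source_resource should be copied to."""
    mapping = configuration.get("source_to_destination_map")
    if mapping is not None:
        # Unlisted sources have no destinations in this sketch.
        return list(mapping.get(source_resource, []))
    single = configuration.get("destination_resource")
    return [single] if single else []


mapped = {
    "source_to_destination_map": {
        "source_resource_0": ["destination_resource_0a", "destination_resource_0b"],
        "source_resource_1": ["destination_resource_1a"],
    }
}
fixed = {"destination_resource": "destination_resource_3"}

fan_out = destinations_for(mapped, "source_resource_0")
single_target = destinations_for(fixed, "any_resource")
```

The map form allows one source to fan out to several destinations, while the single-destination form keeps simple deployments short.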
Asynchronous Replication
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"active_policy_clauses" : ["post"],
"events" : ["create", "write", "registration"],
"policy" : "irods_policy_enqueue_rule",
"delay_conditions" : "<PLUSET>1s</PLUSET>",
"payload" : {
"policy" : "irods_policy_execute_rule",
"payload" : {
"policy_to_invoke" : "irods_policy_data_replication",
"configuration" : {
"source_to_destination_map" : {
"source_resource_0" : ["destination_resource_0a", "destination_resource_0b"],
"source_resource_1" : ["destination_resource_1a"]
}
}
}
}
}
]
}
}
Synchronous Retention
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{ "active_policy_clauses" : ["post"],
"events" : ["replication"],
"policy" : "irods_policy_data_retention",
"configuration" : {
"mode" : "trim_single_replica",
"source_resource_list" : ["source_resource_1", "source_resource_2"]
}
}
]
}
}
Query Driven Asynchronous Retention
{
"policy" : "irods_policy_enqueue_rule",
"delay_conditions" : "<EF>REPEAT FOR EVER</EF>",
"payload" : {
"policy" : "irods_policy_execute_rule",
"payload" : {
"policy_to_invoke" : "irods_policy_query_processor",
"parameters" : {
"query_string" : "SELECT USER_NAME, COLL_NAME, DATA_NAME, RESC_NAME WHERE COLL_NAME like '/tempZone/home/rods%' AND RESC_NAME IN ('source_resource_1', 'source_resource_2')",
"query_limit" : 10,
"query_type" : "general",
"number_of_threads" : 4,
"policy_to_invoke" : "irods_policy_data_retention",
"configuration" : {
"mode" : "trim_single_replica",
"source_resource_list" : ["source_resource_1", "source_resource_2"]
}
}
}
}
}
Synchronous Verification
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{ "active_policy_clauses" : ["post"],
"events" : ["create", "write", "registration"],
"policy" : "irods_policy_data_verification",
"configuration" : {
"attribute" : "irods::verification::type"
}
}
]
}
}
The type of verification to perform is stored as metadata on the resource
Asynchronous Verification
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{ "active_policy_clauses" : ["post"],
"events" : ["create", "write", "registration"],
"policy" : "irods_policy_enqueue_rule",
"delay_conditions" : "<PLUSET>1s</PLUSET>",
"payload" : {
"policy" : "irods_policy_execute_rule",
"payload" : {
"policy_to_invoke" : "irods_policy_data_verification",
"configuration" : {
"attribute" : "irods::verification::type"
}
}
}
}
]
}
}
The type of verification to perform is stored as metadata on the resource
Policy Composed Capabilities
Storage Tiering Overview
Policy Composed Storage Tiering
{
"policy" : "irods_policy_execute_rule",
"payload" : {
"policy_to_invoke" : "irods_policy_query_processor",
"configuration" : {
"query_string" : "SELECT META_RESC_ATTR_VALUE WHERE META_RESC_ATTR_NAME = 'irods::storage_tiering::group'",
"query_limit" : 0,
"query_type" : "general",
"number_of_threads" : 8,
"policy_to_invoke" : "irods_policy_event_generator_resource_metadata",
"configuration" : {
"conditional" : {
"metadata" : {
"attribute" : "irods::storage_tiering::group",
"value" : "{0}"
}
},
"policies_to_invoke" : [
{
"policy" : "irods_policy_query_processor",
"configuration" : {
"query_string" : "SELECT META_RESC_ATTR_VALUE WHERE META_RESC_ATTR_NAME = 'irods::storage_tiering::query' AND RESC_NAME = 'IRODS_TOKEN_SOURCE_RESOURCE_END_TOKEN'",
"default_results_when_no_rows_found" : ["SELECT USER_NAME, COLL_NAME, DATA_NAME, RESC_NAME WHERE META_DATA_ATTR_NAME = 'irods::access_time' AND META_DATA_ATTR_VALUE < 'IRODS_TOKEN_LIFETIME_END_TOKEN' AND META_DATA_ATTR_UNITS <> 'irods::storage_tiering::inflight' AND DATA_RESC_ID IN (IRODS_TOKEN_SOURCE_RESOURCE_LEAF_BUNDLE_END_TOKEN)"],
"query_limit" : 0,
"query_type" : "general",
"number_of_threads" : 8,
"policy_to_invoke" : "irods_policy_query_processor",
"configuration" : {
"lifetime" : "IRODS_TOKEN_QUERY_SUBSTITUTION_END_TOKEN(SELECT META_RESC_ATTR_VALUE WHERE META_RESC_ATTR_NAME = 'irods::storage_tiering::time' AND RESC_NAME = 'IRODS_TOKEN_SOURCE_RESOURCE_END_TOKEN')",
"query_string" : "{0}",
"query_limit" : 0,
"query_type" : "general",
"number_of_threads" : 8,
"policy_to_invoke" : "irods_policy_data_replication",
"configuration" : {
"comment" : "source_resource, and destination_resource supplied by the resource metadata event generator"
}
}
}
}
]
}
}
}
}
INPUT null
OUTPUT ruleExecOut
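The "{0}" placeholders in this rule suggest positional substitution from query results. The source does not spell out the mechanism, so the sketch below is only an assumption: it treats "{N}" as column N of the row delivered by the enclosing query processor and replaces it in every string of a nested configuration. `substitute` is a hypothetical helper, and a real implementation would presumably substitute one nesting level at a time rather than all at once.

```python
def substitute(value, row):
    """Replace "{N}" placeholders in all nested strings with row[N]."""
    if isinstance(value, str):
        for index, column in enumerate(row):
            value = value.replace("{%d}" % index, column)
        return value
    if isinstance(value, dict):
        return {key: substitute(item, row) for key, item in value.items()}
    if isinstance(value, list):
        return [substitute(item, row) for item in value]
    return value  # numbers, booleans and None pass through unchanged


template = {
    "conditional": {
        "metadata": {
            "attribute": "irods::storage_tiering::group",
            "value": "{0}",
        }
    },
    "query_limit": 0,
}

# "example_group" is a made-up tier group name for illustration.
resolved = substitute(template, ["example_group"])
```

Under this reading, each tier group found by the outer query produces its own copy of the inner configuration with the group name filled in.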
Policy Composed Capability
Asynchronous Replication
Policy Composed Capability
Synchronous Configuration for Storage Tiering
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"active_policy_clauses" : ["post"],
"events" : ["put", "get", "create", "read", "write", "rename", "register", "unregister", "replication", "checksum", "copy", "seek", "truncate"],
"policy" : "irods_policy_access_time",
"configuration" : {
"log_errors" : "true"
}
},
{
"active_policy_clauses" : ["post"],
"events" : ["read", "write", "get"],
"policy" : "irods_policy_data_restage",
"configuration" : {
}
},
{
"active_policy_clauses" : ["post"],
"events" : ["replication"],
"policy" : "irods_policy_tier_group_metadata",
"configuration" : {
}
},
{
"active_policy_clauses" : ["post"],
"events" : ["replication"],
"policy" : "irods_policy_data_verification",
"configuration" : {
}
},
{
"active_policy_clauses" : ["post"],
"events" : ["replication"],
"policy" : "irods_policy_data_retention",
"configuration" : {
"mode" : "trim_single_replica",
"log_errors" : "true"
}
}
]
}
}
Policy Composed Capability
Possible Metadata Driven Restage for Storage Tiering
{
"instance_name": "irods_rule_engine_plugin-event_handler-metadata_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-metadata_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"active_policy_clauses" : ["post"],
"events" : ["set", "add"],
"attribute" : "irods::storage_tiering::restage",
"value" : "*.*",
"policy" : "irods_policy_data_restage",
"configuration" : {
}
}
]
}
}
Data Transfer Nodes Pattern
Policy Composed Data Transfer Node
{
"policy" : "irods_policy_enqueue_rule",
"delay_conditions" : "<EF>REPEAT FOR EVER</EF>",
"payload" : {
"policy" : "irods_policy_execute_rule",
"payload" : {
"policy_to_invoke" : "irods_policy_query_processor",
"parameters" : {
"query_string" : "SELECT USER_NAME, COLL_NAME, DATA_NAME, RESC_NAME WHERE COLL_NAME like '/tempZone/home/rods%' AND RESC_NAME IN ('edge_resource_1', 'edge_resource_2')",
"query_limit" : 10,
"query_type" : "general",
"number_of_threads" : 4,
"policy_to_invoke" : "irods_policy_data_retention",
"configuration" : {
"mode" : "trim_single_replica",
"source_resource_list" : ["edge_resource_1", "edge_resource_2"]
}
}
}
}
}
Policy Composed Capability
Asynchronous Retention on Edge Resources
Synchronous Replication
{
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"conditional" : {
"logical_path" : "\/tempZone.*"
},
"active_policy_clauses" : ["post"],
"events" : ["create", "write", "registration"],
"policy" : "irods_policy_data_replication",
"configuration" : {
"source_to_destination_map" : {
"edge_resource_0" : ["long_term_resource_0"],
"edge_resource_1" : ["long_term_resource_1"]
}
}
},
{
"conditional" : {
"logical_path" : "\/tempZone.*"
},
"active_policy_clauses" : ["pre"],
"events" : ["get"],
"policy" : "irods_policy_data_replication",
"configuration" : {
"source_to_destination_map" : {
"long_term_resource_0" : ["edge_resource_0"],
"long_term_resource_1" : ["edge_resource_1"]
}
}
}
]
}
}
Policy Composed Capability
Indexing Capability
Policy Composed Indexing
Policy implemented as separate policy engine plugins
Indexing Policies
"instance_name": "irods_rule_engine_plugin-event_handler-data_object_modified-instance",
"plugin_name": "irods_rule_engine_plugin-event_handler-data_object_modified",
"plugin_specific_configuration": {
"policies_to_invoke" : [
{
"active_policy_clauses" : ["post"],
"events" : ["put", "write"],
"policy" : "irods_policy_event_delegate_collection_metadata",
"configuration" : {
"policies_to_invoke" : [
{
"conditional" : {
"metadata" : {
"attribute" : "irods::indexing::index",
"entity_type" : "data_object"
}
},
"policy" : "irods_policy_indexing_full_text_index_elasticsearch",
"configuration" : {
"hosts" : ["http://localhost:9200/"],
"bulk_count" : 100,
"read_size" : 1024
}
}
]
}
}
...
Synchronously configured full text indexing
Indexing Policies
{
"active_policy_clauses" : ["pre"],
"events" : ["unlink", "unregister"],
"policy" : "irods_policy_event_delegate_collection_metadata",
"configuration" : {
"policies_to_invoke" : [
{
"conditional" : {
"metadata" : {
"attribute" : "irods::indexing::index",
"entity_type" : "data_object"
}
},
"policy" : "irods_policy_indexing_full_text_purge_elasticsearch",
"configuration" : {
"hosts" : ["http://localhost:9200/"],
"bulk_count" : 100,
"read_size" : 1024
}
}
]
}
}
]
}
Synchronously configured full text purge
Summary - Configuration not Code
Questions?