Policy Training

Rule Engine Plugins

Jason Coposky

@jason_coposky

Executive Director, iRODS Consortium

August 3-6, 2020

KU Leuven Training

Webinar Presentation

Anatomy of a Rule Engine Plugin

Each plugin must define seven operations:

  • start
  • stop
  • rule_exists
  • list_rules
  • exec_rule
  • exec_rule_text
  • exec_rule_expression

 

The Rule Engine Plugin Framework

Manages multiple instantiations of rule engine plugins

Delegates policy invocation to rule engine plugins

Walks plugins sequentially looking for a plugin to satisfy the invocation (order matters!)

  • First by regex match
  • Second by rule_exists()

If a policy returns success, meaning zero, the framework stops

If a policy returns an error, meaning a negative value, the framework stops

If a policy returns RULE_ENGINE_CONTINUE, the framework moves on to the next plugin, which allows for a clean separation of concerns
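The walk described above can be sketched as a small simulation. This is not the server's implementation: ToyPlugin, invoke, and the string sentinel standing in for RULE_ENGINE_CONTINUE are hypothetical names made up for illustration, and full-string regex matching is an assumption.

```python
import re

# Placeholder sentinel for this sketch only; the real server defines its own
# RULE_ENGINE_CONTINUE code.
RULE_ENGINE_CONTINUE = "RULE_ENGINE_CONTINUE"


class ToyPlugin:
    """Hypothetical stand-in for one configured rule engine plugin instance."""

    def __init__(self, name, regexes, rules):
        self.name = name
        self.regexes = [re.compile(r) for r in regexes]
        self.rules = rules  # maps rule name -> callable returning a code

    def matches(self, rule_name):
        # Assumption: the whole rule name must match one of the regexes.
        return any(r.fullmatch(rule_name) for r in self.regexes)

    def rule_exists(self, rule_name):
        return rule_name in self.rules

    def exec_rule(self, rule_name):
        return self.rules[rule_name]()


def invoke(plugins, rule_name):
    """Walk plugins in configured order; return (code, names of plugins that ran)."""
    ran = []
    for plugin in plugins:
        if not plugin.matches(rule_name):      # first: regex match
            continue
        if not plugin.rule_exists(rule_name):  # second: rule_exists()
            continue
        ran.append(plugin.name)
        code = plugin.exec_rule(rule_name)
        if code == RULE_ENGINE_CONTINUE:
            continue                           # fall through to the next plugin
        return code, ran                       # zero or negative: stop here
    return None, ran  # nothing stopped the walk (a convention of this sketch)


pep = "pep_api_data_obj_put_post"
pep_regex = r"[^ ]*pep_[^ ]*_(pre|post)"
plugins = [
    ToyPlugin("metadata", [pep_regex], {pep: lambda: RULE_ENGINE_CONTINUE}),
    ToyPlugin("checksum", [pep_regex], {pep: lambda: 0}),
    ToyPlugin("never_reached", [pep_regex], {pep: lambda: -1}),
]
code, ran = invoke(plugins, pep)
```

In this toy run the first plugin continues, the second returns zero and stops the walk, and the third is never consulted, which is why the order of the configured plugins matters.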

Rule Engine Continuation

The original approach to policy implementation:

acPostProcForPut() { 
  if($rescName == "demoResc") {
      # extract and apply metadata
  }
  else if($rescName == "cacheResc") {
      # async replication to archive
  }
  else if($objPath like "/tempZone/home/alice/*" &&
          $rescName == "indexResc") {
      # launch an indexing job
  }
  else if(xyz) {
      # compute checksums ...
  }
  
  # and so on ...
}

Rule Engine Continuation

Separate the implementation into several rule bases:

/etc/irods/metadata.re

pep_api_data_obj_put_post(*INSTANCE_NAME, *COMM, *DATAOBJINP, *BUFFER, *PORTAL_OPR_OUT) {
  # metadata extraction and application code
  
  RULE_ENGINE_CONTINUE
}

/etc/irods/checksum.re

pep_api_data_obj_put_post(*INSTANCE_NAME, *COMM, *DATAOBJINP, *BUFFER, *PORTAL_OPR_OUT) {
  # checksum code
  
  RULE_ENGINE_CONTINUE
}

/etc/irods/access_time.re

pep_api_data_obj_put_post(*INSTANCE_NAME, *COMM, *DATAOBJINP, *BUFFER, *PORTAL_OPR_OUT) {
  # access time application code
  
  RULE_ENGINE_CONTINUE
}

Configuration of Rule Engine Plugins

Server configuration is found in /etc/irods/server_config.json

...

"federation": [],
"server_control_plane_port": 1248,
"plugin_configuration": {
    "rule_engines": [
        {
            // native rule engine configuration
        },
        {
            // python rule engine configuration
        },
        {
            // C++ default rule engine configuration
        }      
        ...
    ]
}

All plugins may have configuration parameters in the 'plugin_configuration' object

All instances of rule engine plugins must be configured in the 'rule_engines' array

Configuration of Rule Engine Plugins

Anatomy of a rule engine plugin instance

...

"federation": [],
"server_control_plane_port": 1248,
"plugin_configuration": {
    "rule_engines": [
        {
            "instance_name": "<UNIQUE NAME>",
            "plugin_name": "<DERIVED FROM SHARED OBJECT>",
            "plugin_specific_configuration": {
                <ANYTHING GOES HERE>
            },
            "shared_memory_instance": "<UNIQUE SHM NAME>"
        },
        ...
    ]
}

iRODS Rule Language Configuration

{
    "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance", 
    "plugin_name": "irods_rule_engine_plugin-irods_rule_language", 
    "plugin_specific_configuration": {
        "re_data_variable_mapping_set": [
            "core"
        ], 
        "re_function_name_mapping_set": [
            "core"
        ], 
        "re_rulebase_set": [
            "example_custom_rule_base_0",
            "example_custom_rule_base_1",
            "example_custom_rule_base_2",          
            "core"
        ], 
        "regexes_for_supported_peps": [
            "ac[^ ]*", 
            "msi[^ ]*", 
            "[^ ]*pep_[^ ]*_(pre|post)"
        ]
    }, 
    "shared_memory_instance": "upgraded_legacy_re"
}

iRODS Rule Language Configuration

/etc/irods/core.re is the default implementation and should remain unchanged

Rule implementations are found in /etc/irods/

Custom rule implementations reside in /etc/irods/ and should be configured above core in re_rulebase_set
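As a rough illustration of "configured above core", the sketch below edits an in-memory dictionary shaped like the configuration shown earlier. The helper name add_rulebase_above_core is hypothetical, and the sketch deliberately does not read or write /etc/irods/server_config.json.

```python
def add_rulebase_above_core(
    server_config,
    rulebase,
    instance_name="irods_rule_engine_plugin-irods_rule_language-instance",
):
    """Insert `rulebase` ahead of "core" for the named rule engine instance.

    Returns True if the instance was found, False otherwise.
    """
    for engine in server_config["plugin_configuration"]["rule_engines"]:
        if engine["instance_name"] != instance_name:
            continue
        rulebases = engine["plugin_specific_configuration"]["re_rulebase_set"]
        if rulebase not in rulebases:
            # Place the custom rulebase before "core" so it is consulted first.
            index = rulebases.index("core") if "core" in rulebases else len(rulebases)
            rulebases.insert(index, rulebase)
        return True
    return False


# A trimmed, in-memory stand-in for server_config.json.
config = {
    "plugin_configuration": {
        "rule_engines": [
            {
                "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
                "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
                "plugin_specific_configuration": {"re_rulebase_set": ["core"]},
            }
        ]
    }
}
found = add_rulebase_above_core(config, "training")
rulebase_set = config["plugin_configuration"]["rule_engines"][0][
    "plugin_specific_configuration"
]["re_rulebase_set"]
```

Here "training" would correspond to a file /etc/irods/training.re, following the rulebase naming used in this section.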

Basic iRODS Rule Language Example

Static (Legacy) Policy Enforcement Points

acPostProcForPut() {
    if("ufs_cache" == $KVPairs.rescName) {
        delay("<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>") {
            *CacheRescName = "comp_resc;ufs_cache";
            msisync_to_archive("*CacheRescName", $filePath, $objPath );
        }
    }
}

Dynamic Policy Enforcement Points

pep_resource_resolve_hierarchy_pre(*INST_NAME, *CTX, *OUT, *OP_TYPE, *HOST, *RESC_HIER, *VOTE)
{
    if( "CREATE" == *OP_TYPE ) {
        if( "pt1" == *INST_NAME) {
            *OUT = "read=1.0;write=0.5"
        }
        else if ( "pt2" == *INST_NAME ) {
            *OUT = "read=1.0;write=1.0"
        }
    }
}

Installing the Python Rule Engine Plugin

Install the plugin

sudo apt-get -y install irods-rule-engine-plugin-python

See the new shared objects

ls /usr/lib/irods/plugins/rule_engines/

libirods_rule_engine_plugin-cpp_default_policy.so
libirods_rule_engine_plugin-irods_rule_language.so
libirods_rule_engine_plugin-python.so

Python Rule Engine Configuration

Create /etc/irods/core.py from the packaged template file

irods $ cp /etc/irods/core.py.template /etc/irods/core.py

Add the plugin configuration to /etc/irods/server_config.json

"rule_engines": [
    {
         "instance_name" : "irods_rule_engine_plugin-python-instance",
         "plugin_name" : "irods_rule_engine_plugin-python",
         "plugin_specific_configuration" : {}
    }, 
    {
         "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
...

Python Rule Language Example

Static (Legacy) Policy Enforcement Points

import session_vars

def acPostProcForPut(rule_args, callback, rei):
    # Session variables are exposed as a nested map in Python
    Map = session_vars.get_map(rei)
    Kvp = { str(a):str(b) for a,b in Map['key_value_pairs'].items() }
    if 'ufs_cache' == Kvp['rescName']:
        callback.delayExec(
            ("<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>" +
             "<INST_NAME>irods_rule_engine_plugin-python-instance</INST_NAME>"),
            "callback.msisync_to_archive('{cacheResc}','{file_path}','{object_path}')".format(
                cacheResc='comp_resc;ufs_cache', **Map['data_object']),
            "")

Dynamic Policy Enforcement Points

def pep_resource_resolve_hierarchy_pre(rule_args, callback, rei):
    (INST_NAME, CTX, OUT, OP_TYPE, HOST, RESC_HIER, VOTE) = rule_args

    if "CREATE" == OP_TYPE:
        # OUT is the third argument, so it is set through rule_args[2]
        if "pt1" == INST_NAME:
            rule_args[2] = "read=1.0;write=0.5"
        elif "pt2" == INST_NAME:
            rule_args[2] = "read=1.0;write=1.0"

A Combination Use of Both Python and iRODS Rules

Add a custom rulebase to /etc/irods/server_config.json

"rule_engines": [
    {
         "instance_name" : "irods_rule_engine_plugin-python-instance",
         "plugin_name" : "irods_rule_engine_plugin-python",
         "plugin_specific_configuration" : {}
    },
    {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
        "plugin_specific_configuration": {
            "re_data_variable_mapping_set": [
                "core"
            ],
            "re_function_name_mapping_set": [
                "core"
            ],
            "re_rulebase_set": [
          ----> "training", <----
                "core"
            ],

A Combination Use of Both Python and iRODS Rules

Create /etc/irods/training.re rulebase

add_metadata_to_objpath(*str, *objpath, *objtype) {
    msiString2KeyValPair(*str, *kvp);
    msiAssociateKeyValuePairsToObj(*kvp, *objpath, *objtype);
}

getSessionVar(*name,*output) {
    *output = eval("str($"++*name++")");
}

A Combination Use of Both Python and iRODS Rules

Copy core.py and python_storage_balancing.py into /etc/irods

sudo cp ~/irods_training/advanced/python_storage_balancing.py /etc/irods/
sudo cp ~/irods_training/advanced/core.py /etc/irods/

This will overwrite the default core.py

The Python implementation of the static PEP, now in core.py:

def acPostProcForPut(rule_args, callback, rei):
    sv = session_vars.get_map(rei)
    phypath = sv['data_object']['file_path']
    objpath = sv['data_object']['object_path']
    exiflist = []
    with open(phypath, 'rb') as f:
        tags = EXIF.process_file(f, details=False)
        for (k, v) in tags.iteritems():
            if k not in ('JPEGThumbnail', 'TIFFThumbnail', 'Filename', 'EXIF MakerNote'):
                exifpair = '{0}={1}'.format(k, v)
                exiflist.append(exifpair)
    exifstring = '%'.join(exiflist)
    callback.add_metadata_to_objpath(exifstring, objpath, '-d')
    callback.writeLine('serverLog', 'PYTHON - acPostProcForPut() complete')
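The string handed to add_metadata_to_objpath above is a '%'-separated list of key=value pairs. The standalone sketch below reproduces only that string-building step with a plain dictionary, so it can be tried outside the server; to_kvp_string is a hypothetical helper, the sample tags are taken from the output shown in this section, and sorting the keys is an addition made here for a predictable result.

```python
# Tags that the rule above leaves out of the metadata string.
SKIPPED_TAGS = ('JPEGThumbnail', 'TIFFThumbnail', 'Filename', 'EXIF MakerNote')


def to_kvp_string(tags, skipped=SKIPPED_TAGS):
    """Join tag names and values as 'k=v%k=v', skipping unwanted tags."""
    pairs = [
        '{0}={1}'.format(key, value)
        for key, value in sorted(tags.items())
        if key not in skipped
    ]
    return '%'.join(pairs)


sample_tags = {
    'Image Orientation': 'Horizontal (normal)',
    'EXIF ColorSpace': 'sRGB',
    'JPEGThumbnail': '<binary data>',
}
exifstring = to_kvp_string(sample_tags)
```

A tag value that itself contains '%' or '=' would presumably confuse this simple format; the sketch does not attempt to escape such values.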

Test our combination of rules

Make sure earlier stickers example is removed

irm -f stickers.jpg

Get some test data into iRODS

wget https://github.com/irods/irods_training/raw/master/stickers.jpg

iput stickers.jpg

Confirm the EXIF metadata was harvested

imeta ls -d stickers.jpg

AVUs defined for dataObj stickers.jpg:

attribute: Image Orientation
value: Horizontal (normal)
units:
----
attribute: EXIF ColorSpace
value: sRGB
units:
----

Microservices

We were using the word before it was cool...

  • C++ plugins bound into the rule languages
  • Necessary to reach certain custom libraries
  • Useful for complex or compute intensive applications

Many are provided by the server but additional plugins may be installed 

We will walk through the statically linked plugins here:

Microservices

Can be invoked directly by the native rule engine

example_rule()
{
    msiDataObjChksum("/tempZone/home/rods/example.txt", "verifyChksum=++++ChksumAll=", *result)
}

May be reached in Python through the callback mechanism

def example_rule(rule_args, callback, rei):
    # Output parameters are passed as placeholders; results come back in ret['arguments']
    ret = callback.msiDataObjChksum("/tempZone/home/rods/example.txt", "verifyChksum=++++ChksumAll=", "")
    result = ret['arguments'][2]

Delayed Execution

iRODS provides an asynchronous means to execute rules

The delay() directive will schedule its body of code into the delayed execution queue

The delay() directive is built into the iRODS rule language and available using the callback mechanism in Python

Delayed Execution

The delay is configured via a string using XML parameter syntax

Options include:

  • EA: execution address, the host where the delayed execution needs to be performed
  • ET: execution time, the absolute time when it needs to be performed
  • PLUSET: relative execution time, measured from the current time, when it needs to execute
  • EF: execution frequency (in time widths) at which it needs to be performed
  • INST_NAME: rule engine plugin instance name to target for this rule
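Because the configuration is a flat string of tagged values, it can be assembled programmatically. The helper below is a hypothetical convenience, not part of iRODS; it only concatenates the five tags listed above and performs no validation of the values.

```python
# Tag names taken from the option list above, in the order they are emitted.
DELAY_TAGS = ("EA", "ET", "PLUSET", "EF", "INST_NAME")


def build_delay_hints(**options):
    """Return a delay configuration string such as '<PLUSET>1s</PLUSET>'."""
    unknown = sorted(set(options) - set(DELAY_TAGS))
    if unknown:
        raise ValueError("unknown delay option(s): " + ", ".join(unknown))
    return "".join(
        "<{0}>{1}</{0}>".format(tag, options[tag])
        for tag in DELAY_TAGS
        if tag in options
    )


hints = build_delay_hints(
    PLUSET="1s",
    EF="1h DOUBLE UNTIL SUCCESS OR 6 TIMES",
    INST_NAME="irods_rule_engine_plugin-python-instance",
)
```

The resulting string matches the configuration used in the Python examples in this deck, and could be passed as the first argument to callback.delayExec.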

Delayed Execution

The EF value is of the format: nnnnU <directive>

  • nnnn is a number
  • U is the unit of the number (s-sec,m-min,h-hour,d-day,y-year)

Where <directive> can be of the form:

  •     <empty-directive> same as REPEAT FOR EVER
  •     REPEAT FOR EVER
  •     REPEAT UNTIL SUCCESS
  •     REPEAT nnnn TIMES where nnnn is an integer
  •     REPEAT UNTIL <time>
  •     REPEAT UNTIL SUCCESS OR UNTIL <time>
  •     REPEAT UNTIL SUCCESS OR nnnn TIMES
  •     DOUBLE FOR EVER
  •     DOUBLE UNTIL SUCCESS where delay is doubled every time.
  •     DOUBLE nnnn TIMES
  •     DOUBLE UNTIL <time>
  •     DOUBLE UNTIL SUCCESS OR UNTIL <time>
  •     DOUBLE UNTIL SUCCESS OR nnnn TIMES
  •     DOUBLE UNTIL SUCCESS UPTO <time>

Delayed Execution

The <time> format may be one of three forms:

An integer, optionally followed by a unit:

  • nnnn: an integer, assumed to be in seconds
  • nnnns: an integer followed by 's', meaning seconds
  • nnnnm: an integer followed by 'm', meaning minutes
  • nnnnh: an integer followed by 'h', meaning hours
  • nnnnd: an integer followed by 'd', meaning days
  • nnnny: an integer followed by 'y', meaning years

<dd.hh:mm:ss> where dd, hh, mm and ss are 2-digit integers representing days, hours, minutes and seconds

Truncation from the end is allowed. e.g. 20:40 means mm:ss

The input can also be full calendar time in the form:
      YYYY-MM-DD.hh:mm:ss

Truncation from the beginning is allowed.
      e.g., 2007-07-29.12 means noon of July 29, 2007.
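To make the integer-with-unit form concrete, the sketch below converts such a value to seconds. It is an independent illustration rather than the server's parser: relative_time_to_seconds is a hypothetical name, the 365-day year is an assumption, and the dd.hh:mm:ss and calendar forms are not handled.

```python
import re

# Seconds per unit; an empty suffix means seconds.
# Assumption: a year is treated as 365 days.
UNIT_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86400, "y": 365 * 86400}


def relative_time_to_seconds(text):
    """Convert 'nnnn' or 'nnnnU' (U in s, m, h, d, y) to a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhdy]?)", text.strip())
    if match is None:
        raise ValueError("not an integer-with-unit time value: {0!r}".format(text))
    number, unit = match.groups()
    return int(number) * UNIT_SECONDS[unit]


one_hour = relative_time_to_seconds("1h")
bare_integer = relative_time_to_seconds("90")
```

Under these assumptions the PLUSET value "1s" used earlier corresponds to one second, and the "1h" interval in the EF examples corresponds to 3600 seconds.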

Delayed Execution

For the native rule language it takes one parameter, the configuration:

example_delayed_rule() {
    delay("<EF>REPEAT FOR EVER</EF><INST_NAME>irods_rule_engine_plugin-irods_rule_language-instance</INST_NAME>") {
        msiObjStat("/tempZone/home/rods/example.txt", *stat_out)
    }
}

For the Python rule language it takes two parameters, the configuration and the rule body:

def example_python_rule(rule_args, callback, rei):
    rule_body = """callback.msiObjStat("/tempZone/home/rods/example.txt", stat_out)"""
    callback.delayExec( 
           ("<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>" +
            "<INST_NAME>irods_rule_engine_plugin-python-instance</INST_NAME>" ),
            rule_body)

Remote Execution

The remote directive executes the body of its code on another iRODS server with a signature of:

 

remote("server.host.name", "<ZONE>zone_name</ZONE>")

Remote Execution

The native rule engine plugin usage is similar to the delay()

example_remote_rule() {
    remote("irods.example.org", "<ZONE>tempZone</ZONE>") {
        msiObjStat("/tempZone/home/rods/example.txt", *stat_out)
    }
}

For the Python rule language there is a separate implementation, py_remote

def example_python_rule(rule_args, callback, rei):
    rule_code = """
def main(rule_args, callback, rei):
    print('This is a test of the Python Remote Rule Execution')"""
    
    callback.py_remote('irods.example.org', '', rule_code, '')

Questions?
