Rule Engine Plugins

June 25-27, 2019

iRODS User Group Meeting 2019

Utrecht, Netherlands

Terrell Russell, Ph.D.

@terrellrussell

Chief Technologist, iRODS Consortium


Anatomy of a Rule Engine Plugin

Represents the last of the core plugin interfaces

 

Each plugin must define seven operations:

  • start
  • stop
  • rule_exists
  • list_rules
  • exec_rule
  • exec_rule_text
  • exec_rule_expression

 

Configuration of Rule Engine Plugins

"plugin_configuration": {
    "rule_engines": [
        {
            ...
        },
        ...
    ]
}

Within /etc/irods/server_config.json, a new JSON array holds the rule engine configuration

Order Is Important

  • multiple rule engines can be run concurrently, but this array defines which rule engine takes priority
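
To see which engine is listed first, the array can be read straight from the configuration. This is a minimal sketch, assuming the layout shown above; `rule_engine_priority` is a hypothetical helper name and the sample instance names are illustrative only.

```python
import json


def rule_engine_priority(config_text):
    """Return rule engine instance names in the order the array lists them."""
    config = json.loads(config_text)
    engines = config["plugin_configuration"]["rule_engines"]
    return [engine["instance_name"] for engine in engines]


# Illustrative configuration fragment, not a complete server_config.json
sample = """
{
    "plugin_configuration": {
        "rule_engines": [
            {"instance_name": "python-instance"},
            {"instance_name": "irods_rule_language-instance"}
        ]
    }
}
"""

print(rule_engine_priority(sample))
```

The first name printed is the engine that gets the first chance to service a request.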

Configuration of Rule Engine Plugins

Anatomy of a Rule Engine Plugin JSON object

{
    "instance_name": "<UNIQUE NAME>",
    "plugin_name": "<DERIVED FROM SHARED OBJECT>",
    "plugin_specific_configuration": {
        <ANYTHING GOES HERE>
    },
    "shared_memory_instance": "<UNIQUE SHM NAME>"
}
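
A quick sanity check of a stanza can catch a missing key or a reused instance name before the server is restarted. This is only a sketch: `missing_keys` and `duplicate_instance_names` are hypothetical helpers, and treating `shared_memory_instance` as optional is an assumption based on the Python stanza later in this module, which omits it.

```python
# Keys that appear in every stanza shown in these slides (assumed required)
REQUIRED_KEYS = ("instance_name", "plugin_name", "plugin_specific_configuration")


def missing_keys(stanza):
    """Return the assumed-required keys that are absent from one stanza."""
    return [key for key in REQUIRED_KEYS if key not in stanza]


def duplicate_instance_names(engines):
    """Return instance names that occur more than once in the array."""
    seen, duplicates = set(), []
    for engine in engines:
        name = engine.get("instance_name")
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates


stanza = {
    "instance_name": "irods_rule_engine_plugin-python-instance",
    "plugin_name": "irods_rule_engine_plugin-python",
    "plugin_specific_configuration": {},
}
print(missing_keys(stanza))
print(duplicate_instance_names([stanza, stanza]))
```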

iRODS Rule Language Configuration

Similar to 4.1

  • dvm, fnm, and rulebase were moved into the plugin_specific_configuration
  • regexes for supported PEPs were added
  • sets are now just arrays of filenames
{
    "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance", 
    "plugin_name": "irods_rule_engine_plugin-irods_rule_language", 
    "plugin_specific_configuration": {
        "re_data_variable_mapping_set": [
            "core"
        ], 
        "re_function_name_mapping_set": [
            "core"
        ], 
        "re_rulebase_set": [
            "core"
        ], 
        "regexes_for_supported_peps": [
            "ac[^ ]*", 
            "msi[^ ]*", 
            "[^ ]*pep_[^ ]*_(pre|post)"
        ]
    }, 
    "shared_memory_instance": "upgraded_legacy_re"
}
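
The three patterns can be tried against rule names outside the server. This sketch assumes a pattern has to match the whole rule name; whether the server anchors the patterns this way is an assumption, and `is_supported_pep` is a hypothetical helper.

```python
import re

# Patterns copied from the configuration above
REGEXES_FOR_SUPPORTED_PEPS = [
    "ac[^ ]*",
    "msi[^ ]*",
    "[^ ]*pep_[^ ]*_(pre|post)",
]


def is_supported_pep(rule_name, patterns=REGEXES_FOR_SUPPORTED_PEPS):
    """Return True if any pattern matches the entire rule name."""
    return any(re.fullmatch(pattern, rule_name) for pattern in patterns)


for name in ("acPostProcForPut",
             "pep_resource_resolve_hierarchy_pre",
             "writeLine"):
    print(name, is_supported_pep(name))
```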

Basic iRODS Rule Language Example

acPostProcForPut() {
    if("ufs_cache" == $KVPairs.rescName) {
        delay("<PLUSET>1s</PLUSET><EF>1h DOUBLE UNTIL SUCCESS OR 6 TIMES</EF>") {
            *CacheRescName = "comp_resc;ufs_cache";
            msisync_to_archive("*CacheRescName", $filePath, $objPath );
        }
    }
}
pep_resource_resolve_hierarchy_pre(
  *INST_NAME,*CTX,*OUT,*OP_TYPE,*HOST,*RESC_HIER,*VOTE){
    if( "CREATE" == *OP_TYPE ) {
        if( "pt1" == *INST_NAME) {
            *OUT = "read=1.0;write=0.5"
        }
        else if ( "pt2" == *INST_NAME ) {
            *OUT = "read=1.0;write=1.0"
        }
    }
}

Static (Legacy) Policy Enforcement Points

Dynamic Policy Enforcement Points

Questions?

Installing the Python Rule Engine Plugin

Install the plugin

sudo apt-get -y install irods-rule-engine-plugin-python

See the new shared objects

ls /usr/lib/irods/plugins/rule_engines/

libirods_rule_engine_plugin-cpp_default_policy.so
libirods_rule_engine_plugin-irods_rule_language.so
libirods_rule_engine_plugin-python.so

Python Rule Engine Configuration

Create /etc/irods/core.py from packaged template file

irods $ cp /etc/irods/core.py.template /etc/irods/core.py

Edit /etc/irods/server_config.json

  • insert the new Python plugin configuration stanza before the iRODS Rule Language plugin
  • this allows it to service requests first

"rule_engines": [
    {
         "instance_name" : "irods_rule_engine_plugin-python-instance",
         "plugin_name" : "irods_rule_engine_plugin-python",
         "plugin_specific_configuration" : {}
    },
    {
         "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
         ...
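
The same edit can be expressed on the parsed array. This is a sketch under the assumption that the configuration has already been loaded with a JSON parser; `insert_before` is a hypothetical helper, and appending when the target plugin is absent is a choice made here, not documented behavior.

```python
def insert_before(engines, new_stanza, before_plugin_name):
    """Return a new list with new_stanza placed ahead of the named plugin.

    If the named plugin is not present, the stanza is appended instead.
    The input list is left unmodified.
    """
    for index, engine in enumerate(engines):
        if engine.get("plugin_name") == before_plugin_name:
            return engines[:index] + [new_stanza] + engines[index:]
    return engines + [new_stanza]


python_stanza = {
    "instance_name": "irods_rule_engine_plugin-python-instance",
    "plugin_name": "irods_rule_engine_plugin-python",
    "plugin_specific_configuration": {},
}
existing = [{"plugin_name": "irods_rule_engine_plugin-irods_rule_language"}]

updated = insert_before(
    existing, python_stanza, "irods_rule_engine_plugin-irods_rule_language")
print([engine["plugin_name"] for engine in updated])
```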

Set up a Custom Rulebase (training.re)

"rule_engines": [
    ...
    ...
    {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
        "plugin_specific_configuration": {
            "re_data_variable_mapping_set": [
                "core"
            ],
            "re_function_name_mapping_set": [
                "core"
            ],
            "re_rulebase_set": [
                "training",
                "core"
            ],

Add a custom rulebase to /etc/irods/server_config.json
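
Adding the rulebase amounts to placing its name ahead of "core" in re_rulebase_set. The sketch below does that on a parsed stanza; `add_rulebase` is a hypothetical helper, and putting the new name first simply mirrors the ordering shown in the slide.

```python
def add_rulebase(stanza, name):
    """Put a rulebase name at the front of re_rulebase_set if it is missing."""
    rulebases = stanza["plugin_specific_configuration"]["re_rulebase_set"]
    if name not in rulebases:
        rulebases.insert(0, name)
    return stanza


stanza = {
    "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
    "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
    "plugin_specific_configuration": {"re_rulebase_set": ["core"]},
}
add_rulebase(stanza, "training")
print(stanza["plugin_specific_configuration"]["re_rulebase_set"])
```

The name is given without the .re extension, matching the "core" entry.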

A Combination Use of Both Python and iRODS Rules

add_metadata_to_objpath(*str, *objpath, *objtype) {
    msiString2KeyValPair(*str, *kvp);
    msiAssociateKeyValuePairsToObj(*kvp, *objpath, *objtype);
}

getSessionVar(*name,*output) {
    *output = eval("str($"++*name++")");
}

Create /etc/irods/training.re rulebase
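
add_metadata_to_objpath receives its metadata as one string of key=value pairs, which msiString2KeyValPair splits on the % character. The sketch below models that string format in plain Python; it is not the microservice's implementation, and `join_pairs` and `split_pairs` are hypothetical names.

```python
def join_pairs(pairs):
    """Build a %-separated key=value string from (key, value) tuples."""
    return "%".join("{0}={1}".format(key, value) for key, value in pairs)


def split_pairs(text):
    """Split a %-separated key=value string back into a dict."""
    result = {}
    for item in text.split("%"):
        key, _, value = item.partition("=")
        result[key] = value
    return result


encoded = join_pairs([("EXIF ColorSpace", "sRGB"),
                      ("Image Orientation", "Horizontal (normal)")])
print(encoded)
print(split_pairs(encoded))
```

A value that itself contains % would be split in the wrong place in this model, so such values would need escaping or filtering before being joined.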

A Python Static Policy Enforcement Point

Copy core.py and python_storage_balancing.py into /etc/irods

This will overwrite the default core.py

sudo cp ~/irods_training/advanced/python_storage_balancing.py /etc/irods/
sudo cp ~/irods_training/advanced/core.py /etc/irods/

The Python implementation of the static PEP, now in core.py:

# session_vars and EXIF are expected to be imported at the top of core.py
def acPostProcForPut(rule_args, callback, rei):
    # Look up the physical and logical paths of the object just written
    sv = session_vars.get_map(rei)
    phypath = sv['data_object']['file_path']
    objpath = sv['data_object']['object_path']
    exiflist = []
    with open(phypath, 'rb') as f:
        tags = EXIF.process_file(f, details=False)
        # iteritems() is Python 2 dict iteration
        for (k, v) in tags.iteritems():
            # Skip bulky or uninformative tags
            if k not in ('JPEGThumbnail', 'TIFFThumbnail', 'Filename', 'EXIF MakerNote'):
                exifpair = '{0}={1}'.format(k, v)
                exiflist.append(exifpair)
    # Join as a %-separated key=value string for msiString2KeyValPair
    exifstring = '%'.join(exiflist)
    # Call the iRODS Rule Language rule defined in training.re
    callback.add_metadata_to_objpath(exifstring, objpath, '-d')
    callback.writeLine('serverLog', 'PYTHON - acPostProcForPut() complete')

Test our combination of rules

Make sure earlier stickers example is removed

irm -f stickers.jpg

Get some test data into iRODS

wget https://github.com/irods/irods_training/raw/master/stickers.jpg

iput stickers.jpg

Confirm the EXIF metadata was harvested

imeta ls -d stickers.jpg

AVUs defined for dataObj stickers.jpg:

attribute: Image Orientation
value: Horizontal (normal)
units:
----
attribute: EXIF ColorSpace
value: sRGB
units:
----

Storage Balancing

Create a resource hierarchy for our storage balancing example.

iadmin mkresc def_resc deferred
iadmin mkresc ufs1 unixfilesystem `hostname`:/tmp/ufs1
iadmin mkresc ufs2 unixfilesystem `hostname`:/tmp/ufs2
iadmin mkresc pt1 passthru
iadmin mkresc pt2 passthru
iadmin addchildtoresc def_resc pt1
iadmin addchildtoresc def_resc pt2
iadmin addchildtoresc pt1 ufs1
iadmin addchildtoresc pt2 ufs2
iadmin modresc pt1 context "max_bytes=20000000"
iadmin modresc pt2 context "max_bytes=20000000"
irods@example:~$ ilsresc
def_resc:deferred
├── pt1:passthru
│   └── ufs1
└── pt2:passthru
    └── ufs2
demoResc: unixfilesystem

Implementing the Storage Balanced Plugin

findRescType(*INST_NAME, *OUT) {
    foreach ( *ROW in SELECT RESC_TYPE_NAME WHERE RESC_NAME = '*INST_NAME' ) {
        *OUT = *ROW.RESC_TYPE_NAME;
    }
}
findInstId(*INST_NAME, *OUT) {
    foreach ( *ROW in SELECT RESC_ID WHERE RESC_NAME = '*INST_NAME' ) {
        *OUT = *ROW.RESC_ID;
    }
}
findBytesUsed(*INST_ID, *OUT) {
    foreach ( *ROW1 in SELECT RESC_NAME WHERE RESC_PARENT = '*INST_ID' ) {
        *STORAGE_RESC = *ROW1.RESC_NAME;
        *TEMP = 0
        foreach ( *ROW2 in SELECT sum(DATA_SIZE) WHERE RESC_NAME = '*STORAGE_RESC' ) {
            *TEMP = *TEMP + int(*ROW2.DATA_SIZE)
        }
        *OUT = "*TEMP"
    }
}
findContextString(*INST_NAME, *OUT) {
    foreach ( *ROW in SELECT RESC_CONTEXT WHERE RESC_NAME = '*INST_NAME' ) {
        *OUT = *ROW.RESC_CONTEXT;
    }
}

A few helper functions to add to /etc/irods/training.re

Code can be found at ~/irods_training/advanced/python_storage_balancing.re

The Python Storage Balancing Rule

The definition in /etc/irods/python_storage_balancing.py:

import re

def pep_resource_resolve_hierarchy_pre(rule_args, callback, rei):
    # rule_args[0] is the resource name, rule_args[2] the out string,
    # rule_args[3] the operation type
    if rule_args[3] == 'CREATE':
        ret = callback.findRescType(rule_args[0], '')
        resc_type = ret['arguments'][1]
        if (resc_type == 'passthru'):
            ret = callback.findInstId(rule_args[0], '')
            inst_id = ret['arguments'][1]

            ret = callback.findBytesUsed(inst_id, '')
            bytes_used = ret['arguments'][1]

            ret = callback.findContextString(rule_args[0], '')
            context_string = ret['arguments'][1]

            # -1 means no max_bytes limit was found in the context string
            max_bytes = -1
            max_bytes_search = re.search(r'max_bytes=(\d+)', context_string)
            if max_bytes_search:
                # convert to int so the comparisons below behave as intended
                max_bytes = int(max_bytes_search.group(1))

            percent_full = 0.0
            if max_bytes == -1:
                percent_full = 0.0
            elif max_bytes == 0:
                percent_full = 1.0
            else:
                percent_full = float(bytes_used)/float(max_bytes)

            write_weight = 1.0 - percent_full
            rule_args[2] = 'read=1.0;write=' + str(write_weight)

Uncomment definition for pep_resource_resolve_hierarchy_pre in /etc/irods/core.py

#def pep_resource_resolve_hierarchy_pre(rule_args, callback, rei):
#    return python_storage_balancing.pep_resource_resolve_hierarchy_pre(rule_args, callback, rei)
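
The weighting arithmetic of the Python rule can be tried outside iRODS. This sketch isolates it in a standalone function; `write_weight` is a hypothetical name, and it assumes max_bytes is parsed as an integer, with a missing max_bytes meaning no limit.

```python
import re


def write_weight(bytes_used, context_string):
    """Compute the write vote from bytes used and a resource context string."""
    match = re.search(r"max_bytes=(\d+)", context_string)
    if match is None:
        return 1.0  # no limit configured, vote fully for writes
    max_bytes = int(match.group(1))
    if max_bytes == 0:
        return 0.0  # a limit of zero means the resource is always full
    return 1.0 - float(bytes_used) / max_bytes


# bytes_used arrives from the rule language helper as a string
print("read=1.0;write=" + str(write_weight("5000000", "max_bytes=20000000")))
```

As in the rule itself, nothing stops the weight from going negative once bytes used exceeds max_bytes.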

Testing the Storage Balancing Rule

iput -R def_resc VERSION.json f1
iput -R def_resc VERSION.json f2
iput -R def_resc VERSION.json f3
iput -R def_resc VERSION.json f4

ils -l
/tempZone/home/rods:
  rods              0 def_resc;pt2;ufs2          224 2017-06-05.16:47 & f1
  rods              0 def_resc;pt1;ufs1          224 2017-06-05.16:47 & f2
  rods              0 def_resc;pt2;ufs2          224 2017-06-05.16:47 & f3
  rods              0 def_resc;pt1;ufs1          224 2017-06-05.16:47 & f4

Remove all data from def_resc

  • irmtrash - any data in the trashcan will count towards the total being used to determine the distribution of incoming files
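
The alternating placement in the listing follows from the weights: whichever passthru holds fewer bytes votes slightly higher for the next write. The toy simulation below illustrates this. It assumes the parent picks the child with the highest write weight and breaks ties by name, which is not necessarily how the server breaks ties, so the starting child may differ from the listing.

```python
def simulate_puts(sizes, max_bytes):
    """Place each file on the child with the highest write weight."""
    used = {"pt1": 0, "pt2": 0}
    placements = []
    for size in sizes:
        weights = {name: 1.0 - float(used[name]) / max_bytes for name in used}
        # max() returns the first maximum, so ties go to the lowest name
        chosen = max(sorted(weights), key=lambda name: weights[name])
        used[chosen] += size
        placements.append(chosen)
    return placements


# Four 224-byte files, as in the ils -l listing above
print(simulate_puts([224, 224, 224, 224], 20000000))
```

Adding bytes to `used` before the loop starts shows why leftover data in the trash skews the distribution.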

Questions?

A Storage Balancing C++ Rule Engine Plugin

Code can be found at

~/irods_training/advanced/irods_rule_engine_plugin_storage_balancing/src/irods_rule_engine_plugin-cpp-storage_balancing.cpp

 

 

Start with the Factory

extern "C"
irods::pluggable_rule_engine<irods::default_re_ctx>* plugin_factory(const std::string& _instance_name,
                                 const std::string& _context) {
    auto re{new irods::pluggable_rule_engine<irods::default_re_ctx>(_instance_name , _context)};
    re->add_operation<irods::default_re_ctx&, const std::string&>(
            "start",
            std::function<irods::error(irods::default_re_ctx&, const std::string&)>(start));
    re->add_operation<irods::default_re_ctx&, const std::string&>(
            "stop",
            std::function<irods::error(irods::default_re_ctx&, const std::string&)>(stop));
    re->add_operation<irods::default_re_ctx&, std::string, bool&>(
            "rule_exists",
            std::function<irods::error(irods::default_re_ctx&, std::string, bool&)>(rule_exists));
    re->add_operation<irods::default_re_ctx&, std::vector<std::string>&>(
            "list_rules",
            std::function<irods::error(irods::default_re_ctx&,std::vector<std::string>&)>( list_rules ) );
    re->add_operation<irods::default_re_ctx&, std::string, std::list<boost::any>&, irods::callback>(
            "exec_rule",
            std::function<irods::error(irods::default_re_ctx&, std::string, std::list<boost::any>&, irods::callback)>(exec_rule));
    re->add_operation<irods::default_re_ctx&, std::string, std::list<boost::any>&, irods::callback>(
            "exec_rule_text",
            std::function<irods::error(irods::default_re_ctx&, std::string, std::list<boost::any>&, irods::callback)>(exec_rule_text));
    re->add_operation<irods::default_re_ctx&, std::string, std::list<boost::any>&, irods::callback>(
            "exec_rule_expression",
            std::function<irods::error(irods::default_re_ctx&, std::string, std::list<boost::any>&, irods::callback)>(exec_rule_expression));
    return re;
}

Anatomy of the Plugin Factory

extern "C"
irods::pluggable_rule_engine<irods::default_re_ctx>*
    plugin_factory(
        const std::string& _instance_name,
        const std::string& _context) {
....
}
  • Must have C linkage

  • Returns an irods::pluggable_rule_engine<>*

  • Accepts two const std::string& as instance name and context

Instantiate a new rule engine plugin

extern "C"
irods::pluggable_rule_engine<irods::default_re_ctx>*
    plugin_factory(
        const std::string& _instance_name,
        const std::string& _context) {
            auto re{new irods::pluggable_rule_engine<irods::default_re_ctx>( 
                        _instance_name,
                        _context)};
}
  • Allocate a raw pointer to an irods::pluggable_rule_engine

  • Pass the instance name and context to the ctor

    • attributes of the irods::plugin_base class

Wire the plugin operations

...

re->add_operation<irods::default_re_ctx&, const std::string&>(
        "start",
        std::function<
            irods::error(
                irods::default_re_ctx&,
                const std::string&)>(start));

...
  • Template parameters are the parameters of the function operation

  • First parameter is the calling name of the operation - "start"

  • Second parameter is a std::function wrapping the local function definition

    • Takes the full signature of the function as a template parameter

    • Takes the function pointer as an argument

Similar treatment for the other operations

  • stop

  • rule_exists

  • exec_rule

  • list_rules

  • exec_rule_text

  • exec_rule_expression

re->add_operation<irods::default_re_ctx&, const std::string&>(
            "stop",
            std::function<irods::error(irods::default_re_ctx&, const std::string&)>(stop));
re->add_operation<irods::default_re_ctx&, std::string, bool&>(
            "rule_exists",
            std::function<
                irods::error(irods::default_re_ctx&, std::string, bool&)>(rule_exists));
re->add_operation<irods::default_re_ctx&, std::string, std::list<boost::any>&, irods::callback>(
            "exec_rule",
            std::function<irods::error(
                irods::default_re_ctx&, std::string, std::list<boost::any>&, irods::callback)>(
                    exec_rule));
re->add_operation<irods::default_re_ctx&, std::vector<std::string>&>(
            "list_rules",
            std::function<irods::error(irods::default_re_ctx&,std::vector<std::string>&)>( list_rules ) );
re->add_operation<irods::default_re_ctx&, std::string, std::list<boost::any>&, irods::callback>(
            "exec_rule_text",
            std::function<irods::error(
                irods::default_re_ctx&, std::string, std::list<boost::any>&, irods::callback)>(
                    exec_rule_text));
re->add_operation<irods::default_re_ctx&, std::string, std::list<boost::any>&, irods::callback>(
            "exec_rule_expression",
            std::function<irods::error(
                irods::default_re_ctx&, std::string, std::list<boost::any>&, irods::callback)>(
                    exec_rule_expression));

start, stop, rule_exists, and list_rules

irods::error start(irods::default_re_ctx&, const std::string&) {
    return SUCCESS();
}

irods::error stop(irods::default_re_ctx&, const std::string&) {
    return SUCCESS();
}

irods::error rule_exists(irods::default_re_ctx&, std::string _rule_name, bool& _ret) {
    _ret = _rule_name == "pep_resource_resolve_hierarchy_pre";
    return SUCCESS();
}

irods::error list_rules(irods::default_re_ctx&, std::vector<std::string>& rule_vec) {
    rule_vec.push_back("pep_resource_resolve_hierarchy_pre");
    return SUCCESS();
}
  • start and stop are no-ops

  • rule_exists simply matches on pep_resource_resolve_hierarchy_pre, as that is the only rule supported by this rule engine plugin

  • list_rules simply responds with the single dynamic PEP
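
How rule_exists and the ordering of the rule_engines array interact can be pictured with a toy model. This is not the server's dispatch code: `ToyEngine` and `dispatch` are hypothetical, and the real framework has behavior that is not modeled here.

```python
class ToyEngine(object):
    """Stand-in for a rule engine plugin exposing three of its operations."""

    def __init__(self, name, rules):
        self.name = name
        self.rules = rules  # maps rule name to a callable

    def rule_exists(self, rule_name):
        return rule_name in self.rules

    def list_rules(self):
        return sorted(self.rules)

    def exec_rule(self, rule_name, args):
        return self.rules[rule_name](*args)


def dispatch(engines, rule_name, args):
    """Run the rule on the first engine, in array order, that claims it."""
    for engine in engines:
        if engine.rule_exists(rule_name):
            return engine.name, engine.exec_rule(rule_name, args)
    raise LookupError("no rule engine provides " + rule_name)


storage = ToyEngine("cpp-storage-balancing", {
    "pep_resource_resolve_hierarchy_pre": lambda: "from storage plugin"})
python = ToyEngine("python", {
    "pep_resource_resolve_hierarchy_pre": lambda: "from python",
    "acPostProcForPut": lambda: "exif harvested"})

print(dispatch([storage, python], "pep_resource_resolve_hierarchy_pre", []))
print(dispatch([storage, python], "acPostProcForPut", []))
```

In this model, the first engine answers only for the one PEP it lists, and every other rule falls through to the next engine in the array.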

exec_rule

irods::error exec_rule(
    irods::default_re_ctx&,
    std::string            _rule_name,
    std::list<boost::any>& _rule_arguments,
    irods::callback        _effect_handler) {
    try {
        auto it_args{std::begin(_rule_arguments)};
        const auto& arg_resource_name{boost::any_cast<std::string&>(*it_args)};
        auto& arg_plugin_context{boost::any_cast<irods::plugin_context&>(*++it_args)};
        auto& arg_out{*boost::any_cast<std::string*>(*++it_args)};
        const auto& arg_operation_type{*boost::any_cast<const std::string*>(*++it_args)};
        const auto& arg_host{*boost::any_cast<const std::string*>(*++it_args)};
        auto& arg_hierarchy_parser{*boost::any_cast<irods::hierarchy_parser*>(*++it_args)};
        auto& arg_vote{*boost::any_cast<float*>(*++it_args)};
  • Begin by extracting all of the arguments from the list

    • Rule arguments are packed into a std::list for every operation ( recall the api plugin call wrappers )

Target specific operations

        if (arg_operation_type != "CREATE") {
            return SUCCESS();
        }

        ruleExecInfo_t& rei{get_rei(_effect_handler)};

        const std::string resource_type{
                              get_resource_type(
                                  arg_resource_name, *rei.rsComm)};
        if (resource_type != "passthru") {
            return SUCCESS();
        }

        const boost::optional<uint64_t> max_bytes{
            get_max_bytes(arg_resource_name, *rei.rsComm)};

 

  • Skip non-CREATE operations

  • Fetch the ruleExecInfo_t from the framework

  • Fetch other values via helper functions

Handle max_bytes edge cases

        if (!max_bytes) {
            arg_out = "read=1.0;write=1.0";
            return SUCCESS();
        } else if (*max_bytes == 0) {
            arg_out = "read=1.0;write=0.0";
            return SUCCESS();
        }
  • The context string may not be defined

  • Short circuit if max_bytes is set to 0 explicitly

Compute the write_weight

        const uint64_t bytes_used_by_children{
                           get_bytes_used_by_all_children(
                               arg_resource_name,
                               *rei.rsComm)};
        const uint64_t bytes_required_for_new_data_object{
                           get_bytes_of_incoming_data_object(
                               arg_plugin_context)};
        const uint64_t hypothetical_bytes_used{
                           bytes_used_by_children +
                           bytes_required_for_new_data_object};
        const double percent_used{
             std::max(0.0, std::min(1.0, static_cast<double>
                  (hypothetical_bytes_used) / *max_bytes))};
        const double write_weight{1.0 - percent_used};
  • Fetch bytes used by resource and new data object

  • Compute new total bytes used

  • Compute write_weight given max_bytes
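
The same arithmetic, including the max_bytes edge cases and the clamp to the range 0.0 to 1.0, can be checked quickly in Python. This sketch mirrors the C++ fragments above; `clamped_write_weight` is a hypothetical name, and None stands in for an unset boost::optional.

```python
def clamped_write_weight(bytes_used_by_children, incoming_bytes, max_bytes):
    """Mirror the plugin's weight computation with plain Python numbers."""
    if max_bytes is None:
        return 1.0  # no max_bytes in the context string
    if max_bytes == 0:
        return 0.0  # explicit zero short-circuits
    hypothetical_bytes_used = bytes_used_by_children + incoming_bytes
    percent_used = max(
        0.0, min(1.0, float(hypothetical_bytes_used) / max_bytes))
    return 1.0 - percent_used


print(clamped_write_weight(10, 5, 20))   # partly full
print(clamped_write_weight(30, 5, 20))   # over the limit, clamped
```

Unlike the Python rule earlier in this module, this version counts the incoming data object and never produces a negative weight.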

Wrapping up exec_rule

        std::stringstream out_stream;
        out_stream << "read=1.0;write=" << write_weight;
        arg_out = out_stream.str();
        return SUCCESS();
    } catch (const irods::exception& e) {
        rodsLog(LOG_ERROR, e.what());
        return ERROR(e.code(), "irods exception in exec_rule");
    }
    return SUCCESS();
}
  • Build the weight string given the computed write weight

  • Set the out variable - arg_out

  • Return SUCCESS() if all goes well

  • Catch the exception, log and return an error otherwise

exec_rule_text and exec_rule_expression

irods::error exec_rule_text(
    irods::default_re_ctx&,
    std::string _rule_text,
    std::list<boost::any>& _rule_arguments,
    irods::callback _effect_handler) {
    return ERROR(SYS_NOT_SUPPORTED, "not supported");
}

irods::error exec_rule_expression(
    irods::default_re_ctx&,
    std::string _rule_text,
    std::list<boost::any>& _rule_arguments,
    irods::callback _effect_handler) {
    return ERROR(SYS_NOT_SUPPORTED, "not supported");
}
  • Both return SYS_NOT_SUPPORTED

  • Another Rule Engine Plugin could pick them up

Building and Installing the Package

As the ubuntu user

sudo apt-get -y install irods-externals-* irods-dev

export PATH=/opt/irods-externals/cmake3.5.2-0/bin:$PATH

which cmake

mkdir ~/build_storage_cpp

cd ~/build_storage_cpp

cmake ../irods_training/advanced/irods_rule_engine_plugin_storage_balancing/

make package

sudo dpkg -i ./irods_rule_engine_plugin-cpp-storage-balancing_4.2.6~xenial_amd64.deb

Configuring the Rule Engine Plugin

Edit /etc/irods/server_config.json

{
    "instance_name" : "irods_rule_engine_plugin-cpp-storage-balancing-instance",
    "plugin_name" : "irods_rule_engine_plugin-cpp-storage-balancing",
    "plugin_specific_configuration" : {}
},
{
    "instance_name" : "irods_rule_engine_plugin-python-instance",
    "plugin_name" : "irods_rule_engine_plugin-python",
    "plugin_specific_configuration" : {}
},

Run ils to determine if syntax is correct
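
If ils fails after the edit, checking the JSON syntax directly usually points at the offending position. This is a minimal sketch using only the standard library; `json_error` is a hypothetical helper, and it catches only syntax problems such as a missing comma, not a misspelled plugin name.

```python
import json


def json_error(text):
    """Return the parser's error message, or None if the text is valid JSON."""
    try:
        json.loads(text)
    except ValueError as error:  # JSONDecodeError is a ValueError subclass
        return str(error)
    return None


good = '{"plugin_specific_configuration": {}, "shared_memory_instance": "x"}'
bad = '{"plugin_specific_configuration": {} "shared_memory_instance": "x"}'

print(json_error(good))
print(json_error(bad))
```

To check the real file, read /etc/irods/server_config.json and pass its contents to the same function.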

Testing the Rule Engine Plugin

iput -R def_resc VERSION.json f1
iput -R def_resc VERSION.json f2
iput -R def_resc VERSION.json f3
iput -R def_resc VERSION.json f4


ils -l
/tempZone/home/rods:
  rods              0 def_resc;pt2;ufs2          224 2017-06-05.20:10 & f1
  rods              0 def_resc;pt1;ufs1          224 2017-06-05.20:10 & f2
  rods              0 def_resc;pt2;ufs2          224 2017-06-05.20:10 & f3
  rods              0 def_resc;pt1;ufs1          224 2017-06-05.20:10 & f4

 

Remove all data from def_resc

  • irmtrash - any data in the trashcan will count towards the total being used to determine the distribution of incoming files

Questions?

UGM 2019 - Rule Engine Plugins

By justinkylejames


iRODS User Group Meeting 2019 - Advanced Training Module
