Advanced Training:
Rule Engine Plugins



May 28-31, 2024
iRODS User Group Meeting 2024
Amsterdam, Netherlands
Alan King, Senior Software Developer
Martin Flores, Software Developer
iRODS Consortium


Anatomy of a Rule Engine Plugin
Represents the last of the core plugin interfaces
Each plugin must define seven operations:
- start
- stop
- rule_exists
- list_rules
- exec_rule
- exec_rule_text
- exec_rule_expression

Configuration of Rule Engine Plugins
Within /etc/irods/server_config.json, "rule_engines" is a JSON array holding the configuration for the rule engine(s):
"plugin_configuration": {
    "rule_engines": [
        {
            ...
        },
        ...
    ]
}

Configuration of Rule Engine Plugins
Anatomy of a Rule Engine Plugin JSON object
{
    "instance_name": "<UNIQUE_NAME>",
    "plugin_name": "<DERIVED_FROM_SHARED_OBJECT>",
    "plugin_specific_configuration": {
        <ANYTHING_GOES_HERE>
    },
    "shared_memory_instance": "<UNIQUE_SHARED_MEMORY_NAME>"
}
iRODS Rule Language Configuration
{
    "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
    "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
    "plugin_specific_configuration": {
        "re_data_variable_mapping_set": [
            "core"
        ],
        "re_function_name_mapping_set": [
            "core"
        ],
        "re_rulebase_set": [
            "core"
        ],
        "regexes_for_supported_peps": [
            "ac[^ ]*",
            "msi[^ ]*",
            "[^ ]*pep_[^ ]*_(pre|post|except|finally)"
        ]
    },
    "shared_memory_instance": "irods_rule_language_rule_engine"
},
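To get a feel for which rule names the regexes_for_supported_peps list would claim, the patterns can be tried against a few names. The following is a minimal Python sketch, assuming each pattern has to match the whole rule name; the helper is_supported and the sample names are illustrative, and the plugin's exact matching semantics are not shown on the slide.

```python
import re

# Patterns copied from the configuration stanza above.
SUPPORTED_PEP_REGEXES = [
    r"ac[^ ]*",
    r"msi[^ ]*",
    r"[^ ]*pep_[^ ]*_(pre|post|except|finally)",
]


def is_supported(rule_name: str) -> bool:
    """Return True if any pattern matches the entire rule name (assumed semantics)."""
    return any(re.fullmatch(pattern, rule_name) for pattern in SUPPORTED_PEP_REGEXES)


if __name__ == "__main__":
    # Sample names are illustrative only.
    for name in ("acPostProcForPut", "pep_api_data_obj_put_post", "my_custom_rule"):
        print(name, is_supported(name))
```

Under that assumption, static PEPs (ac...), microservices (msi...) and dynamic PEPs ending in _pre, _post, _except or _finally are claimed, while an arbitrary rule name is not.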
Basic iRODS Rule Language Example
pep_resource_resolve_hierarchy_pre(*INST_NAME, *CTX, *OUT, *OP_TYPE, *HOST, *RESC_HIER, *VOTE) {
    if ("CREATE" == *OP_TYPE) {
        if ("pt1" == *INST_NAME) {
            *OUT = "read=1.0;write=0.5";
        }
        else if ("pt2" == *INST_NAME) {
            *OUT = "read=1.0;write=1.0";
        }
    }
}
Dynamic Policy Enforcement Points

Questions?

Installing the Python Rule Engine Plugin
As the ubuntu user, install the plugin and a Python module.
sudo apt-get -y install \
    irods-rule-engine-plugin-python \
    python3-exif
See the new shared object.
$ ls /usr/lib/irods/plugins/rule_engines/
...
libirods_rule_engine_plugin-python.so

Python Rule Engine Configuration
As the irods user, create /etc/irods/core.py from the packaged template file.
cp /etc/irods/core.py.template /etc/irods/core.py
Edit /etc/irods/server_config.json.
- Insert the new Python plugin configuration stanza before the iRODS Rule Language plugin
- This allows it to service requests first when no rule engine plugin is specified by the caller
"rule_engines": [
    {
        "instance_name": "irods_rule_engine_plugin-python-instance",
        "plugin_name": "irods_rule_engine_plugin-python",
        "plugin_specific_configuration": {}
    },
    {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        ...

Set up a Custom Rulebase (training.re)
Add a custom rulebase to /etc/irods/server_config.json.
"rule_engines": [
    ...
    {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
        "plugin_specific_configuration": {
            "re_data_variable_mapping_set": [
                "core"
            ],
            "re_function_name_mapping_set": [
                "core"
            ],
            "re_rulebase_set": [
                "training",
                "core"
            ],
            ...

A Combination Use of Both Python and iRODS Rules
Create the /etc/irods/training.re rulebase:
add_metadata_to_objpath(*str, *objpath, *objtype) {
    msiString2KeyValPair(*str, *kvp);
    msiAssociateKeyValuePairsToObj(*kvp, *objpath, *objtype);
}
getSessionVar(*name,*output) {
    *output = eval("str($"++*name++")");
}

A Python Dynamic Policy Enforcement Point
Copy core.py and python_storage_balancing.py into /etc/irods,
overwriting the default core.py, and stage stickers.jpg for the irods user.
sudo cp ~/irods_training/advanced/python_storage_balancing.py /etc/irods/
sudo cp ~/irods_training/advanced/core.py /etc/irods/
sudo cp ~/irods_training/stickers.jpg /var/lib/irods/
The Python instantiation of the dynamic PEP now in core.py:
# Module-level imports this function relies on (assumed to be at the top of core.py).
import exifread
from genquery import Query

def pep_api_data_obj_put_post(rule_args, callback, rei):
    import os
    data_obj_inp = rule_args[2]
    obj_path = str(data_obj_inp.objPath)
    resc_hier = str(data_obj_inp.condInput['resc_hier'])
    query_condition_string = f'COLL_NAME = \'{os.path.dirname(obj_path)}\' and ' \
                             f'DATA_NAME = \'{os.path.basename(obj_path)}\' and ' \
                             f'DATA_RESC_HIER = \'{resc_hier}\''
    # Note: The physical path fetched by the query may not exist on the host executing this
    # bit of policy. In a real deployment, the policy implementer should consider the hostname
    # of the resource on which the data resides and consider using the remote() microservice.
    phypath = list(Query(callback, 'DATA_PATH', query_condition_string))[0]
    exiflist = []
    with open(phypath, 'rb') as f:
        tags = exifread.process_file(f, details=False)
        for (k, v) in tags.items():
            if k not in ('JPEGThumbnail', 'TIFFThumbnail', 'Filename', 'EXIF MakerNote'):
                exifpair = '{0}={1}'.format(k, v)
                exiflist.append(exifpair)
    exifstring = '%'.join(exiflist)
    callback.add_metadata_to_objpath(exifstring, obj_path, '-d')
    callback.writeLine('serverLog', 'PYTHON - pep_api_data_obj_put_post() complete')

Test our combination of rules
Switch to the irods user. Make sure the earlier stickers example is removed, then put some test data into iRODS.
irm -f stickers.jpg
iput stickers.jpg
Confirm the EXIF metadata was extracted and applied.
imeta ls -d stickers.jpg
AVUs defined for dataObj stickers.jpg:
attribute: Image Orientation
value: Horizontal (normal)
units:
----
attribute: EXIF ColorSpace
value: sRGB
units:
----
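The hand-off between the Python PEP and the add_metadata_to_objpath rule is a single string of key=value pairs joined with '%'. The sketch below reproduces that string-building step in plain Python, without iRODS or exifread, so the format can be inspected. The sample tag dictionary and the helpers build_kvp_string and parse_kvp_string are illustrative assumptions; the parser is only a rough stand-in for what msiString2KeyValPair does with the string.

```python
# Tags the PEP deliberately skips (copied from the slide's code).
SKIPPED_TAGS = ("JPEGThumbnail", "TIFFThumbnail", "Filename", "EXIF MakerNote")


def build_kvp_string(tags: dict) -> str:
    """Join the kept tags as 'key=value' pairs separated by '%'."""
    pairs = [f"{key}={value}" for key, value in tags.items() if key not in SKIPPED_TAGS]
    return "%".join(pairs)


def parse_kvp_string(kvp_string: str) -> dict:
    """Illustrative inverse: split on '%', then on the first '=' of each pair."""
    result = {}
    for pair in kvp_string.split("%"):
        if pair:
            key, _, value = pair.partition("=")
            result[key] = value
    return result


# Hypothetical tag values, modelled on the imeta output shown above.
sample_tags = {
    "Image Orientation": "Horizontal (normal)",
    "EXIF ColorSpace": "sRGB",
    "JPEGThumbnail": "<binary data>",
}
exifstring = build_kvp_string(sample_tags)
print(exifstring)
```

One consequence of this format is that a tag value containing '%' would be split into a bogus extra pair, so a production policy may want to escape or filter such values.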

Storage Balancing
Create a resource hierarchy for our storage balancing example.
iadmin mkresc def_resc deferred
iadmin mkresc ufs1 unixfilesystem $(hostname):/tmp/ufs1
iadmin mkresc ufs2 unixfilesystem $(hostname):/tmp/ufs2
iadmin mkresc pt1 passthru
iadmin mkresc pt2 passthru
iadmin addchildtoresc def_resc pt1
iadmin addchildtoresc def_resc pt2
iadmin addchildtoresc pt1 ufs1
iadmin addchildtoresc pt2 ufs2
iadmin modresc pt1 context "max_bytes=20000000"
iadmin modresc pt2 context "max_bytes=20000000"
$ ilsresc
def_resc:deferred
├── pt1:passthru
│   └── ufs1
└── pt2:passthru
    └── ufs2
demoResc: unixfilesystem


Implementing the Storage Balanced Plugin
A few helper functions to add to /etc/irods/training.re.
Code can be found at ~/irods_training/advanced/python_storage_balancing.re.
findRescType(*INST_NAME, *OUT) {
    foreach (*ROW in SELECT RESC_TYPE_NAME WHERE RESC_NAME = '*INST_NAME') {
        *OUT = *ROW.RESC_TYPE_NAME;
    }
}
findInstId(*INST_NAME, *OUT) {
    foreach (*ROW in SELECT RESC_ID WHERE RESC_NAME = '*INST_NAME') {
        *OUT = *ROW.RESC_ID;
    }
}
findBytesUsed(*INST_ID, *OUT) {
    foreach (*ROW1 in SELECT RESC_NAME WHERE RESC_PARENT = '*INST_ID') {
        *STORAGE_RESC = *ROW1.RESC_NAME;
        *TEMP = 0;
        foreach (*ROW2 in SELECT sum(DATA_SIZE) WHERE RESC_NAME = '*STORAGE_RESC') {
            *TEMP = *TEMP + int(*ROW2.DATA_SIZE);
        }
        *OUT = "*TEMP";
    }
}
findContextString(*INST_NAME, *OUT) {
    foreach (*ROW in SELECT RESC_CONTEXT WHERE RESC_NAME = '*INST_NAME') {
        *OUT = *ROW.RESC_CONTEXT;
    }
}

The Python Storage Balancing Rule
Uncomment the definition for pep_resource_resolve_hierarchy_pre in /etc/irods/core.py.
#def pep_resource_resolve_hierarchy_pre(rule_args, callback, rei):
#    return python_storage_balancing.pep_resource_resolve_hierarchy_pre(rule_args, callback, rei)
The definition in /etc/irods/python_storage_balancing.py:
import re

def pep_resource_resolve_hierarchy_pre(rule_args, callback, rei):
    # Only adjust the vote when a new data object is being created.
    if rule_args[3] == 'CREATE':
        ret = callback.findRescType(rule_args[0], '')
        resc_type = ret['arguments'][1]
        if (resc_type == 'passthru'):
            ret = callback.findInstId(rule_args[0], '')
            inst_id = ret['arguments'][1]
            ret = callback.findBytesUsed(inst_id, '')
            bytes_used = ret['arguments'][1]
            ret = callback.findContextString(rule_args[0], '')
            context_string = ret['arguments'][1]
            max_bytes = -1
            max_bytes_index = context_string.find('max_bytes')
            if max_bytes_index != -1:
                max_bytes_re = r'max_bytes=(\d+)'
                max_bytes_search = re.search(max_bytes_re, context_string)
                max_bytes_str = max_bytes_search.group(1)
                # Convert to int so the comparisons against -1 and 0 below behave as intended.
                max_bytes = int(max_bytes_str)
            percent_full = 0.0
            if max_bytes == -1:
                percent_full = 0.0
            elif max_bytes == 0:
                percent_full = 1.0
            else:
                percent_full = float(bytes_used)/float(max_bytes)
            write_weight = 1.0 - percent_full
            rule_args[2] = 'read=1.0;write=' + str(write_weight)
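The arithmetic inside the storage balancing rule can be tried outside of iRODS. The following is a minimal sketch that isolates the context-string parsing and the write-weight calculation as pure functions. The names parse_max_bytes, write_weight and vote_string are hypothetical helpers introduced for illustration; they are not part of python_storage_balancing.py.

```python
import re
from typing import Optional


def parse_max_bytes(context_string: str) -> Optional[int]:
    """Return the max_bytes value from a resource context string, or None if absent."""
    match = re.search(r"max_bytes=(\d+)", context_string)
    return int(match.group(1)) if match else None


def write_weight(bytes_used: int, max_bytes: Optional[int]) -> float:
    """Weight is 1.0 when no limit is set, 0.0 when the limit is 0, else the free fraction."""
    if max_bytes is None:
        return 1.0
    if max_bytes == 0:
        return 0.0
    return 1.0 - bytes_used / max_bytes


def vote_string(bytes_used: int, context_string: str) -> str:
    """Build the 'read=...;write=...' string that the rule writes to its out argument."""
    return "read=1.0;write=" + str(write_weight(bytes_used, parse_max_bytes(context_string)))


if __name__ == "__main__":
    print(vote_string(5_000_000, "max_bytes=20000000"))
```

Keeping the parsed limit as an integer (or None) is what makes the "no limit" and "limit of zero" branches reachable; comparing a string such as '0' against the integer 0 would fall through to the division.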

Testing the Storage Balancing Rule
Remove all data from def_resc
- Run irmtrash: any data in the trash can counts towards the total used to determine the distribution of incoming files

irmtrash
iput -R def_resc version.json f1
iput -R def_resc version.json f2
iput -R def_resc version.json f3
iput -R def_resc version.json f4
$ ils -l
/tempZone/home/rods:
  rods 0 def_resc;pt2;ufs2 239 2024-05-05.21:52 & f1
  rods 0 def_resc;pt1;ufs1 239 2024-05-05.21:52 & f2
  rods 0 def_resc;pt2;ufs2 239 2024-05-05.21:52 & f3
  rods 0 def_resc;pt1;ufs1 239 2024-05-05.21:53 & f4
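The alternating placement in the listing can be reasoned about with a small simulation. The sketch below assumes the deferred resource sends each new file to the child with the highest write weight and that a tie goes to the first child listed; the tie-breaking order is an assumption, so the simulated order may be the mirror image of the real listing. The function simulate_puts is hypothetical.

```python
def simulate_puts(file_sizes, max_bytes, children=("pt1", "pt2")):
    """Place each file on the child with the highest write weight, tracking bytes used."""
    used = {name: 0 for name in children}
    placements = []
    for size in file_sizes:
        # Same weight formula as the rule: the free fraction of max_bytes.
        weights = {name: 1.0 - used[name] / max_bytes for name in children}
        # max() returns the first child on a tie (an assumption about tie-breaking).
        chosen = max(children, key=lambda name: weights[name])
        used[chosen] += size
        placements.append(chosen)
    return placements, used


# Four 239-byte files against a 20,000,000-byte limit, as in the test above.
placements, used = simulate_puts([239, 239, 239, 239], max_bytes=20_000_000)
print(placements, used)
```

Whichever way ties fall, each child ends up with two of the four files, which matches the even split seen in the ils -l output.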
Questions?

A Storage Balancing C++ Rule Engine Plugin
Code can be found at
~/irods_training/advanced/irods_rule_engine_plugin_storage_balancing/src/libirods_rule_engine_plugin-cpp-storage-balancing.cpp

As the ubuntu user, install the required build tools:
sudo apt-get -y install \
    irods-dev \
    irods-externals-clang13.0.1-0 \
    irods-externals-cmake3.21.4-0
Start with the Factory
extern "C"
irods::pluggable_rule_engine<irods::default_re_ctx>* plugin_factory(const std::string& _instance_name, const std::string& _context) {
    auto* re{new irods::pluggable_rule_engine<irods::default_re_ctx>(_instance_name, _context)};
    re->add_operation(
        "start",
        std::function<irods::error(irods::default_re_ctx&, const std::string&)>(start));
    re->add_operation(
        "stop",
        std::function<irods::error(irods::default_re_ctx&, const std::string&)>(stop));
    re->add_operation(
        "rule_exists",
        std::function<irods::error(irods::default_re_ctx&, const std::string&, bool&)>(rule_exists));
    re->add_operation(
        "list_rules",
        std::function<irods::error(irods::default_re_ctx&, std::vector<std::string>&)>(list_rules));
    re->add_operation(
        "exec_rule",
        std::function<irods::error(irods::default_re_ctx&, const std::string&, std::list<boost::any>&, irods::callback)>(exec_rule));
    re->add_operation(
        "exec_rule_text",
        std::function<irods::error(irods::default_re_ctx&, const std::string&, msParamArray_t*, const std::string&, irods::callback)>(exec_rule_text));
    re->add_operation(
        "exec_rule_expression",
        std::function<irods::error(irods::default_re_ctx&, const std::string&, msParamArray_t*, irods::callback)>(exec_rule_expression));
    return re;
}
Anatomy of the Plugin Factory
extern "C"
irods::pluggable_rule_engine<irods::default_re_ctx>* plugin_factory(
    const std::string& _instance_name,
    const std::string& _context)
{
    ...
}
- Must have C linkage
- Returns an irods::pluggable_rule_engine<>*
- Accepts two const std::string& parameters: the instance name and the context

Instantiate a new rule engine plugin
extern "C"
irods::pluggable_rule_engine<irods::default_re_ctx>*
plugin_factory(
    const std::string& _instance_name,
    const std::string& _context)
{
    auto* re{new irods::pluggable_rule_engine<irods::default_re_ctx>(
        _instance_name,
        _context)};
    ...
}
- Allocate a raw pointer to an irods::pluggable_rule_engine
- Pass the instance name and context to the constructor
  - Attributes of the irods::plugin_base class

Wire the plugin operations
...
re->add_operation(
    "start",
    std::function<irods::error(irods::default_re_ctx&, const std::string&)>(start));
...
- Template parameters are the parameters of the function operation
- First parameter is the calling name of the operation (e.g. "start")
- Second parameter is a std::function wrapping the local function definition
  - Takes the full signature of the function as a template parameter
  - Takes the function pointer as an argument
- All operations attached to the instance must be wrapped in an anonymous namespace or be marked as static
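The name-to-function wiring described above can be pictured as a lookup table. The following Python sketch is only an analogy for the C++ add_operation calls: OperationTable, REQUIRED_OPERATIONS and the stub functions are hypothetical and do not correspond to any iRODS API. It shows how a table keyed by operation name makes it easy to check that all seven required operations have been attached.

```python
from typing import Callable, Dict, List

# The seven operations every rule engine plugin must define.
REQUIRED_OPERATIONS = (
    "start",
    "stop",
    "rule_exists",
    "list_rules",
    "exec_rule",
    "exec_rule_text",
    "exec_rule_expression",
)


class OperationTable:
    """Hypothetical analogue of a plugin's operation map: calling name -> function."""

    def __init__(self) -> None:
        self._operations: Dict[str, Callable] = {}

    def add_operation(self, name: str, fn: Callable) -> None:
        if not callable(fn):
            raise TypeError(f"operation {name!r} must be callable")
        self._operations[name] = fn

    def missing(self) -> List[str]:
        """List required operations that have not been wired yet."""
        return [name for name in REQUIRED_OPERATIONS if name not in self._operations]

    def call(self, name: str, *args):
        return self._operations[name](*args)


def start(ctx, instance_name):
    return "SUCCESS"


def stop(ctx, instance_name):
    return "SUCCESS"


table = OperationTable()
table.add_operation("start", start)
table.add_operation("stop", stop)
print(table.missing())
```

In the C++ plugin the std::function template argument plays the role that duck typing plays here: it pins down the exact signature the framework will use when it calls the operation by name.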

Similar treatment for the other operations
- stop
- rule_exists
- exec_rule
- list_rules
- exec_rule_text
- exec_rule_expression
re->add_operation(
    "stop",
    std::function<irods::error(irods::default_re_ctx&, const std::string&)>(stop));
re->add_operation(
    "rule_exists",
    std::function<irods::error(irods::default_re_ctx&, const std::string&, bool&)>(rule_exists));
re->add_operation(
    "list_rules",
    std::function<irods::error(irods::default_re_ctx&, std::vector<std::string>&)>(list_rules));
re->add_operation(
    "exec_rule",
    std::function<irods::error(irods::default_re_ctx&, const std::string&, std::list<boost::any>&, irods::callback)>(exec_rule));
re->add_operation(
    "exec_rule_text",
    std::function<irods::error(irods::default_re_ctx&, const std::string&, msParamArray_t*, const std::string&, irods::callback)>(exec_rule_text));
re->add_operation(
    "exec_rule_expression",
    std::function<irods::error(irods::default_re_ctx&, const std::string&, msParamArray_t*, irods::callback)>(exec_rule_expression));

start, stop, rule_exists, and list_rules
static irods::error start(irods::default_re_ctx&, const std::string&) {
    return SUCCESS();
}
static irods::error stop(irods::default_re_ctx&, const std::string&) {
    return SUCCESS();
}
static irods::error rule_exists(irods::default_re_ctx&, const std::string& _rule_name, bool& _ret) {
    _ret = (_rule_name == "pep_resource_resolve_hierarchy_pre");
    return SUCCESS();
}
static irods::error list_rules(irods::default_re_ctx&, std::vector<std::string>& _rules) {
    _rules.emplace_back("pep_resource_resolve_hierarchy_pre");
    return SUCCESS();
}
- start and stop are no-ops
- rule_exists matches only pep_resource_resolve_hierarchy_pre, as that is the only operation supported by the rule engine plugin
- list_rules responds with that single dynamic PEP
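Because rule_exists claims only one rule name, every other rule falls through to whichever rule engine is configured next. A simplified Python model of that ordered lookup is sketched below. It assumes the framework walks the configured engines in order and lets the first one that reports the rule run it; RuleEngine and dispatch are hypothetical names, and real behaviour such as continuation codes or a caller naming a specific instance is not modelled.

```python
from typing import Callable, Dict, List, Optional


class RuleEngine:
    """Hypothetical stand-in for one configured rule engine plugin instance."""

    def __init__(self, instance_name: str, rules: Dict[str, Callable[..., str]]):
        self.instance_name = instance_name
        self._rules = rules

    def rule_exists(self, rule_name: str) -> bool:
        return rule_name in self._rules

    def list_rules(self) -> List[str]:
        return list(self._rules)

    def exec_rule(self, rule_name: str, *args) -> str:
        return self._rules[rule_name](*args)


def dispatch(engines: List[RuleEngine], rule_name: str, *args) -> Optional[str]:
    """Walk the engines in configured order; the first that has the rule runs it."""
    for engine in engines:
        if engine.rule_exists(rule_name):
            return engine.exec_rule(rule_name, *args)
    return None  # no configured engine claims the rule


# An engine that knows a single PEP, listed ahead of a more general one.
storage_engine = RuleEngine(
    "irods_rule_engine_plugin-cpp-storage-balancing-instance",
    {"pep_resource_resolve_hierarchy_pre": lambda: "handled by storage balancing"},
)
python_engine = RuleEngine(
    "irods_rule_engine_plugin-python-instance",
    {
        "pep_resource_resolve_hierarchy_pre": lambda: "handled by python",
        "pep_api_data_obj_put_post": lambda: "handled by python",
    },
)
engines = [storage_engine, python_engine]
print(dispatch(engines, "pep_api_data_obj_put_post"))
```

In this model the position of a stanza in the rule_engines array decides which plugin wins when more than one implements the same PEP.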

exec_rule
static irods::error exec_rule(
    irods::default_re_ctx&,
    const std::string& _rule_name,
    std::list<boost::any>& _rule_arguments,
    irods::callback _effect_handler)
{
    try {
        auto it_args{std::begin(_rule_arguments)};
        const auto& arg_resource_name{boost::any_cast<std::string&>(*it_args)};
        auto& arg_plugin_context{boost::any_cast<irods::plugin_context&>(*++it_args)};
        auto& arg_out{*boost::any_cast<std::string*>(*++it_args)};
        const auto& arg_operation_type{*boost::any_cast<const std::string*>(*++it_args)};
        const auto& arg_host{*boost::any_cast<const std::string*>(*++it_args)};
        auto& arg_hierarchy_parser{*boost::any_cast<irods::hierarchy_parser*>(*++it_args)};
        auto& arg_vote{*boost::any_cast<float*>(*++it_args)};
- Begin by extracting all of the arguments from the list
- Rule arguments are packed into a std::list for every operation

Target specific operations
if (arg_operation_type != "CREATE") {
    return SUCCESS();
}
ruleExecInfo_t& rei{get_rei(_effect_handler)};
const std::string resource_type{get_resource_type(arg_resource_name, *rei.rsComm)};
if (resource_type != "passthru") {
    return SUCCESS();
}
const boost::optional<uint64_t> max_bytes{get_max_bytes(arg_resource_name, *rei.rsComm)};
- Skip non-CREATE operations
- Fetch the ruleExecInfo_t from the framework
- Fetch other values via helper functions

Handle max_bytes edge cases
if (!max_bytes) {
    arg_out = "read=1.0;write=1.0";
    return SUCCESS();
}
else if (*max_bytes == 0) {
    arg_out = "read=1.0;write=0.0";
    return SUCCESS();
}
- The context string may not be defined
- Short circuit if max_bytes is set to 0 explicitly

Compute the write_weight
const uint64_t bytes_used_by_children{
    get_bytes_used_by_all_children(
        arg_resource_name,
        *rei.rsComm)};
const uint64_t bytes_required_for_new_data_object{
    get_bytes_of_incoming_data_object(
        arg_plugin_context)};
const uint64_t hypothetical_bytes_used{
    bytes_used_by_children +
    bytes_required_for_new_data_object};
const double percent_used{
    std::max(0.0, std::min(1.0,
        static_cast<double>(hypothetical_bytes_used) / *max_bytes))};
const double write_weight{1.0 - percent_used};
Fetch bytes used by resource and new data object
Compute new total bytes used
Compute write_weight given max_bytes
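The C++ calculation differs from the Python rule in two ways: it counts the incoming data object's size towards the total, and it clamps the used fraction to the range 0.0 to 1.0. A pure-Python sketch of that arithmetic, including the two max_bytes edge cases, is shown below. The function name hypothetical_write_weight is an illustrative assumption; it is not part of the plugin.

```python
from typing import Optional


def hypothetical_write_weight(
    bytes_used_by_children: int,
    incoming_bytes: int,
    max_bytes: Optional[int],
) -> float:
    """Write weight after pretending the incoming object has already been stored."""
    if max_bytes is None:
        return 1.0  # no max_bytes in the context string: no limit applies
    if max_bytes == 0:
        return 0.0  # explicit limit of zero: never prefer this resource for writes
    hypothetical_bytes_used = bytes_used_by_children + incoming_bytes
    # Clamp so an over-full resource yields a weight of 0.0 rather than a negative one.
    percent_used = max(0.0, min(1.0, hypothetical_bytes_used / max_bytes))
    return 1.0 - percent_used


if __name__ == "__main__":
    print(hypothetical_write_weight(0, 5_000_000, 20_000_000))
```

Counting the incoming object up front means a file that would overflow the limit drives the weight to 0.0 before it is written, rather than after.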

Wrapping up exec_rule
        std::stringstream out_stream;
        out_stream << "read=1.0;write=" << write_weight;
        arg_out = out_stream.str();
        return SUCCESS();
    } catch (const irods::exception& e) {
        rodsLog(LOG_ERROR, e.what());
        return ERROR(e.code(), "irods exception in exec_rule");
    }
    return SUCCESS();
}
Build the weight string given the computed write weight
Set the out variable - arg_out
Return SUCCESS() if all goes well
Catch the exception, log and return an error otherwise

exec_rule_text and exec_rule_expression
static irods::error exec_rule_text(
    irods::default_re_ctx&,
    const std::string&,
    msParamArray_t*,
    const std::string&,
    irods::callback)
{
    return ERROR(SYS_NOT_SUPPORTED, "not supported");
}
static irods::error exec_rule_expression(
    irods::default_re_ctx&,
    const std::string&,
    msParamArray_t*,
    irods::callback)
{
    return ERROR(SYS_NOT_SUPPORTED, "not supported");
}
Both return SYS_NOT_SUPPORTED
Another Rule Engine Plugin could pick them up

Building and Installing the Package
As the ubuntu user:
export PATH=/opt/irods-externals/cmake3.21.4-0/bin:$PATH
which cmake
mkdir ~/build_storage_cpp
cd ~/build_storage_cpp
cmake ../irods_training/advanced/irods_rule_engine_plugin_storage_balancing/
make package
sudo dpkg -i ./irods_rule_engine_plugin-cpp-storage-balancing.deb

See the newly installed rule engine plugin:
$ ls /usr/lib/irods/plugins/rule_engines/ | grep storage
libirods_rule_engine_plugin-cpp-storage-balancing.so
Configuring the Rule Engine Plugin
Edit /etc/irods/server_config.json
{
    "instance_name": "irods_rule_engine_plugin-cpp-storage-balancing-instance",
    "plugin_name": "irods_rule_engine_plugin-cpp-storage-balancing",
    "plugin_specific_configuration": {}
},
{
    "instance_name": "irods_rule_engine_plugin-python-instance",
    "plugin_name": "irods_rule_engine_plugin-python",
    "plugin_specific_configuration": {}
},
Run ils to confirm that the JSON syntax is correct

Comment out the definition for pep_resource_resolve_hierarchy_pre in /etc/irods/core.py again
#def pep_resource_resolve_hierarchy_pre(rule_args, callback, rei):
#    return python_storage_balancing.pep_resource_resolve_hierarchy_pre(rule_args, callback, rei)
Testing the Rule Engine Plugin

Remove all data from def_resc
- irm -f skips the trash and directly unlinks the data
irm -f f1 f2 f3 f4
iput -R def_resc version.json f1
iput -R def_resc version.json f2
iput -R def_resc version.json f3
iput -R def_resc version.json f4
$ ils -l
/tempZone/home/rods:
  rods 0 def_resc;pt2;ufs2 239 2024-05-06.21:52 & f1
  rods 0 def_resc;pt1;ufs1 239 2024-05-06.21:52 & f2
  rods 0 def_resc;pt2;ufs2 239 2024-05-06.21:52 & f3
  rods 0 def_resc;pt1;ufs1 239 2024-05-06.21:53 & f4
Questions?

UGM 2024 - Rule Engine Plugins
By iRODS Consortium