Advanced Training:
Rule Engine Plugins
May 28-31, 2024
iRODS User Group Meeting 2024
Amsterdam, Netherlands
Alan King, Senior Software Developer
Martin Flores, Software Developer
iRODS Consortium
Anatomy of a Rule Engine Plugin
Represents the last of the core plugin interfaces
Each plugin must define seven operations: start, stop, rule_exists, list_rules, exec_rule, exec_rule_text, and exec_rule_expression
Configuration of Rule Engine Plugins
Within /etc/irods/server_config.json:
JSON array holding configuration for the rule engine(s)
"plugin_configuration": {
    "rule_engines": [
        { ... },
        ...
    ]
}
Configuration of Rule Engine Plugins
Anatomy of a Rule Engine Plugin JSON object
{
    "instance_name": "<UNIQUE_NAME>",
    "plugin_name": "<DERIVED_FROM_SHARED_OBJECT>",
    "plugin_specific_configuration": {
        <ANYTHING_GOES_HERE>
    },
    "shared_memory_instance": "<UNIQUE_SHARED_MEMORY_NAME>"
}
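The fields in the template can be checked mechanically before the server reads them. The following is a minimal sketch, not part of iRODS: the helper names are hypothetical, and treating shared_memory_instance as optional is an assumption based on the Python plugin entry later in this deck, which omits it.

```python
# Keys taken from the template object above. shared_memory_instance is
# treated as optional here (assumption; the Python plugin entry omits it).
REQUIRED_KEYS = ("instance_name", "plugin_name", "plugin_specific_configuration")


def check_rule_engine_entry(entry):
    """Return a list of problems found in one rule engine JSON object."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in entry:
            problems.append("missing key: " + key)
    if not isinstance(entry.get("plugin_specific_configuration", {}), dict):
        problems.append("plugin_specific_configuration must be a JSON object")
    return problems


def duplicate_instance_names(entries):
    """Return instance names that appear more than once in the array."""
    names = [entry.get("instance_name") for entry in entries]
    return sorted({name for name in names if names.count(name) > 1})
```

Running both helpers over the parsed rule_engines array gives a quick sanity check before restarting anything.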
iRODS Rule Language Configuration
{
    "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
    "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
    "plugin_specific_configuration": {
        "re_data_variable_mapping_set": [
            "core"
        ],
        "re_function_name_mapping_set": [
            "core"
        ],
        "re_rulebase_set": [
            "core"
        ],
        "regexes_for_supported_peps": [
            "ac[^ ]*",
            "msi[^ ]*",
            "[^ ]*pep_[^ ]*_(pre|post|except|finally)"
        ]
    },
    "shared_memory_instance": "irods_rule_language_rule_engine"
},
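To get a feel for the three patterns in regexes_for_supported_peps, they can be tried against a few names. This is a minimal sketch that only shows which strings the patterns accept. It assumes the whole name must match (re.fullmatch), and it does not model how the server uses the result. The function name is hypothetical.

```python
import re

# Patterns copied from regexes_for_supported_peps above.
SUPPORTED_PEP_REGEXES = [
    "ac[^ ]*",
    "msi[^ ]*",
    "[^ ]*pep_[^ ]*_(pre|post|except|finally)",
]


def matches_supported_pattern(rule_name):
    """True if any pattern matches the entire rule name (assumed semantics)."""
    return any(re.fullmatch(pattern, rule_name) for pattern in SUPPORTED_PEP_REGEXES)
```

For example, acPostProcForPut, msiExecCmd, and pep_resource_resolve_hierarchy_pre are accepted, while a name containing a space is not.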
Basic iRODS Rule Language Example
pep_resource_resolve_hierarchy_pre(*INST_NAME, *CTX, *OUT, *OP_TYPE, *HOST, *RESC_HIER, *VOTE) {
    if ("CREATE" == *OP_TYPE) {
        if ("pt1" == *INST_NAME) {
            *OUT = "read=1.0;write=0.5";
        }
        else if ("pt2" == *INST_NAME) {
            *OUT = "read=1.0;write=1.0";
        }
    }
}
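For comparison, the same static policy can be written in the style of the Python rule engine plugin. This is a sketch, assuming the positional arguments arrive in rule_args in the same order as the rule language signature (index 0 is the instance name, 2 the out string, 3 the operation type), as the storage balancing rule later in this deck does. parse_weights is a hypothetical helper added only to make the weight string easy to inspect.

```python
def pep_resource_resolve_hierarchy_pre(rule_args, callback, rei):
    # Assumed argument order: 0 = instance name, 2 = out, 3 = operation type.
    inst_name, op_type = rule_args[0], rule_args[3]
    if op_type == 'CREATE':
        if inst_name == 'pt1':
            rule_args[2] = 'read=1.0;write=0.5'
        elif inst_name == 'pt2':
            rule_args[2] = 'read=1.0;write=1.0'


def parse_weights(out):
    """Split a string like 'read=1.0;write=0.5' into a dict of floats."""
    weights = {}
    for pair in out.split(';'):
        key, _, value = pair.partition('=')
        weights[key] = float(value)
    return weights
```

Called with a plain list in place of the real arguments, the function rewrites only the out slot, which makes the policy easy to test off-server.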
Dynamic Policy Enforcement Points
Questions?
Installing the Python Rule Engine Plugin
As the ubuntu user, install the plugin and a Python module.
sudo apt-get -y install \
    irods-rule-engine-plugin-python \
    python3-exif
See the new shared object.
$ ls /usr/lib/irods/plugins/rule_engines/
...
libirods_rule_engine_plugin-python.so
Python Rule Engine Configuration
As the irods user, create /etc/irods/core.py from the packaged template file.
cp /etc/irods/core.py.template /etc/irods/core.py
Edit /etc/irods/server_config.json.
"rule_engines": [
    {
        "instance_name" : "irods_rule_engine_plugin-python-instance",
        "plugin_name" : "irods_rule_engine_plugin-python",
        "plugin_specific_configuration" : {}
    },
    {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        ...
Set up a Custom Rulebase (training.re)
Add a custom rulebase to /etc/irods/server_config.json.
"rule_engines": [
    ...
    {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
        "plugin_specific_configuration": {
            "re_data_variable_mapping_set": [
                "core"
            ],
            "re_function_name_mapping_set": [
                "core"
            ],
            "re_rulebase_set": [
                "training",
                "core"
            ],
            ...
A Combination Use of Both Python and iRODS Rules
Create the /etc/irods/training.re rulebase:
add_metadata_to_objpath(*str, *objpath, *objtype) {
    msiString2KeyValPair(*str, *kvp);
    msiAssociateKeyValuePairsToObj(*kvp, *objpath, *objtype);
}

getSessionVar(*name,*output) {
    *output = eval("str($"++*name++")");
}
A Python Dynamic Policy Enforcement Point
Copy core.py and python_storage_balancing.py into /etc/irods,
overwriting the default core.py, and stage stickers.jpg for the irods user.
sudo cp ~/irods_training/advanced/python_storage_balancing.py /etc/irods/
sudo cp ~/irods_training/advanced/core.py /etc/irods/
sudo cp ~/irods_training/stickers.jpg /var/lib/irods/
The Python instantiation of the dynamic PEP, now in core.py:
# Assumes exifread and Query are already imported at the top of core.py.
def pep_api_data_obj_put_post(rule_args, callback, rei):
    import os

    data_obj_inp = rule_args[2]
    obj_path = str(data_obj_inp.objPath)
    resc_hier = str(data_obj_inp.condInput['resc_hier'])

    query_condition_string = f'COLL_NAME = \'{os.path.dirname(obj_path)}\' and ' \
                             f'DATA_NAME = \'{os.path.basename(obj_path)}\' and ' \
                             f'DATA_RESC_HIER = \'{resc_hier}\''

    # Note: The physical path fetched by the query may not exist on the host executing this
    # bit of policy. In a real deployment, the policy implementer should consider the hostname
    # of the resource on which the data resides and consider using the remote() microservice.
    phypath = list(Query(callback, 'DATA_PATH', query_condition_string))[0]

    # Collect every EXIF tag except thumbnails and other bulky entries as key=value pairs.
    exiflist = []
    with open(phypath, 'rb') as f:
        tags = exifread.process_file(f, details=False)
        for (k, v) in tags.items():
            if k not in ('JPEGThumbnail', 'TIFFThumbnail', 'Filename', 'EXIF MakerNote'):
                exifpair = '{0}={1}'.format(k, v)
                exiflist.append(exifpair)

    # Join the pairs with '%' and hand them to the rule defined in training.re.
    exifstring = '%'.join(exiflist)
    callback.add_metadata_to_objpath(exifstring, obj_path, '-d')
    callback.writeLine('serverLog', 'PYTHON - pep_api_data_obj_put_post() complete')
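The two string-building steps of this PEP can be tried without a server. Below is a minimal sketch that pulls them into standalone helpers. The helper names are hypothetical, a plain dict stands in for the tags returned by exifread, and posixpath is used so the path split behaves the same on any host.

```python
import posixpath

# Tags the PEP skips when building the metadata string.
SKIPPED_TAGS = ('JPEGThumbnail', 'TIFFThumbnail', 'Filename', 'EXIF MakerNote')


def build_query_condition(obj_path, resc_hier):
    """Build the condition string used to look up the replica's DATA_PATH."""
    return ("COLL_NAME = '{0}' and DATA_NAME = '{1}' and DATA_RESC_HIER = '{2}'"
            .format(posixpath.dirname(obj_path), posixpath.basename(obj_path), resc_hier))


def build_exif_string(tags):
    """Join key=value pairs with '%', the form handed to add_metadata_to_objpath."""
    pairs = ['{0}={1}'.format(k, v) for k, v in tags.items() if k not in SKIPPED_TAGS]
    return '%'.join(pairs)
```

Note that a tag value containing '%' or '=' would confuse this simple encoding. The sketch does not guard against that, and neither does the PEP shown in this deck.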
Test our combination of rules
Switch to the irods user. Make sure the earlier stickers example is removed and put some test data into iRODS.
irm -f stickers.jpg
iput stickers.jpg
Confirm the EXIF metadata was extracted and applied.
imeta ls -d stickers.jpg
AVUs defined for dataObj stickers.jpg:
attribute: Image Orientation
value: Horizontal (normal)
units:
----
attribute: EXIF ColorSpace
value: sRGB
units:
----
Storage Balancing
Create a resource hierarchy for our storage balancing example.
iadmin mkresc def_resc deferred
iadmin mkresc ufs1 unixfilesystem $(hostname):/tmp/ufs1
iadmin mkresc ufs2 unixfilesystem $(hostname):/tmp/ufs2
iadmin mkresc pt1 passthru
iadmin mkresc pt2 passthru
iadmin addchildtoresc def_resc pt1
iadmin addchildtoresc def_resc pt2
iadmin addchildtoresc pt1 ufs1
iadmin addchildtoresc pt2 ufs2
iadmin modresc pt1 context "max_bytes=20000000"
iadmin modresc pt2 context "max_bytes=20000000"
$ ilsresc
def_resc:deferred
├── pt1:passthru
│   └── ufs1
└── pt2:passthru
    └── ufs2
demoResc:unixfilesystem
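The tree built by the addchildtoresc commands determines the hierarchy strings that show up later in ils -l output (for example def_resc;pt1;ufs1). As a minimal sketch, the parent/child relationships can be modeled as a dict and walked to list every root-to-leaf hierarchy. The names CHILDREN and leaf_hierarchies are hypothetical.

```python
# Parent -> children, mirroring the iadmin addchildtoresc commands.
CHILDREN = {
    "def_resc": ["pt1", "pt2"],
    "pt1": ["ufs1"],
    "pt2": ["ufs2"],
}


def leaf_hierarchies(root, children=CHILDREN):
    """Return every ';'-separated path from root down to a leaf resource."""
    kids = children.get(root, [])
    if not kids:
        return [root]
    paths = []
    for kid in kids:
        for tail in leaf_hierarchies(kid, children):
            paths.append(root + ";" + tail)
    return paths
```

Each returned string names one place a replica could land, so the storage balancing policy amounts to choosing among them.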
Implementing the Storage Balanced Plugin
A few helper functions to add to /etc/irods/training.re.
Code can be found at ~/irods_training/advanced/python_storage_balancing.re.
findRescType(*INST_NAME, *OUT) {
    foreach (*ROW in SELECT RESC_TYPE_NAME WHERE RESC_NAME = '*INST_NAME') {
        *OUT = *ROW.RESC_TYPE_NAME;
    }
}

findInstId(*INST_NAME, *OUT) {
    foreach (*ROW in SELECT RESC_ID WHERE RESC_NAME = '*INST_NAME') {
        *OUT = *ROW.RESC_ID;
    }
}

findBytesUsed(*INST_ID, *OUT) {
    foreach (*ROW1 in SELECT RESC_NAME WHERE RESC_PARENT = '*INST_ID') {
        *STORAGE_RESC = *ROW1.RESC_NAME;
        *TEMP = 0;
        foreach (*ROW2 in SELECT sum(DATA_SIZE) WHERE RESC_NAME = '*STORAGE_RESC') {
            *TEMP = *TEMP + int(*ROW2.DATA_SIZE);
        }
        *OUT = "*TEMP";
    }
}

findContextString(*INST_NAME, *OUT) {
    foreach (*ROW in SELECT RESC_CONTEXT WHERE RESC_NAME = '*INST_NAME') {
        *OUT = *ROW.RESC_CONTEXT;
    }
}
The Python Storage Balancing Rule
The definition in /etc/irods/python_storage_balancing.py:
import re

def pep_resource_resolve_hierarchy_pre(rule_args, callback, rei):
    if rule_args[3] == 'CREATE':
        ret = callback.findRescType(rule_args[0], '')
        resc_type = ret['arguments'][1]
        if (resc_type == 'passthru'):
            # Look up the values needed for the vote via the training.re helpers.
            ret = callback.findInstId(rule_args[0], '')
            inst_id = ret['arguments'][1]
            ret = callback.findBytesUsed(inst_id, '')
            bytes_used = ret['arguments'][1]
            ret = callback.findContextString(rule_args[0], '')
            context_string = ret['arguments'][1]

            max_bytes = -1
            max_bytes_index = context_string.find('max_bytes')
            if max_bytes_index != -1:
                max_bytes_re = r'max_bytes=(\d+)'
                max_bytes_search = re.search(max_bytes_re, context_string)
                max_bytes_str = max_bytes_search.group(1)
                # Convert to int so the comparisons with -1 and 0 below behave as intended.
                max_bytes = int(max_bytes_str)

            percent_full = 0.0
            if max_bytes == -1:
                percent_full = 0.0
            elif max_bytes == 0:
                percent_full = 1.0
            else:
                percent_full = float(bytes_used)/float(max_bytes)
            write_weight = 1.0 - percent_full
            rule_args[2] = 'read=1.0;write=' + str(write_weight)
Uncomment the definition for pep_resource_resolve_hierarchy_pre in /etc/irods/core.py.
#def pep_resource_resolve_hierarchy_pre(rule_args, callback, rei):
#    return python_storage_balancing.pep_resource_resolve_hierarchy_pre(rule_args, callback, rei)
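The weight arithmetic in the storage balancing rule can be exercised without a server. The following is a minimal sketch that isolates it in a pure function: it takes the context string and the bytes used, both normally fetched through the training.re helpers, and returns the write weight. The function names are hypothetical.

```python
import re


def write_weight(context_string, bytes_used):
    """Write weight for a passthru resource: 1.0 when empty, 0.0 when full."""
    match = re.search(r'max_bytes=(\d+)', context_string)
    if match is None:
        return 1.0  # no max_bytes configured: treat the resource as empty
    max_bytes = int(match.group(1))
    if max_bytes == 0:
        return 0.0  # an explicit limit of zero means the resource is full
    return 1.0 - float(bytes_used) / float(max_bytes)


def vote_string(context_string, bytes_used):
    """Format the weights the way the rule writes them into rule_args[2]."""
    return 'read=1.0;write=' + str(write_weight(context_string, bytes_used))
```

With the 20000000-byte limit configured earlier, a half-full resource votes write=0.5, and every additional byte stored lowers the vote slightly.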
Testing the Storage Balancing Rule
Remove all data from def_resc
irmtrash: any data in the trash still counts toward the bytes-used total that determines the distribution of incoming files
irmtrash
iput -R def_resc version.json f1
iput -R def_resc version.json f2
iput -R def_resc version.json f3
iput -R def_resc version.json f4
$ ils -l
/tempZone/home/rods:
  rods  0  def_resc;pt2;ufs2  239  2024-05-05.21:52  &  f1
  rods  0  def_resc;pt1;ufs1  239  2024-05-05.21:52  &  f2
  rods  0  def_resc;pt2;ufs2  239  2024-05-05.21:52  &  f3
  rods  0  def_resc;pt1;ufs1  239  2024-05-05.21:53  &  f4
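The alternating placement in the listing follows from the weights: each 239-byte put lowers the write vote of the branch that received it, so the other branch wins the next vote. Here is a minimal simulation, assuming the deferred resource picks the child with the highest write weight. How a tie is broken is an assumption (the first branch listed wins here), and the real server may decide differently.

```python
MAX_BYTES = 20000000  # from the context strings set with iadmin modresc
FILE_SIZE = 239       # size of version.json in the listing


def simulate(num_files, branches=("pt1", "pt2")):
    """Return the branch chosen for each of num_files consecutive puts."""
    used = {name: 0 for name in branches}
    placements = []
    for _ in range(num_files):
        weights = {name: 1.0 - used[name] / MAX_BYTES for name in branches}
        # max() returns the first branch with the highest weight (tie-break assumption).
        winner = max(branches, key=lambda name: weights[name])
        used[winner] += FILE_SIZE
        placements.append(winner)
    return placements
```

Calling simulate(4, branches=("pt2", "pt1")) reproduces the order seen in the listing; with the default branch order the pattern is the same but starts on pt1.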
Questions?
A Storage Balancing C++ Rule Engine Plugin
Code can be found at
~/irods_training/advanced/irods_rule_engine_plugin_storage_balancing/src/libirods_rule_engine_plugin-cpp-storage-balancing.cpp
As the ubuntu user, install required build tools...
sudo apt-get -y install \
    irods-dev \
    irods-externals-clang13.0.1-0 \
    irods-externals-cmake3.21.4-0
Start with the Factory
extern "C"
irods::pluggable_rule_engine<irods::default_re_ctx>* plugin_factory(
    const std::string& _instance_name,
    const std::string& _context)
{
    auto* re{new irods::pluggable_rule_engine<irods::default_re_ctx>(_instance_name, _context)};

    re->add_operation(
        "start",
        std::function<irods::error(irods::default_re_ctx&, const std::string&)>(start));

    re->add_operation(
        "stop",
        std::function<irods::error(irods::default_re_ctx&, const std::string&)>(stop));

    re->add_operation(
        "rule_exists",
        std::function<irods::error(irods::default_re_ctx&, const std::string&, bool&)>(rule_exists));

    re->add_operation(
        "list_rules",
        std::function<irods::error(irods::default_re_ctx&, std::vector<std::string>&)>(list_rules));

    re->add_operation(
        "exec_rule",
        std::function<irods::error(
            irods::default_re_ctx&,
            const std::string&,
            std::list<boost::any>&,
            irods::callback)>(exec_rule));

    re->add_operation(
        "exec_rule_text",
        std::function<irods::error(
            irods::default_re_ctx&,
            const std::string&,
            msParamArray_t*,
            const std::string&,
            irods::callback)>(exec_rule_text));

    re->add_operation(
        "exec_rule_expression",
        std::function<irods::error(
            irods::default_re_ctx&,
            const std::string&,
            msParamArray_t*,
            irods::callback)>(exec_rule_expression));

    return re;
}
Anatomy of the Plugin Factory
extern "C"
irods::pluggable_rule_engine<irods::default_re_ctx>* plugin_factory(
    const std::string& _instance_name,
    const std::string& _context)
{
    ...
}
Must have C linkage
Returns an irods::pluggable_rule_engine<>*
Accepts two const std::string& as instance name and context
Instantiate a new rule engine plugin
extern "C"
irods::pluggable_rule_engine<irods::default_re_ctx>*
plugin_factory(
    const std::string& _instance_name,
    const std::string& _context)
{
    auto* re{new irods::pluggable_rule_engine<irods::default_re_ctx>(
        _instance_name,
        _context)};
}
Allocate a raw pointer to an irods::pluggable_rule_engine
Pass the instance name and context to the constructor
Both become attributes of the irods::plugin_base class
Wire the plugin operations
...
re->add_operation(
    "start",
    std::function<irods::error(irods::default_re_ctx&, const std::string&)>(start));
...
Template parameters are the parameters of the function operation
First parameter is the calling name of the operation (e.g. "start")
Second parameter is a std::function wrapping the local function definition
Takes the full signature of the function as a template parameter
Takes the function pointer as an argument
All operations attached to the instance must be wrapped in an anonymous namespace or be marked as static
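The wiring pattern itself, a factory that builds a plugin object and registers named operations on it, is not specific to C++. The following toy model in Python is offered only as an analogy: PluggableRuleEngine here is a hypothetical stand-in and does not reflect the real irods::pluggable_rule_engine interface beyond what the bullets above describe.

```python
class PluggableRuleEngine:
    """Toy stand-in for a rule engine plugin object; not the real iRODS class."""

    def __init__(self, instance_name, context):
        self.instance_name = instance_name
        self.context = context
        self.operations = {}

    def add_operation(self, name, fn):
        # First argument: the calling name; second: the callable to run.
        if not callable(fn):
            raise TypeError("operation must be callable")
        self.operations[name] = fn

    def call(self, name, *args):
        return self.operations[name](*args)


def start(ctx, instance_name):
    return 0  # no-op that reports success


def stop(ctx, instance_name):
    return 0  # no-op that reports success


def plugin_factory(instance_name, context):
    """Build the plugin object and wire its operations by name."""
    plugin = PluggableRuleEngine(instance_name, context)
    plugin.add_operation("start", start)
    plugin.add_operation("stop", stop)
    return plugin
```

The framework side then needs only the operation name to reach the right function, which is why the calling names passed to add_operation matter.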
Similar treatment for the other operations
stop
rule_exists
exec_rule
list_rules
exec_rule_text
exec_rule_expression
re->add_operation(
    "stop",
    std::function<irods::error(irods::default_re_ctx&, const std::string&)>(stop));

re->add_operation(
    "rule_exists",
    std::function<irods::error(irods::default_re_ctx&, const std::string&, bool&)>(rule_exists));

re->add_operation(
    "list_rules",
    std::function<irods::error(irods::default_re_ctx&, std::vector<std::string>&)>(list_rules));

re->add_operation(
    "exec_rule",
    std::function<irods::error(
        irods::default_re_ctx&,
        const std::string&,
        std::list<boost::any>&,
        irods::callback)>(exec_rule));

re->add_operation(
    "exec_rule_text",
    std::function<irods::error(
        irods::default_re_ctx&,
        const std::string&,
        msParamArray_t*,
        const std::string&,
        irods::callback)>(exec_rule_text));

re->add_operation(
    "exec_rule_expression",
    std::function<irods::error(
        irods::default_re_ctx&,
        const std::string&,
        msParamArray_t*,
        irods::callback)>(exec_rule_expression));
start, stop, rule_exists, and list_rules
static irods::error start(irods::default_re_ctx&, const std::string&)
{
    return SUCCESS();
}

static irods::error stop(irods::default_re_ctx&, const std::string&)
{
    return SUCCESS();
}

static irods::error rule_exists(irods::default_re_ctx&, const std::string& _rule_name, bool& _ret)
{
    _ret = (_rule_name == "pep_resource_resolve_hierarchy_pre");
    return SUCCESS();
}

static irods::error list_rules(irods::default_re_ctx&, std::vector<std::string>& _rules)
{
    _rules.emplace_back("pep_resource_resolve_hierarchy_pre");
    return SUCCESS();
}
start and stop are no-ops
rule_exists simply matches on pep_resource_resolve_hierarchy_pre, the only rule this rule engine plugin supports
list_rules simply responds with the single dynamic PEP
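To see why rule_exists matters, it helps to picture several rule engines configured side by side. The sketch below is a simplified model, not the framework's actual dispatch logic: it assumes engines are consulted in configuration order and that the first one whose rule_exists returns true runs the rule. ToyEngine and dispatch are hypothetical names.

```python
class ToyEngine:
    """Toy rule engine exposing rule_exists, list_rules, and exec_rule."""

    def __init__(self, name, rules):
        self.name = name
        self.rules = dict(rules)  # rule name -> callable taking the argument list

    def rule_exists(self, rule_name):
        return rule_name in self.rules

    def list_rules(self):
        return list(self.rules)

    def exec_rule(self, rule_name, args):
        return self.rules[rule_name](args)


def dispatch(engines, rule_name, args):
    """Run the rule on the first engine that claims it (assumed ordering)."""
    for engine in engines:
        if engine.rule_exists(rule_name):
            return engine.name, engine.exec_rule(rule_name, args)
    raise LookupError("no configured engine provides " + rule_name)
```

Under this model, an engine that answers true only for pep_resource_resolve_hierarchy_pre leaves every other rule to the engines listed after it.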
exec_rule
static irods::error exec_rule(
    irods::default_re_ctx&,
    const std::string& _rule_name,
    std::list<boost::any>& _rule_arguments,
    irods::callback _effect_handler)
{
    try {
        auto it_args{std::begin(_rule_arguments)};
        const auto& arg_resource_name{boost::any_cast<std::string&>(*it_args)};
        auto& arg_plugin_context{boost::any_cast<irods::plugin_context&>(*++it_args)};
        auto& arg_out{*boost::any_cast<std::string*>(*++it_args)};
        const auto& arg_operation_type{*boost::any_cast<const std::string*>(*++it_args)};
        const auto& arg_host{*boost::any_cast<const std::string*>(*++it_args)};
        auto& arg_hierarchy_parser{*boost::any_cast<irods::hierarchy_parser*>(*++it_args)};
        auto& arg_vote{*boost::any_cast<float*>(*++it_args)};
Begin by extracting all of the arguments from the list
Rule arguments are packed into a std::list for every operation
Target specific operations
        if (arg_operation_type != "CREATE") {
            return SUCCESS();
        }

        ruleExecInfo_t& rei{get_rei(_effect_handler)};

        const std::string resource_type{get_resource_type(arg_resource_name, *rei.rsComm)};
        if (resource_type != "passthru") {
            return SUCCESS();
        }

        const boost::optional<uint64_t> max_bytes{get_max_bytes(arg_resource_name, *rei.rsComm)};
Skip non-CREATE operations
Fetch the ruleExecInfo_t from the framework
Fetch other values via helper functions
Handle max_bytes edge cases
        if (!max_bytes) {
            arg_out = "read=1.0;write=1.0";
            return SUCCESS();
        }
        else if (*max_bytes == 0) {
            arg_out = "read=1.0;write=0.0";
            return SUCCESS();
        }
The context string may not be defined
Short circuit if max_bytes is set to 0 explicitly
Compute the write_weight
        const uint64_t bytes_used_by_children{
            get_bytes_used_by_all_children(arg_resource_name, *rei.rsComm)};

        const uint64_t bytes_required_for_new_data_object{
            get_bytes_of_incoming_data_object(arg_plugin_context)};

        const uint64_t hypothetical_bytes_used{
            bytes_used_by_children + bytes_required_for_new_data_object};

        const double percent_used{
            std::max(0.0, std::min(1.0,
                static_cast<double>(hypothetical_bytes_used) / *max_bytes))};

        const double write_weight{1.0 - percent_used};
Fetch bytes used by resource and new data object
Compute new total bytes used
Compute write_weight given max_bytes
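The C++ computation differs from the earlier Python rule in two ways: it counts the incoming data object toward the bytes used, and it clamps the used fraction to the range 0.0 to 1.0. Here is a minimal Python sketch of the same arithmetic, including the two max_bytes edge cases. The function name is hypothetical, and None stands in for an unset boost::optional.

```python
def write_weight_after_put(max_bytes, bytes_used_by_children, incoming_bytes):
    """Write weight the resource would have once the incoming object is stored."""
    if max_bytes is None:
        return 1.0  # no max_bytes in the context string: always fully writable
    if max_bytes == 0:
        return 0.0  # explicit zero limit: never writable
    hypothetical_bytes_used = bytes_used_by_children + incoming_bytes
    # Clamp so an over-full resource yields 0.0 rather than a negative weight.
    percent_used = max(0.0, min(1.0, hypothetical_bytes_used / max_bytes))
    return 1.0 - percent_used
```

The clamp means a put that would overflow the limit produces a write weight of exactly 0.0, where the unclamped Python rule would produce a negative number.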
Wrapping up exec_rule
        // write_weight_string is the string form of write_weight; its conversion is not shown here.
        std::stringstream out_stream;
        out_stream << "read=1.0;write=" << write_weight_string;
        arg_out = out_stream.str();

        return SUCCESS();
    }
    catch (const irods::exception& e) {
        rodsLog(LOG_ERROR, e.what());
        return ERROR(e.code(), "irods exception in exec_rule");
    }

    return SUCCESS();
}
Build the weight string given the computed write weight
Set the out variable - arg_out
Return SUCCESS() if all goes well
Catch the exception, log and return an error otherwise
exec_rule_text and exec_rule_expression
static irods::error exec_rule_text(
    irods::default_re_ctx&,
    const std::string&,
    msParamArray_t*,
    const std::string&,
    irods::callback)
{
    return ERROR(SYS_NOT_SUPPORTED, "not supported");
}

static irods::error exec_rule_expression(
    irods::default_re_ctx&,
    const std::string&,
    msParamArray_t*,
    irods::callback)
{
    return ERROR(SYS_NOT_SUPPORTED, "not supported");
}
Both return SYS_NOT_SUPPORTED
Another Rule Engine Plugin could pick them up
Building and Installing the Package
As the ubuntu user:
export PATH=/opt/irods-externals/cmake3.21.4-0/bin:$PATH
which cmake
mkdir ~/build_storage_cpp
cd ~/build_storage_cpp
cmake ../irods_training/advanced/irods_rule_engine_plugin_storage_balancing/
make package
sudo dpkg -i ./irods_rule_engine_plugin-cpp-storage-balancing.deb
See the newly installed rule engine plugin
$ ls /usr/lib/irods/plugins/rule_engines/ | grep storage
libirods_rule_engine_plugin-cpp-storage-balancing.so
Configuring the Rule Engine Plugin
Edit /etc/irods/server_config.json
{
    "instance_name": "irods_rule_engine_plugin-cpp-storage-balancing-instance",
    "plugin_name": "irods_rule_engine_plugin-cpp-storage-balancing",
    "plugin_specific_configuration": {}
},
{
    "instance_name": "irods_rule_engine_plugin-python-instance",
    "plugin_name": "irods_rule_engine_plugin-python",
    "plugin_specific_configuration": {}
},
Run ils to determine if syntax is correct
Comment out definition for pep_resource_resolve_hierarchy_pre in /etc/irods/core.py again
#def pep_resource_resolve_hierarchy_pre(rule_args, callback, rei):
#    return python_storage_balancing.pep_resource_resolve_hierarchy_pre(rule_args, callback, rei)
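A stray or missing comma is the most likely slip when hand-editing server_config.json. In addition to running ils, the edited text can be checked with Python's standard json module. This is a minimal sketch: the helper name is hypothetical, and it reports only the first syntax error found.

```python
import json


def find_json_error(text):
    """Return a description of the first JSON syntax error, or None if valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return "line {0}, column {1}: {2}".format(err.lineno, err.colno, err.msg)
    return None
```

In practice the text would come from reading /etc/irods/server_config.json. A return value of None means the file at least parses, though it says nothing about whether the plugin names are correct.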
Testing the Rule Engine Plugin
Remove all data from def_resc
irm -f f1 f2 f3 f4
iput -R def_resc version.json f1
iput -R def_resc version.json f2
iput -R def_resc version.json f3
iput -R def_resc version.json f4
$ ils -l
/tempZone/home/rods:
  rods  0  def_resc;pt2;ufs2  239  2024-05-06.21:52  &  f1
  rods  0  def_resc;pt1;ufs1  239  2024-05-06.21:52  &  f2
  rods  0  def_resc;pt2;ufs2  239  2024-05-06.21:52  &  f3
  rods  0  def_resc;pt1;ufs1  239  2024-05-06.21:53  &  f4
Questions?