Workflow Automation
June 25-27, 2019
iRODS User Group Meeting 2019
Utrecht, Netherlands
Terrell Russell, Ph.D.
@terrellrussell
Chief Technologist, iRODS Consortium
The iRODS Rule Language
Rules - C-like scripts executed by the iRODS Rule Engine
Rule Engine - integrated language interpreter
Simple Example:
HelloWorld { writeLine("stdout", "Hello World"); }
Rule Execution
Workflow may be automated in several ways:
- irule - initiated by a user
- Policy Enforcement Points - initiated automatically
- Delayed Execution Queue - initiated asynchronously
Microservices
C++ plugins bound into the Rule Engine as functions
Useful for
- leveraging specialized libraries (curl, gdal, etc.)
- tasks which are computationally intensive
Microservices
Invoked like a function call
HelloWorld2 {
    msihello_world( *msg );
}
INPUT *msg="my message"
OUTPUT ruleExecOut
Installation
Packaged Microservices - installed via native packaging system ( rpm, dpkg )
- Installed into /usr/lib/irods/plugins/microservices
Hand-Built Microservices - copied by hand into the proper plugin directory
- /usr/lib/irods/plugins/microservices
- $IRODS_HOME/plugins/microservices
Note - for Run-In-Place installations $IRODS_HOME will be wherever the system was built and deployed
Deploying Rules for Policy Enforcement Points
Two ways to deploy custom rules:
- Directly edit the default rule base, /etc/irods/core.re, and add rule code to the PEP defined there
- Override the Policy Enforcement Point by including a new file in the rule base set that contains the desired rule code
Note - The rule bases are configured in /etc/irods/server_config.json
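The override approach comes down to adding a rule base name to the list in server_config.json. The Python sketch below illustrates that edit on an in-memory dictionary. The dictionary shape is an assumption based on a trimmed iRODS 4.2 configuration, and "custom_policy" is a hypothetical rule base name (assumed to correspond to a file such as /etc/irods/custom_policy.re). It places the new name first on the assumption that rule bases are consulted in list order, so a PEP defined there would be found before the one in core.re.

```python
import copy

# Assumed, trimmed shape of the rule-engine section of
# /etc/irods/server_config.json. Not a complete configuration.
SERVER_CONFIG = {
    "plugin_configuration": {
        "rule_engines": [
            {
                "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
                "plugin_specific_configuration": {
                    "re_rulebase_set": ["core"],
                },
            }
        ]
    }
}

RULE_LANGUAGE_PLUGIN = "irods_rule_engine_plugin-irods_rule_language"


def prepend_rulebase(config, name):
    """Return a copy of config with `name` listed first in re_rulebase_set."""
    updated = copy.deepcopy(config)
    for engine in updated["plugin_configuration"]["rule_engines"]:
        if engine["plugin_name"] != RULE_LANGUAGE_PLUGIN:
            continue
        rulebases = engine["plugin_specific_configuration"]["re_rulebase_set"]
        if name not in rulebases:
            rulebases.insert(0, name)
    return updated


# "custom_policy" is a hypothetical rule base name used for illustration.
new_config = prepend_rulebase(SERVER_CONFIG, "custom_policy")
engine = new_config["plugin_configuration"]["rule_engines"][0]
print(engine["plugin_specific_configuration"]["re_rulebase_set"])
```

Working on a copy keeps the original configuration untouched, which makes it easy to compare the two before writing anything back to disk.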
The Landing Zone Use Case
- Data staged from instruments to Landing Zone
- Delayed Execution rule harvests metadata
- Data ingested to Long Term Storage
- Data replicated to high performance storage for compute
- Products replicated back to long term storage
- Unused data is trimmed as appropriate
Prepping the Code
sudo apt-get -y install 'irods-externals-*' irods-dev
export PATH=/opt/irods-externals/cmake3.5.2-0/bin:$PATH
which cmake
mkdir ~/build_lz
cd ~/build_lz
cmake ~/irods_training/advanced/landing_zone_microservices/
make package
sudo dpkg -i ./irods-landing-zone-example_4.2.6~xenial_amd64.deb
Prepping the Landing Zone
cp ~/irods_training/stickers.jpg /tmp
Stage the data as the ubuntu user
sudo su - irods
Switch users to the irods service account
mkdir -p /tmp/landing_zone/new
mkdir /tmp/landing_zone/processed
Make the temporary directories as 'irods'
Run the Rule
cp /tmp/stickers.jpg /tmp/landing_zone/new
Stage the data for ingest
irule -F landing_zone.r
Run the rule
processing file=[/tmp/landing_zone/new/stickers.jpg]
gathering metadata for [/tmp/landing_zone/new/stickers.jpg]
image metadata: ImageDepth=8-bit%Width=4032%Height=3024%CompressionType=JPEG%Format=JPEG (Joint Photographic Experts Group JFIF format)%Colorspace=sRGB
processed file=[/tmp/landing_zone/new/stickers.jpg]
move [/tmp/landing_zone/new/stickers.jpg], from [/tmp/landing_zone/new], to [/tmp/landing_zone/processed/]
Check the results
ils -l stickers.jpg
Look for ingested data
imeta ls -d /tempZone/home/rods/stickers.jpg
Check the metadata
AVUs defined for dataObj /tempZone/home/rods/stickers.jpg:
attribute: Format
value: JPEG (Joint Photographic Experts Group JFIF format)
units:
----
attribute: ImageDepth
value: 8-bit
units:
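The "image metadata:" line in the rule output is a single string of key=value pairs separated by "%", and each pair shows up as one AVU on the data object. The Python sketch below illustrates that mapping only. It is not the implementation of any iRODS microservice, and the function name parse_key_value_string is hypothetical.

```python
def parse_key_value_string(meta, pair_sep="%", kv_sep="="):
    """Split 'k1=v1%k2=v2' into a dict, splitting each pair on the first '='."""
    pairs = {}
    for chunk in meta.split(pair_sep):
        if not chunk:
            continue  # tolerate a leading or trailing separator
        key, _, value = chunk.partition(kv_sep)
        pairs[key.strip()] = value.strip()
    return pairs


# The metadata string printed by the rule run above.
meta = ("ImageDepth=8-bit%Width=4032%Height=3024%CompressionType=JPEG"
        "%Format=JPEG (Joint Photographic Experts Group JFIF format)"
        "%Colorspace=sRGB")

for attribute, value in parse_key_value_string(meta).items():
    print(f"attribute: {attribute}")
    print(f"value: {value}")
```

Splitting on the first "=" only keeps values intact even if they contain "=" themselves.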
Rule Review - data detection, ingest, and filtering
ingest_rule {
#delay( "<PLUSET>30s</PLUSET><EF>5m REPEAT FOR EVER</EF>" ) {
#    remote( "resource1.example.org", "null" ) {
        *err = errorcode( msiget_filepaths_from_glob( *lz_glob, *delay_seconds, *file_age_seconds, *the_files ) );
        if( *err == 0 ) {
            foreach( *f in *the_files ) {
                writeLine( "stdout", "processing file=[*f]" );
                *err = errorcode( msiput_dataobj_or_coll( *f, *resource_name, "forceFlag=", *tgt_coll, *real_path ) );
                if( *err==0 ) {
                    if (*f like "*.jpg" || *f like "*.jpeg" || *f like "*.bmp" || *f like "*.tif" || *f like "*.tiff" ||
                        *f like "*.rif" || *f like "*.gif" || *f like "*.png" || *f like "*.svg" || *f like "*.xpm") {
Note the commented out delay and remote directives
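The detection and filtering steps can be illustrated outside iRODS. The Python sketch below mimics them on a local filesystem under two assumptions that the slides do not confirm: that *file_age_seconds means "skip files modified more recently than this", and that the `like` patterns behave as case-sensitive wildcards. The function names find_settled_files and is_image are hypothetical, and this is not how msiget_filepaths_from_glob is implemented.

```python
import glob
import os
import time
from fnmatch import fnmatchcase

# Same wildcard patterns as the `like` chain in the rule above.
IMAGE_PATTERNS = ("*.jpg", "*.jpeg", "*.bmp", "*.tif", "*.tiff",
                  "*.rif", "*.gif", "*.png", "*.svg", "*.xpm")


def find_settled_files(lz_glob, file_age_seconds, now=None):
    """Return regular files matching lz_glob that were last modified
    at least file_age_seconds ago (assumed meaning of the parameter)."""
    now = time.time() if now is None else now
    settled = []
    for path in sorted(glob.glob(lz_glob)):
        if not os.path.isfile(path):
            continue
        if now - os.path.getmtime(path) >= file_age_seconds:
            settled.append(path)
    return settled


def is_image(path):
    """Case-sensitive wildcard match against the image patterns."""
    return any(fnmatchcase(path, pattern) for pattern in IMAGE_PATTERNS)


if __name__ == "__main__":
    for f in find_settled_files("/tmp/landing_zone/new/*", file_age_seconds=1):
        print(f"processing file=[{f}] image={is_image(f)}")
```

An age threshold of this kind is a common way to avoid picking up files that an instrument is still writing.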
Rule Review - metadata extraction and assignment
                        writeLine( "stdout", "gathering metadata for [*f]");
                        msiget_image_meta(*f, *meta);
                        writeLine("stdout", "image metadata: " ++ *meta);
                        msiString2KeyValPair(*meta, *meta_kvp);
                        msiAssociateKeyValuePairsToObj( *meta_kvp, *real_path, "-d");
                    }
                    writeLine( "stdout", "processed file=[*f]" );
                    *err = errorcode( msifilesystem_rename( *f, *lz_raw, *lz_proc ) );
                    writeLine( "stdout", "move [*f], from [*lz_raw], to [*lz_proc]" );
                } # if err
            } # foreach
Rule Review
        } # if err
        else {
            writeLine( "stdout", "error in msiget_filepaths_from_glob - *err" );
        }
#    } # remote
#} # delay
} # ingest_rule
INPUT *lz_raw="/tmp/landing_zone/new", *lz_proc="/tmp/landing_zone/processed/", *resource_name="demoResc", *delay_seconds=1, *lz_glob="/tmp/landing_zone/new/*", *tgt_coll="/tempZone/home/rods/", *file_age_seconds=1
OUTPUT ruleExecOut
Note: INPUT should all be on one line
Using user assigned metadata
iquest "select META_DATA_ATTR_NAME where META_DATA_ATTR_VALUE like 'bar'"
We can now use the metadata for a variety of things
foreach(*ROW in SELECT META_DATA_ATTR_NAME WHERE META_DATA_ATTR_VALUE = 'SOME_VALUE') {
    # science happens here
}
Combine with Language Integrated General Query (LIGQ) to drive rules and workflows
foreach(*ROW in SELECT META_DATA_ATTR_NAME, META_DATA_ATTR_VALUE) {
    *DATA_TYPE = *ROW.META_DATA_ATTR_VALUE;
    if("DATA_TYPE_1" == *DATA_TYPE) {
        # fancy things
    } else {
        # other things
    }
}
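The branching pattern above, where a metadata value selects the next workflow step, can be sketched in Python as a lookup table of handlers. Everything here is illustrative: the rows are hypothetical stand-ins for query results, and the handler names and attribute names are made up for the example.

```python
def handle_type_1(row):
    """Hypothetical step for objects tagged DATA_TYPE_1."""
    return f"fancy things for {row['META_DATA_ATTR_NAME']}"


def handle_other(row):
    """Hypothetical fallback step for every other value."""
    return f"other things for {row['META_DATA_ATTR_NAME']}"


# Maps a metadata value to the step that should run for it.
HANDLERS = {"DATA_TYPE_1": handle_type_1}


def dispatch(rows):
    """Pick a handler per row based on its META_DATA_ATTR_VALUE."""
    return [HANDLERS.get(row["META_DATA_ATTR_VALUE"], handle_other)(row)
            for row in rows]


# Hypothetical rows standing in for query results.
rows = [
    {"META_DATA_ATTR_NAME": "sample_a", "META_DATA_ATTR_VALUE": "DATA_TYPE_1"},
    {"META_DATA_ATTR_NAME": "sample_b", "META_DATA_ATTR_VALUE": "DATA_TYPE_2"},
]

for line in dispatch(rows):
    print(line)
```

A table of handlers scales better than a long if/else chain once more than a few data types are involved.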
Questions?
UGM 2019 - Workflow Automation
By justinkylejames