Workflow Automation
June 25-27, 2019
iRODS User Group Meeting 2019
Utrecht, Netherlands
Terrell Russell, Ph.D.
@terrellrussell
Chief Technologist, iRODS Consortium

The iRODS Rule Language
Rules - C-like scripts executed by the iRODS Rule Engine
Rule Engine - integrated language interpreter
Simple Example:
HelloWorld {
    writeLine("stdout", "Hello World");
}

Rule Execution
Workflow may be automated in several ways
- irule - initiated by a user
- Policy Enforcement Points - initiated automatically
- Delayed Execution Queue - initiated asynchronously

Microservices
C++ plugins bound into the Rule Engine as functions
Useful for
- leveraging specialized libraries - curl, gdal, etc.
- tasks which are computationally intensive

Microservices
Invoked like a function call
HelloWorld2 {
    msihello_world( *msg );
}
INPUT *msg = "my message"
OUTPUT ruleExecOut

Installation
Packaged Microservices - installed via the native packaging system (rpm, dpkg)
- Installed into /usr/lib/irods/plugins/microservices
Hand-Built Microservices - copied by hand into the proper plugin directory
- /usr/lib/irods/plugins/microservices
- $IRODS_HOME/plugins/microservices
Note - for Run-In-Place installations $IRODS_HOME will be wherever the system was built and deployed

Deploying Rules for Policy Enforcement Points
Two ways to deploy custom rules:
- Directly edit and add rule code to the PEP defined in the default rule base, /etc/irods/core.re
- Override the Policy Enforcement Point: include a new file within the rule base containing the desired rule code
Note - The rule bases are configured in /etc/irods/server_config.json

The Landing Zone Use Case
- Data staged from instruments to Landing Zone
- Delayed Execution rule harvests metadata
- Data ingested to Long Term Storage
- Data replicated to high performance storage for compute
- Products replicated back to long term storage
- Unused data is trimmed as appropriate

Prepping the Code
sudo apt-get -y install irods-externals-* irods-dev
export PATH=/opt/irods-externals/cmake3.5.2-0/bin:$PATH
which cmake
mkdir ~/build_lz
cd ~/build_lz
cmake ~/irods_training/advanced/landing_zone_microservices/
make package
sudo dpkg -i ./irods-landing-zone-example_4.2.6~xenial_amd64.deb

Prepping the Landing Zone
Stage the data as the ubuntu user:
cp ~/irods_training/stickers.jpg /tmp
Switch users to the irods service account:
sudo su - irods
Make the temporary directories as 'irods':
mkdir -p /tmp/landing_zone/new
mkdir /tmp/landing_zone/processed

Run the Rule
Stage the data for ingest:
cp /tmp/stickers.jpg /tmp/landing_zone/new
Run the rule:
irule -F landing_zone.r
processing file=[/tmp/landing_zone/new/stickers.jpg]
gathering metadata for [/tmp/landing_zone/new/stickers.jpg]
image metadata: ImageDepth=8-bit%Width=4032%Height=3024%CompressionType=JPEG%Format=JPEG (Joint Photographic Experts Group JFIF format)%Colorspace=sRGB
processed file=[/tmp/landing_zone/new/stickers.jpg]
move [/tmp/landing_zone/new/stickers.jpg], from [/tmp/landing_zone/new], to [/tmp/landing_zone/processed/]
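The "image metadata" line in the output above is one string of key=value pairs separated by %, which is the form msiString2KeyValPair takes as input. As a rough illustration only, the split can be sketched in Python; parse_kvp_string is a hypothetical helper written for this example and is not part of iRODS:

```python
def parse_kvp_string(meta):
    """Split 'k1=v1%k2=v2' into a dict, splitting each pair on its first '='."""
    pairs = {}
    for item in meta.split("%"):
        key, sep, value = item.partition("=")
        if not sep:
            continue  # skip empty or malformed items that have no '='
        pairs[key.strip()] = value.strip()
    return pairs


# The metadata string printed by the rule run above.
meta = (
    "ImageDepth=8-bit%Width=4032%Height=3024%CompressionType=JPEG"
    "%Format=JPEG (Joint Photographic Experts Group JFIF format)%Colorspace=sRGB"
)
avus = parse_kvp_string(meta)
for name, value in avus.items():
    print(name, "->", value)
```

Each resulting pair corresponds to one attribute/value entry attached to the data object by msiAssociateKeyValuePairsToObj.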

Check the results
Look for ingested data:
ils -l stickers.jpg
Check the metadata:
imeta ls -d /tempZone/home/rods/stickers.jpg
AVUs defined for dataObj /tempZone/home/rods/stickers.jpg:
attribute: Format
value: JPEG (Joint Photographic Experts Group JFIF format)
units:
----
attribute: ImageDepth
value: 8-bit
units:

Rule Review - data detection, ingest, and filtering
ingest_rule {
    #delay( "<PLUSET>30s</PLUSET><EF>5m REPEAT FOR EVER</EF>" ) {
    #    remote( "resource1.example.org", "null" ) {
            *err = errorcode( msiget_filepaths_from_glob(
                                  *lz_glob, *delay_seconds,
                                  *file_age_seconds, *the_files ) );
            if( *err == 0 ) {
                foreach( *f in *the_files ) {
                    writeLine( "stdout", "processing file=[*f]" );
                    *err = errorcode( msiput_dataobj_or_coll(
                                            *f, *resource_name, "forceFlag=",
                                            *tgt_coll, *real_path ) );
                    if( *err==0 ) {
                        if (*f like "*.jpg"  || *f like "*.jpeg" ||
                            *f like "*.bmp"  || *f like "*.tif"  ||
                            *f like "*.tiff" || *f like "*.rif"  ||
                            *f like "*.gif"  || *f like "*.png"  ||
                            *f like "*.svg"  || *f like "*.xpm") {
Note the commented-out delay and remote directives

Rule Review - metadata extraction and assignment
                            writeLine(
                                "stdout",
                                "gathering metadata for [*f]");
                            
                            msiget_image_meta(*f, *meta);
                           
                            writeLine("stdout", "image metadata: " ++ *meta);
                            
                            msiString2KeyValPair(*meta, *meta_kvp);
                            msiAssociateKeyValuePairsToObj(
                                *meta_kvp, *real_path, "-d");
                        }
                        writeLine( "stdout", "processed file=[*f]" );
                        *err = errorcode( msifilesystem_rename(
                                              *f, *lz_raw, *lz_proc ) );
                        writeLine(
                            "stdout",
                            "move [*f], from [*lz_raw], to [*lz_proc]" ); 
                    } # if err
                } # foreach

Rule Review
            } # if err
            else {
                writeLine(
                    "stdout",
                    "error in msiget_filepaths_from_glob - *err" );
            }
    #   } # remote
    #} # delay
} # ingest_rule
INPUT *lz_raw="/tmp/landing_zone/new", *lz_proc="/tmp/landing_zone/processed/", *resource_name="demoResc", *delay_seconds=1, *lz_glob="/tmp/landing_zone/new/*", *tgt_coll="/tempZone/home/rods/", *file_age_seconds=1
OUTPUT ruleExecOut
Note: INPUT should all be on one line
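The control flow of the rule (find files in the landing zone, ingest each one, flag images for metadata extraction, then move the file to the processed directory) can be sketched outside iRODS. The Python below is an illustration under stated assumptions, not the microservices' implementation: settled_files, sweep, and the ingest callback are hypothetical names, and the age filter (modification time at least file_age_seconds old) is a guess at what msiget_filepaths_from_glob does with that parameter.

```python
import glob
import os
import shutil
import tempfile
import time

# Same suffix list as the `like` tests in the rule (case-sensitive, as in the rule).
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".bmp", ".tif", ".tiff",
                  ".rif", ".gif", ".png", ".svg", ".xpm")


def settled_files(lz_glob, file_age_seconds, now=None):
    """Return files matching lz_glob whose mtime is at least file_age_seconds old.

    Assumption: the real microservice may apply its age filter differently.
    """
    now = time.time() if now is None else now
    return sorted(
        path for path in glob.glob(lz_glob)
        if os.path.isfile(path) and now - os.path.getmtime(path) >= file_age_seconds
    )


def sweep(lz_raw, lz_proc, file_age_seconds, ingest, now=None):
    """Call ingest(path, is_image) per settled file, then move it to lz_proc."""
    os.makedirs(lz_proc, exist_ok=True)
    moved = []
    for path in settled_files(os.path.join(lz_raw, "*"), file_age_seconds, now):
        ingest(path, path.endswith(IMAGE_SUFFIXES))  # hypothetical callback
        target = os.path.join(lz_proc, os.path.basename(path))
        shutil.move(path, target)
        moved.append(target)
    return moved


# Demo in a throwaway directory rather than /tmp/landing_zone.
root = tempfile.mkdtemp()
lz_raw = os.path.join(root, "new")
lz_proc = os.path.join(root, "processed")
os.makedirs(lz_raw)
for name in ("stickers.jpg", "notes.txt"):
    with open(os.path.join(lz_raw, name), "w") as handle:
        handle.write("placeholder")

seen = []
moved = sweep(
    lz_raw, lz_proc, file_age_seconds=1,
    ingest=lambda path, is_image: seen.append((os.path.basename(path), is_image)),
    now=time.time() + 5,  # pretend five seconds have passed since staging
)
print(seen)
```

Filtering on file age is one way to avoid picking up a file that an instrument is still writing; whether that is the intent of *file_age_seconds in the rule is an assumption here.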

Using user-assigned metadata
iquest "select META_DATA_ATTR_NAME where META_DATA_ATTR_VALUE like 'bar'"
We can now use the metadata for a variety of things
foreach(*ROW in SELECT META_DATA_ATTR_NAME WHERE META_DATA_ATTR_VALUE = 'SOME_VALUE') {
    # science happens here
}
Combine with Language Integrated General Query (LIGQ) to drive rules and workflows
foreach(*ROW in SELECT META_DATA_ATTR_NAME, META_DATA_ATTR_VALUE) {
    *DATA_TYPE = *ROW.META_DATA_ATTR_VALUE;
    if("DATA_TYPE_1" == *DATA_TYPE) {
        # fancy things
    } else {
        # other things
    }
}
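The branch-on-metadata pattern above amounts to routing each query row to a handler chosen by its attribute value. A small Python sketch of the same idea, with hand-written rows standing in for query results (no iRODS connection is made, and the handler names are hypothetical):

```python
def handle_type_1(row):
    """Stand-in for the 'fancy things' branch."""
    return "fancy: " + row["META_DATA_ATTR_NAME"]


def handle_other(row):
    """Stand-in for the 'other things' branch."""
    return "other: " + row["META_DATA_ATTR_NAME"]


# Map metadata values to handlers; anything unlisted falls through to handle_other.
HANDLERS = {"DATA_TYPE_1": handle_type_1}


def route(rows):
    """Apply the handler selected by each row's META_DATA_ATTR_VALUE."""
    return [HANDLERS.get(row["META_DATA_ATTR_VALUE"], handle_other)(row) for row in rows]


# Hand-written rows shaped like the columns selected in the query above.
rows = [
    {"META_DATA_ATTR_NAME": "Format", "META_DATA_ATTR_VALUE": "DATA_TYPE_1"},
    {"META_DATA_ATTR_NAME": "ImageDepth", "META_DATA_ATTR_VALUE": "8-bit"},
]
results = route(rows)
print(results)
```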

Questions?

UGM 2019 - Workflow Automation
By justinkylejames