Workflow Automation

June 25-27, 2019

iRODS User Group Meeting 2019

Utrecht, Netherlands

Terrell Russell, Ph.D.

@terrellrussell

Chief Technologist, iRODS Consortium

The iRODS Rule Language

Rules - C-like scripts executed by the iRODS Rule Engine

 

Rule Engine - integrated language interpreter

 

 

Simple Example:

HelloWorld {
    writeLine("stdout", "Hello World");
}

Rule Execution

Workflow may be automated in several ways

 

  • irule - initiated by a user

  • Policy Enforcement Points - initiated automatically

  • Delayed Execution Queue - initiated asynchronously

Microservices

C++ plugins bound into the Rule Engine as functions

 

Useful for

  • leveraging specialized libraries such as curl and GDAL
  • computationally intensive tasks

Microservices

Invoked like a function call

HelloWorld2 {
    msihello_world( *msg );
}
INPUT *msg = "my message"
OUTPUT ruleExecOut

Installation

Packaged Microservices - installed via native packaging system ( rpm, dpkg )

  • Installed into /usr/lib/irods/plugins/microservices

 

Hand-Built Microservices - copied by hand into the proper plugin directory

  • /usr/lib/irods/plugins/microservices
  • $IRODS_HOME/plugins/microservices

 

Note - for Run-In-Place installations $IRODS_HOME will be wherever the system was built and deployed

Deploying Rules for Policy Enforcement Points

Two ways to deploy custom rules:

 

  • Directly edit and add rule code to the PEP defined in the default rule base

            /etc/irods/core.re

  • Override the Policy Enforcement Point

            include a new file within the rule base containing the desired rule code

 

Note - The rule bases are configured in /etc/irods/server_config.json
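
As a rough illustration of where that setting lives, the sketch below reads the `re_rulebase_set` list out of a trimmed, made-up fragment of `server_config.json`. The layout follows the iRODS 4.2 rule engine plugin configuration as we understand it. The `custom_policy` rule base name is hypothetical, and the sample JSON is an assumption, not a copy of a real server's file.

```python
import json

# Trimmed, illustrative stand-in for /etc/irods/server_config.json.
# "custom_policy" is a hypothetical rule base (custom_policy.re) listed
# ahead of "core" so that its rules are consulted first.
SAMPLE_CONFIG = """
{
  "plugin_configuration": {
    "rule_engines": [
      {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
        "plugin_specific_configuration": {
          "re_rulebase_set": ["custom_policy", "core"]
        }
      }
    ]
  }
}
"""


def rulebase_files(config, rules_dir="/etc/irods"):
    """Return the .re files named by every configured rule engine, in order."""
    files = []
    for engine in config["plugin_configuration"]["rule_engines"]:
        specific = engine.get("plugin_specific_configuration", {})
        for name in specific.get("re_rulebase_set", []):
            files.append(f"{rules_dir}/{name}.re")
    return files


config = json.loads(SAMPLE_CONFIG)
for path in rulebase_files(config):
    print(path)
```

Listing the custom rule base ahead of `core` is, to our understanding, what lets its version of a PEP be found first. That is the override approach described above.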

The Landing Zone Use Case

  1. Data staged from instruments to Landing Zone
  2. Delayed Execution rule harvests metadata
  3. Data ingested to Long Term Storage
  4. Data replicated to high performance storage for compute
  5. Products replicated back to long term storage
  6. Unused data is trimmed as appropriate

Prepping the Code

sudo apt-get -y install 'irods-externals-*' irods-dev

 

export PATH=/opt/irods-externals/cmake3.5.2-0/bin:$PATH
which cmake

 

mkdir ~/build_lz
cd ~/build_lz
cmake ~/irods_training/advanced/landing_zone_microservices/
make package

 

sudo dpkg -i ./irods-landing-zone-example_4.2.6~xenial_amd64.deb

Prepping the Landing Zone

Stage the data as the ubuntu user

cp ~/irods_training/stickers.jpg /tmp

Switch users to the irods service account

sudo su - irods

Make the temporary directories as 'irods'

mkdir -p /tmp/landing_zone/new
mkdir /tmp/landing_zone/processed

Run the Rule

Stage the data for ingest

cp /tmp/stickers.jpg /tmp/landing_zone/new

Run the rule

irule -F landing_zone.r

processing file=[/tmp/landing_zone/new/stickers.jpg]
gathering metadata for [/tmp/landing_zone/new/stickers.jpg]
image metadata: ImageDepth=8-bit%Width=4032%Height=3024%CompressionType=JPEG%Format=JPEG (Joint Photographic Experts Group JFIF format)%Colorspace=sRGB
processed file=[/tmp/landing_zone/new/stickers.jpg]
move [/tmp/landing_zone/new/stickers.jpg], from [/tmp/landing_zone/new], to [/tmp/landing_zone/processed/]

Check the results

Look for ingested data

ils -l stickers.jpg

Check the metadata

imeta ls -d /tempZone/home/rods/stickers.jpg

AVUs defined for dataObj /tempZone/home/rods/stickers.jpg:
attribute: Format
value: JPEG (Joint Photographic Experts Group JFIF format)
units: 
----
attribute: ImageDepth
value: 8-bit
units: 
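
The `image metadata:` line in the rule output packs its attributes as `key=value` pairs separated by `%`, which the rule then hands to `msiString2KeyValPair`. The Python sketch below only models that splitting step so the mapping from the output line to the AVUs is easier to follow. The function name `parse_kvp_string` is made up for illustration and is not part of iRODS.

```python
def parse_kvp_string(text, pair_sep="%", kv_sep="="):
    """Split 'k1=v1%k2=v2' into a dict of attribute -> value (order kept)."""
    pairs = {}
    for chunk in text.split(pair_sep):
        if not chunk:
            continue  # tolerate a stray leading or trailing separator
        key, _, value = chunk.partition(kv_sep)
        pairs[key.strip()] = value.strip()
    return pairs


# The metadata string copied from the rule output above.
meta = ("ImageDepth=8-bit%Width=4032%Height=3024%CompressionType=JPEG"
        "%Format=JPEG (Joint Photographic Experts Group JFIF format)"
        "%Colorspace=sRGB")

for attribute, value in parse_kvp_string(meta).items():
    print(f"attribute: {attribute}")
    print(f"value: {value}")
    print("----")
```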

Rule Review - data detection, ingest, and filtering

ingest_rule {
    #delay( "<PLUSET>30s</PLUSET><EF>5m REPEAT FOR EVER</EF>" ) {
    #    remote( "resource1.example.org", "null" ) {
            *err = errorcode( msiget_filepaths_from_glob(
                                  *lz_glob, *delay_seconds,
                                  *file_age_seconds, *the_files ) );
            if( *err == 0 ) {
                foreach( *f in *the_files ) {
                    writeLine( "stdout", "processing file=[*f]" );
                    *err = errorcode( msiput_dataobj_or_coll(
                                            *f, *resource_name, "forceFlag=",
                                            *tgt_coll, *real_path ) );
                    if( *err==0 ) {
                        if (*f like "*.jpg"  || *f like "*.jpeg" ||
                            *f like "*.bmp"  || *f like "*.tif"  ||
                            *f like "*.tiff" || *f like "*.rif"  ||
                            *f like "*.gif"  || *f like "*.png"  ||
                            *f like "*.svg"  || *f like "*.xpm") {

 

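Note the commented-out delay and remote directives. Uncommented, they would queue the rule to start after 30 seconds, repeat every 5 minutes, and run on resource1.example.org.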
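
The detection and filtering steps on this slide can be modelled outside iRODS. The sketch below is a plain-Python approximation, assuming that `msiget_filepaths_from_glob` returns files matching the glob that have not been modified for at least `file_age_seconds`. That is a guess from the parameter names, not from the microservice source. The helper names `files_older_than` and `is_image` are hypothetical.

```python
import fnmatch
import glob
import os
import tempfile
import time

# Same suffix list as the rule's chain of "like" tests.
IMAGE_PATTERNS = ("*.jpg", "*.jpeg", "*.bmp", "*.tif", "*.tiff",
                  "*.rif", "*.gif", "*.png", "*.svg", "*.xpm")


def files_older_than(lz_glob, file_age_seconds, now=None):
    """Return regular files matching lz_glob whose last modification is
    at least file_age_seconds old (assumed microservice behaviour)."""
    now = time.time() if now is None else now
    matches = []
    for path in sorted(glob.glob(lz_glob)):
        if os.path.isfile(path) and now - os.path.getmtime(path) >= file_age_seconds:
            matches.append(path)
    return matches


def is_image(path):
    """True when the path matches one of the rule's image patterns."""
    return any(fnmatch.fnmatchcase(path, pattern) for pattern in IMAGE_PATTERNS)


# Build a throwaway landing zone so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as lz:
    for name in ("stickers.jpg", "notes.txt"):
        open(os.path.join(lz, name), "w").close()
    later = time.time() + 5  # pretend five seconds have passed
    for f in files_older_than(os.path.join(lz, "*"), 1, now=later):
        print(f"processing file=[{f}] image={is_image(f)}")
```

Waiting for a file to reach a minimum age is a common way to avoid ingesting data that an instrument is still writing.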

Rule Review - metadata extraction and assignment

                            writeLine(
                                "stdout",
                                "gathering metadata for [*f]");
                            
                            msiget_image_meta(*f, *meta);
                           
                            writeLine("stdout", "image metadata: " ++ *meta);
                            
                            msiString2KeyValPair(*meta, *meta_kvp);

                            msiAssociateKeyValuePairsToObj(
                                *meta_kvp, *real_path, "-d");
                        }

                        writeLine( "stdout", "processed file=[*f]" );
                        *err = errorcode( msifilesystem_rename(
                                              *f, *lz_raw, *lz_proc ) );
                        writeLine(
                            "stdout",
                            "move [*f], from [*lz_raw], to [*lz_proc]" ); 
                    } # if err
                } # foreach

Rule Review - error handling and input parameters

            } # if err
            else {
                writeLine(
                    "stdout",
                    "error in msiget_filepaths_from_glob - *err" );
            }
    #   } # remote
    #} # delay
} # ingest_rule

INPUT *lz_raw="/tmp/landing_zone/new", *lz_proc="/tmp/landing_zone/processed/", *resource_name="demoResc", *delay_seconds=1, *lz_glob="/tmp/landing_zone/new/*", *tgt_coll="/tempZone/home/rods/", *file_age_seconds=1
OUTPUT ruleExecOut

Note: the INPUT parameters must all be on one line
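
The last step of the rule moves each processed file out of the landing zone with `msifilesystem_rename`. The sketch below models that move in Python, assuming the microservice swaps the `lz_raw` prefix of the path for `lz_proc`. This is inferred from the `move [...]` line in the rule output, not from the microservice source. `move_to_processed` is a hypothetical name.

```python
import os
import shutil
import tempfile


def move_to_processed(path, lz_raw, lz_proc):
    """Move path from under lz_raw to the same relative spot under lz_proc."""
    relative = os.path.relpath(path, lz_raw)
    if relative == os.pardir or relative.startswith(os.pardir + os.sep):
        raise ValueError(f"{path} is not inside {lz_raw}")
    target = os.path.join(lz_proc, relative)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.move(path, target)
    return target


# Throwaway stand-ins for /tmp/landing_zone/new and /tmp/landing_zone/processed.
with tempfile.TemporaryDirectory() as root:
    lz_raw = os.path.join(root, "new")
    lz_proc = os.path.join(root, "processed")
    os.makedirs(lz_raw)
    staged = os.path.join(lz_raw, "stickers.jpg")
    open(staged, "w").close()

    moved = move_to_processed(staged, lz_raw, lz_proc)
    print(f"move [{staged}], from [{lz_raw}], to [{lz_proc}]")
    print("still in landing zone:", os.path.exists(staged))
    print("present in processed:", os.path.exists(moved))
```

Moving files out of the watched directory keeps a repeating rule from ingesting the same file twice.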

Using user-assigned metadata

iquest "select META_DATA_ATTR_NAME where META_DATA_ATTR_VALUE like 'bar'"

We can now use the metadata for a variety of things

foreach(*ROW in SELECT META_DATA_ATTR_NAME WHERE META_DATA_ATTR_VALUE = 'SOME_VALUE') {
    # science happens here
}

 

Combine with Language Integrated General Query (LIGQ) to drive rules and workflows

foreach(*ROW in SELECT META_DATA_ATTR_NAME, META_DATA_ATTR_VALUE) {
    *DATA_TYPE = *ROW.META_DATA_ATTR_VALUE;
    if("DATA_TYPE_1" == *DATA_TYPE) {
        # fancy things
    } else {
        # other things
    }     
}
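
The branching pattern in that loop, where a metadata value selects what happens next, can be sketched without a catalog. The Python below uses a list of dictionaries as made-up stand-ins for query rows. The handler names and sample rows are hypothetical and only illustrate the dispatch idea.

```python
def handle_type_1(row):
    return f"fancy things for {row['META_DATA_ATTR_NAME']}"


def handle_other(row):
    return f"other things for {row['META_DATA_ATTR_NAME']}"


# Map attribute values to handlers; anything unlisted falls through to
# handle_other, like the rule's else branch.
HANDLERS = {"DATA_TYPE_1": handle_type_1}


def dispatch(rows):
    """Run the handler selected by each row's META_DATA_ATTR_VALUE."""
    results = []
    for row in rows:
        handler = HANDLERS.get(row["META_DATA_ATTR_VALUE"], handle_other)
        results.append(handler(row))
    return results


# Made-up rows standing in for query results.
rows = [
    {"META_DATA_ATTR_NAME": "sample_a", "META_DATA_ATTR_VALUE": "DATA_TYPE_1"},
    {"META_DATA_ATTR_NAME": "sample_b", "META_DATA_ATTR_VALUE": "DATA_TYPE_2"},
]
for line in dispatch(rows):
    print(line)
```

A lookup table keeps the set of recognised data types in one place, which may be easier to extend than a growing if/else chain.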

Questions?

UGM 2019 - Workflow Automation

By justinkylejames
