Workflow Automation

June 13, 2016

Public Health England

London, England

Jason M. Coposky

@jason_coposky

Interim Executive Director

The iRODS Rule Language

Rules - C-like scripts executed by the iRODS Rule Engine

 

Rule Engine - integrated language interpreter

 

 

Simple Example:

HelloWorld {
    writeLine("stdout", "Hello World");
}

Rule Execution

Workflow may be automated in several ways

 

  • irule - initiated by a user

  • Policy Enforcement Points - initiated automatically

  • Delayed Execution Queue - initiated asynchronously
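As a minimal sketch of the first and third routes together, the commands below write a small rule that queues a log message and then submit it with irule. The file name delayed_hello.r and the rule name DelayedHello are hypothetical, and the irule call is skipped when the iCommands are not installed.

```shell
# Write a small rule file; delayed_hello.r and DelayedHello are hypothetical names.
cat > delayed_hello.r <<'EOF'
DelayedHello {
    # Queue the block below to run 30 seconds after submission.
    delay("<PLUSET>30s</PLUSET>") {
        writeLine("serverLog", "Hello from the delayed execution queue");
    }
}
INPUT null
OUTPUT ruleExecOut
EOF

# Submit the rule only when the iCommands are available on this machine.
if command -v irule >/dev/null 2>&1; then
    irule -F delayed_hello.r   # user-initiated: places the delayed block on the queue
    iqstat                     # lists rules waiting in the delayed execution queue
else
    echo "iCommands not found; wrote delayed_hello.r only"
fi
```

The message is written to the server log rather than stdout because the delayed block runs later on the server, after irule has already returned.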

Microservices

C++ plugins bound into the Rule Engine as functions

 

Useful for

  • leveraging specialized libraries - curl, gdal, etc.
  • tasks which are computationally intensive

Microservices

Invoked like a function call

HelloWorld2 {
    msihello_world( *msg );
}
INPUT *msg = "my message"
OUTPUT ruleExecOut

Installation

Packaged Microservices - installed via the native packaging system (rpm, dpkg)

  • Installed into /var/lib/irods/plugins/microservices

 

Hand-Built Microservices - copied by hand into the proper plugin directory

  • /var/lib/irods/plugins/microservices
  • $IRODS_HOME/plugins/microservices

 

Note - for Run-In-Place installations, $IRODS_HOME is wherever the system was built and deployed

Deploying Rules for Policy Enforcement Points

Two ways to deploy custom rules:

 

  • Directly edit and add rule code to the PEP defined in the default rule base

            /etc/irods/core.re

  • Override the Policy Enforcement Point

            include a new file within the rule base containing the desired rule code

 

Note - The rule bases are configured in /etc/irods/server_config.json
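A sketch of the second approach, assuming a packaged install with rule bases under /etc/irods: a separate rule file overrides acPostProcForPut, the PEP that fires after a put completes, and logs the path of each new object. The file name custom.re is hypothetical, and the deployment steps are left as comments because they need root access and an edit to the server configuration.

```shell
# Write a rule base file that overrides a Policy Enforcement Point.
# custom.re is a hypothetical file name.
cat > custom.re <<'EOF'
# Fires after a data object has been put into iRODS.
acPostProcForPut {
    writeLine("serverLog", "ingested " ++ $objPath);
}
EOF

# To deploy (assumption: rule bases live in /etc/irods):
#   sudo cp custom.re /etc/irods/custom.re
#   then list "custom" ahead of "core" in the rule base set
#   configured in /etc/irods/server_config.json
echo "wrote custom.re"
```

Keeping the override in its own file leaves core.re untouched, which makes it easier to see what has been customised after an upgrade.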

The Landing Zone Use Case

  1. Data staged from instruments to Landing Zone
  2. Delayed Execution rule harvests metadata
  3. Data ingested to Long Term Storage
  4. Data replicated to high performance storage for compute
  5. Products replicated back to long term storage
  6. Unused data is trimmed as appropriate
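Steps 4 and 6 could be sketched with the stock replication and trim microservices. This is an illustration only: the rule name stage_for_compute, the file name, and the resource name computeResc are hypothetical, and in practice the two steps would likely be separate rules run before and after the compute job.

```shell
# Write a rule that replicates an object for compute and later trims that replica.
# stage_for_compute.r, stage_for_compute and computeResc are hypothetical names.
cat > stage_for_compute.r <<'EOF'
stage_for_compute {
    # Step 4: replicate the object to the high performance resource.
    msiDataObjRepl(*obj, "destRescName=*compute_resc", *status);
    writeLine("stdout", "replicated [*obj] to [*compute_resc]");

    # The compute job would run here, with products put to long term storage.

    # Step 6: trim the replica on the compute resource, keeping at least one copy.
    msiDataObjTrim(*obj, *compute_resc, "null", "1", "null", *status);
    writeLine("stdout", "trimmed [*obj] from [*compute_resc]");
}
INPUT *obj="/tempZone/home/rods/stickers.jpg", *compute_resc="computeResc"
OUTPUT ruleExecOut
EOF

# Run only when the iCommands are available and the resource exists in your zone.
if command -v irule >/dev/null 2>&1; then
    irule -F stage_for_compute.r
else
    echo "iCommands not found; wrote stage_for_compute.r only"
fi
```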

Prepping the Code

sudo apt-get -y install 'irods-externals-*' irods-dev

 

export PATH=/opt/irods-externals/cmake3.5.2-0/bin:/opt/irods-externals/clang3.8-0/bin:$PATH

 

 which clang++

 which cmake

 

mkdir ~/build_lz

cd ~/build_lz

cmake ~/contrib/microservices/landing_zone_microservices/

make package

 

sudo dpkg -i ./irods-landing-zone-example_4.2.0~trusty_amd64.deb

Prepping the Landing Zone

mkdir -p /tmp/landing_zone/new

mkdir /tmp/landing_zone/processed

Make the temporary directories

wget ftp://ftp.renci.org/pub/irods/training/stickers.jpg

cp stickers.jpg /tmp/landing_zone/new

Fetch the data and stage it for ingest

irule -F landing_zone.r

Run the rule

Check the results

ils -l stickers.jpg

Look for ingested data

imeta ls -d /tempZone/home/rods/stickers.jpg

Check the metadata

Rule Review - data detection, ingest, and filtering

ingest_rule {
    #delay( "<PLUSET>30s</PLUSET><EF>5m REPEAT FOR EVER</EF>" ) {
    #    remote( "resource1.example.org", "null" ) {
            *err = errorcode( msiget_filepaths_from_glob(
                                  *lz_glob, *delay_seconds,
                                  *file_age_seconds, *the_files ) );
            if( *err == 0 ) {
                foreach( *f in *the_files ) {
                    writeLine( "stdout", "processing file=[*f]" );
                    *err = errorcode( msiput_dataobj_or_coll(
                                            *f, *resource_name, "forceFlag=",
                                            *tgt_coll, *real_path ) );
                    if( *err==0 ) {
                        if (*f like "*.jpg"  || *f like "*.jpeg" ||
                            *f like "*.bmp"  || *f like "*.tif"  ||
                            *f like "*.tiff" || *f like "*.rif"  ||
                            *f like "*.gif"  || *f like "*.png"  ||
                            *f like "*.svg"  || *f like "*.xpm") {

 

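Note the commented-out delay and remote directives. Uncommented, they would run the ingest on resource1.example.org from the delayed execution queue, starting 30 seconds after submission and repeating every 5 minutes.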

Rule Review - metadata extraction and assignment

                            writeLine(
                                "stdout",
                                "gathering metadata for [*f]");
                            
                            msiget_image_meta(*f, *meta);
                           
                            writeLine("stdout", "image metadata: " ++ *meta);
                            
                            msiString2KeyValPair(*meta, *meta_kvp);

                            msiAssociateKeyValuePairsToObj(
                                *meta_kvp, *real_path, "-d");
                        }

                        writeLine( "stdout", "processed file=[*f]" );
                        *err = errorcode( msifilesystem_rename(
                                              *f, *lz_raw, *lz_proc ) );
                        writeLine(
                            "stdout",
                            "move [*f], from [*lz_raw], to [*lz_proc]" ); 
                    } # if err
                } # foreach

Rule Review

            } # if err
            else {
                writeLine(
                    "stdout",
                    "error in msiget_filepaths_from_glob - *err" );
            }
    #   } # remote
    #} # delay
} # ingest_rule

INPUT *lz_raw="/tmp/landing_zone/new", *lz_proc="/tmp/landing_zone/processed/",
*resource_name="demoResc",
*delay_seconds=1,
*lz_glob="/tmp/landing_zone/new/*",
*tgt_coll="/tempZone/home/rods/",
*file_age_seconds=1
OUTPUT ruleExecOut

Note: in the rule file, the INPUT parameters must all be on one line; they are wrapped here only for display

Using user-assigned metadata

iquest "select META_DATA_ATTR_NAME where META_DATA_ATTR_VALUE like 'bar'"

We can now use the metadata for a variety of things

foreach(*ROW in SELECT META_DATA_ATTR_NAME WHERE META_DATA_ATTR_VALUE = 'SOME_VALUE') {
    # science happens here
}

 

Combine with LIGQ to drive rules and workflows

foreach(*ROW in SELECT META_DATA_ATTR_NAME, META_DATA_ATTR_VALUE) {
    *DATA_TYPE = *ROW.META_DATA_ATTR_VALUE;
    if("DATA_TYPE_1" == *DATA_TYPE) {
        # fancy things
    } else {
        # other things
    }     
}
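To make the pattern above concrete, here is a small sketch that lists every data object carrying a given attribute. The attribute name ImageWidth is an assumption (the actual names depend on what the image metadata microservice emits), and list_by_attr is a hypothetical rule name.

```shell
# Write a rule that finds data objects by metadata attribute name.
# list_by_attr.r, list_by_attr and ImageWidth are hypothetical names.
cat > list_by_attr.r <<'EOF'
list_by_attr {
    # Each row holds one matching data object; print its full logical path.
    foreach(*row in SELECT COLL_NAME, DATA_NAME WHERE META_DATA_ATTR_NAME = '*attr') {
        writeLine("stdout", *row.COLL_NAME ++ "/" ++ *row.DATA_NAME);
    }
}
INPUT *attr="ImageWidth"
OUTPUT ruleExecOut
EOF

# Run only when the iCommands are available.
if command -v irule >/dev/null 2>&1; then
    irule -F list_by_attr.r
else
    echo "iCommands not found; wrote list_by_attr.r only"
fi
```

Selecting COLL_NAME and DATA_NAME alongside the metadata condition gives the rule something to act on, since the path can be passed straight to other microservices.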

Questions?

Public Health England - Workflow automation

By jason coposky
