Taking Data To Compute

June 5-7, 2018

iRODS User Group Meeting 2018

Durham, NC

Daniel Moore

Applications Engineer, iRODS Consortium


Integrating iRODS with a compute environment

In order of increasing complexity and integration...

 

iRODS as a compute orchestrator

  • Launch a job via irule, or as part of a PEP
  • Implement a Landing Zone for product capture

 

iRODS as part of a compute job script

  • Stage the source data via replication for the application
  • Capture the products and ingest them into iRODS

 

iRODS as part of the compute application

  • Compute application directly leverages the iRODS API to open, read, and write data

The Data to Compute Use Case

Focus on the right side of the picture

iRODS is out of the data path for computation

Goal - Develop generic interface concept for compute

  • Develop a metadata-driven interface that determines the path taken by input data and compute results. Use it to
    • push data to the proper storage resource
    • get a name for the host on which to launch compute job(s)

 

  • Separate configuration from implementation
    • Keep deployment specifics in configuration files
    • Keep rule-base, scripts, and modules free of hard-wired values
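One way to keep deployment specifics out of the code is to read them from a JSON file whenever they are needed. The sketch below is illustrative only: `load_job_params` and `output_dir` are hypothetical names standing in for the package's `jobParams()` helper, whose real implementation is not shown in these slides. The key `phys_dir_for_output` is the one used later in this deck.

```python
import json


def load_job_params(path):
    """Read deployment-specific settings from a JSON file (hypothetical helper)."""
    with open(path) as f:
        return json.load(f)


def output_dir(params):
    # Rule code asks the configuration for the value instead of hard-wiring it.
    return params["phys_dir_for_output"]
```

With this split, moving the thumbnail directory to another location means editing the JSON file only; the rule base, scripts, and modules stay untouched.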

Goal - Develop a thumbnailing service for iRODS

Interface is through iRODS and SLURM (compute job scheduler):

  • Replicate the data to the compute resource
  • Send a job to the compute scheduler to generate thumbnails
  • Register the thumbnails into the catalog
  • Replicate the thumbnails back to long term storage
  • Trim replicas on compute resource

Components of the System

System Component       Implementation
Job Scheduler          SLURM
Job Launching Script   bash
Tools to Execute       ImageMagick convert
Job Endpoint           iRODS Rule Base (user extension of the iRODS API) and SLURM prolog / epilog

Getting Started

Installing ImageMagick

sudo apt-get update
sudo apt-get -y install imagemagick

Installing the python rule engine plugin

sudo apt-get -y install irods-rule-engine-plugin-python

Installing the PRC (Python iRODS-Client) module

sudo apt-get -y install python-pip
sudo pip install python-irodsclient

Getting Started

Get the irods_training repository

cd
git clone https://github.com/irods/irods_training

Build and Install the Data to Compute package

sudo apt-get -y install irods-externals-* irods-dev
export PATH=/opt/irods-externals/cmake3.5.2-0/bin/:$PATH
cd
mkdir build_data_to_compute
cd build_data_to_compute
cmake ../irods_training/advanced/hpc_data_to_compute/
make package
sudo dpkg -i ./irods-hpc-data-to-compute-example_4.2.3~xenial_amd64.deb

Build and Install MUNGE and SLURM (job scheduler)

cd ~/irods_training/advanced/hpc_data_to_compute/
./ubuntu_16/install_munge_and_slurm.sh

Package Contents

$ dpkg -c ./irods-hpc-data-to-compute-example_4.2.3~xenial_amd64.deb

drwxrwxr-x root/root         0 2018-06-05 04:46 ./etc/
drwxrwxr-x root/root         0 2018-06-05 04:46 ./etc/irods/
-r--r--r-- root/root      1213 2018-06-04 15:39 ./etc/irods/core.py.data_to_compute
-r--r--r-- root/root      1144 2018-06-04 21:28 ./etc/irods/data_to_compute.re 
                     [...some directories...]
drwxrwxr-x root/root         0 2018-06-05 04:46 ./var/lib/irods/compute/
-r--r--r-- root/root         0 2018-06-04 15:37 ./var/lib/irods/compute/__init__.py
-rw------- root/root        59 2018-06-04 15:39 ./var/lib/irods/compute/admin_as_rodsuser.json
-r--r--r-- root/root     11253 2018-06-04 15:39 ./var/lib/irods/compute/common.py
-r--r--r-- root/root       565 2018-06-04 15:37 ./var/lib/irods/compute/job_params.json
-r--r--r-- root/root      2150 2018-06-04 22:14 ./var/lib/irods/compute/util.py
-r--r--r-- root/root      1301 2018-06-05 04:30 ./var/lib/irods/detect_thumbnails.py
drwxrwxr-x root/root         0 2018-06-05 04:46 ./var/lib/irods/msiExecCmd_bin/
-r-xr-xr-x root/root       571 2018-06-04 15:36 ./var/lib/irods/msiExecCmd_bin/convert.SLURM
-r-xr-xr-x root/root       343 2018-06-04 15:36 ./var/lib/irods/msiExecCmd_bin/submit_thumbnail_job.sh
-r--r--r-- root/root       788 2018-06-04 15:39 ./var/lib/irods/rescName_from_kvpair.r
-r--r--r-- root/root      1494 2018-06-05 03:03 ./var/lib/irods/spawn_remote_slurm_jobs.r

Configure the rule engine

As the irods user, add an additional rule base to /etc/irods/server_config.json:
"rule_engines": [

    ...

        "re_rulebase_set": [

            "data_to_compute",

            "core"

        ],

    ...

]

(Remember that order matters!)
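The same edit can be scripted rather than made by hand. This is a minimal sketch, assuming the stock layout in which `re_rulebase_set` sits under `plugin_specific_configuration` of the iRODS Rule Language instance; `prepend_rulebase` is a hypothetical helper, not part of the training package. It operates on the parsed JSON as a Python dict.

```python
NATIVE_INSTANCE = "irods_rule_engine_plugin-irods_rule_language-instance"


def prepend_rulebase(config, rulebase, instance_name=NATIVE_INSTANCE):
    """Make `rulebase` the first entry of re_rulebase_set (hypothetical helper)."""
    for engine in config["rule_engines"]:
        if engine.get("instance_name") == instance_name:
            # Assumed location of the rule base list inside the stanza.
            names = engine["plugin_specific_configuration"]["re_rulebase_set"]
            # Drop any existing copy so the name appears once, then put it first.
            names[:] = [rulebase] + [n for n in names if n != rulebase]
            return config
    raise KeyError("rule engine instance not found: " + instance_name)
```

Placing the new name first matters because rule bases are consulted in list order, so rules in `data_to_compute` are found before same-named rules in `core`.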

Add in a small python rule-base:

Also as the irods user, move a new python rule file into place:

irods@icat:~$ cd /etc/irods
irods@icat:~$ test -f core.py || touch core.py
irods@icat:~$ cp -p core.py core.py.SAVE
irods@icat:~$ cp core.py.data_to_compute core.py

Python Rule Engine Configuration (re-ordering)

Edit rule engine order in /etc/irods/server_config.json :

  • insert the python plugin configuration stanza after the iRODS Rule Language plugin
  • if it already exists elsewhere in the config, move (cut/paste) it from that location, but it must occur only once.
  • for this exercise, native rule code must supersede python!
"rule_engines": [
    {
         "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",

                   ...

    },
    {
         "instance_name" : "irods_rule_engine_plugin-python-instance",
         "plugin_name" : "irods_rule_engine_plugin-python",
         "plugin_specific_configuration" : {}
    }, 
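The re-ordering rule (Python stanza directly after the native one, present exactly once) can be expressed as a small list operation on the parsed `rule_engines` array. This is only a sketch: `place_python_after_native` is a hypothetical helper, and the two instance names are the ones shown in the stanza above.

```python
NATIVE = "irods_rule_engine_plugin-irods_rule_language-instance"
PYTHON = "irods_rule_engine_plugin-python-instance"


def place_python_after_native(rule_engines):
    """Return a new list with the Python stanza right after the native one."""
    names = [e.get("instance_name") for e in rule_engines]
    if NATIVE not in names or PYTHON not in names:
        raise ValueError("both the native and the Python stanza must be present")
    python_stanza = rule_engines[names.index(PYTHON)]
    # Drop every copy of the Python stanza so it ends up in the list only once.
    others = [e for e in rule_engines if e.get("instance_name") != PYTHON]
    pos = [e.get("instance_name") for e in others].index(NATIVE) + 1
    return others[:pos] + [python_stanza] + others[pos:]
```

The function returns a new list and leaves its input untouched, so the result can be compared with the original before anything is written back to the configuration file.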

Configure the LTS and Image Processing Resources

As the irods user:

 

Make two unix file system resources

iadmin mkresc lts_resc unixfilesystem `hostname`:/tmp/irods/lts_resc
iadmin mkresc img_resc unixfilesystem `hostname`:/tmp/irods/img_resc

Annotate them with appropriate metadata given their roles

  - defined in the configuration as part of the contract

imeta add -R lts_resc COMPUTE_RESOURCE_ROLE LONG_TERM_STORAGE
imeta add -R img_resc COMPUTE_RESOURCE_ROLE IMAGE_PROCESSING

As the ubuntu user:

Stage data and destination directory for thumbnail creation

cp ~/irods_training/stickers.jpg /tmp
sudo mkdir -p /tmp/irods/thumbnails
sudo chown -R irods:irods /tmp/irods

The configuration interface

Define interfaces for any necessary conventions

  • Metadata attributes and values

  • Metadata values for implemented roles

  • Interface to job scheduler for launching compute

 

Single Point of Truth - allows for the use of the same 'end-points' for various metadata standards and naming conventions

 

Users may utilize metadata conventions to provide inputs to a given compute job

The configuration interface

For the thumbnail service we will need to

  • Get the metadata attribute string that holds the role

  • Get the tag for an Image Compute resource

  • Get the tag for a Long Term Storage resource

  • Get the logical collection name for thumbnails

  • Get the physical path for a thumbnail

  • Get the name of a thumbnail

  • Get a list of desired thumbnail sizes
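Several of these lookups are plain string conventions, so they can live in one small module. The sketch below is an assumption about how such helpers might look; all four function names are hypothetical. The naming pattern (`<name>_thumbnail_<size><ext>` inside a `<name>_thumbnails` collection) is inferred from the paths that appear elsewhere in this deck.

```python
import posixpath


def thumbnail_sizes(spec):
    """Split a comma-separated size list such as '128x128,256x256'."""
    return [s.strip() for s in spec.split(",") if s.strip()]


def thumbnail_name(input_name, size):
    """Name of the thumbnail derived from the input file name and a size string."""
    base, ext = posixpath.splitext(input_name)
    return "{}_thumbnail_{}{}".format(base, size, ext)


def thumbnail_phys_path(phys_dir, input_name, size):
    """Physical path of the thumbnail inside the output directory."""
    return posixpath.join(phys_dir, thumbnail_name(input_name, size))


def thumbnail_collection(parent_collection, input_name):
    """Logical collection that holds all thumbnails of one input file."""
    base, _ = posixpath.splitext(input_name)
    return "{}/{}_thumbnails".format(parent_collection, base)
```

Keeping these conventions in one place means the rule base, the job script, and the client scripts all agree on where a given thumbnail lives.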

Python rule engine allows a cleaner system design

Writing rules in Python means easy access to functionality and configuration data, both from the iRODS rule base:

import sys
sys.path.insert(0, "/var/lib/irods")
from compute.common import jobParams

def some_python_rule(rule_args, callback, rei):
  # ...
  dest_dir = jobParams()['phys_dir_for_output']
  rule_args[0] = dest_dir
  # ...

and from python iRODS client scripts/modules:

def register_replicate_and_trim_thumbnail ( size_string ):
  # ...
  c = get_collection( jobParams()['output_collection'] )

Python rule engine allows a cleaner system design

We can author useful python functions and insert them into the iRODS rule-base. This one is useful for parsing metadata 'KEY=VALUE' style specifications:

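def pyParseRoleSpec(rule_args, callback, rei):
    # rule_args[0] holds a 'KEY=VALUE' string; the key and the value are
    # returned, stripped of whitespace, in rule_args[1] and rule_args[2].
    compute_resc_spec = rule_args[0]
    rule_args[1:3] = map(lambda x: x.strip(),
                         (compute_resc_spec.split('=') + [''])[:2])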
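To see what the rule does to its arguments, the same logic can be exercised outside the server. In this sketch the rule body is restated with a list comprehension in place of `map`, and `callback` and `rei` are passed as `None` because the function never uses them. The plain list used for `rule_args` is a stand-in; this harness is not how iRODS itself invokes a rule.

```python
def pyParseRoleSpec(rule_args, callback, rei):
    compute_resc_spec = rule_args[0]
    # Pad with '' so a spec without '=' still yields a (key, value) pair.
    parts = (compute_resc_spec.split('=') + [''])[:2]
    rule_args[1:3] = [p.strip() for p in parts]


args = ["COMPUTE_RESOURCE_ROLE = IMAGE_PROCESSING", "", ""]
pyParseRoleSpec(args, None, None)
print(args[1:3])   # ['COMPUTE_RESOURCE_ROLE', 'IMAGE_PROCESSING']

bare = ["LONG_TERM_STORAGE", "", ""]
pyParseRoleSpec(bare, None, None)
print(bare[1:3])   # ['LONG_TERM_STORAGE', '']
```

Results come back through the argument list because rule arguments act as in/out parameters: the caller reads the key and value from positions 1 and 2 after the call.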

The configuration interface

iRODS Rules file provides interface for job submission

testRule {
  *thumbnail_sizes = "128x128,256x256,512x512,1024x1024"
  *host = "" ;  *resc_name = "" ; *key = "COMPUTE_RESOURCE_ROLE"; *val="IMAGE_PROCESSING"
  get_host_and_resource_name_by_role(*host, *resc_name, *key, *val)
  *input_file_name = "stickers" ; *input_file_ext = ".jpg"
  *input_file =  "*input_file_name" ++ "*input_file_ext"
  if ("*host" == "" ) { writeLine ("stdout", "Host for job launch was not found.") }
  else {
    foreach (*x in select DATA_PATH where COLL_NAME = '/tempZone/home/rods' and
             DATA_NAME = '*input_file' and RESC_NAME = '*resc_name') { *src_phy_path = *x.DATA_PATH }
    remote(*host, "") {
      foreach (*size in split (*thumbnail_sizes, ",")) {
        *dst_phy_path = "/tmp/irods/thumbnails/" ++ "*input_file_name" ++ "_thumbnail_" ++ "*size" ++ "*input_file_ext"
        *cmd_opts="/var/lib/irods/msiExecCmd_bin/convert.SLURM -thumbnail *size *src_phy_path *dst_phy_path"
        msiExecCmd("submit_thumbnail_job.sh","*cmd_opts","null","null","null",*OUT)
      }
    }
  }
}

The configuration interface

Abstraction of job submission via shell script

#!/bin/bash

# $1 - executable
# $2 - thumbnail option
# $3 - sizing string
# $4 - source physical path
# $5 - destination physical path

SBATCH_OPTIONS="-o /tmp/slurm-%j.out"

SCRIPT="$1" # assume full path to executable

/usr/local/bin/sbatch $SBATCH_OPTIONS "$SCRIPT" \
    ${2+"$2"} \
    ${3+"$3"} \
    ${4+"$4"} \
    ${5+"$5"} \
    >/dev/null 2>&1
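For readers more at home in Python, the command line assembled by this script can be mirrored as a list of arguments. This is an illustrative sketch only: `build_sbatch_argv` is a hypothetical function, not part of the package, and it only builds the argument list without submitting anything. The `sbatch` path and the `-o` option value are the ones used in the script above.

```python
def build_sbatch_argv(script, args,
                      sbatch="/usr/local/bin/sbatch",
                      output="/tmp/slurm-%j.out"):
    """Mirror the shell wrapper: sbatch options, then the script, then its arguments."""
    # The shell wrapper forwards at most four arguments ($2 through $5).
    return [sbatch, "-o", output, script] + list(args)[:4]


argv = build_sbatch_argv(
    "/var/lib/irods/msiExecCmd_bin/convert.SLURM",
    ["-thumbnail", "128x128", "/path/to/source.jpg", "/path/to/thumbnail.jpg"],
)
print(argv)
```

A list built this way could be handed to `subprocess.call`. Passing the arguments as separate list items serves the same purpose as the quoted `${2+"$2"}` expansions in the shell version: paths containing spaces stay intact.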

 

Thumbnail Service - testing

As the irods user, position the input data and start the thumbnail jobs:

irods@icat:~$ find_compute_resc() { iquest %s " select RESC_NAME where META_RESC_ATTR_NAME = \
'COMPUTE_RESOURCE_ROLE' and META_RESC_ATTR_VALUE = '$1' "; }
irods@icat:~$ iput -R $(find_compute_resc IMAGE_PROCESSING) /tmp/stickers.jpg
irods@icat:~$ ils -l
/tempZone/home/rods:
  rods              0 img_resc      2157087 2018-06-05.08:31 & stickers.jpg

irods@icat:~$ irule -F spawn_remote_slurm_jobs.r
irods@icat:~$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 2     debug convert.    irods  R       0:05      1 icat
                 3     debug convert.    irods  R       0:05      1 icat
                 4     debug convert.    irods  R       0:05      1 icat
                 5     debug convert.    irods  R       0:05      1 icat

(Wait for the SLURM job queue to be empty:)

irods@icat:~$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
irods@icat:~$ ./detect_thumbnails.py
=== QUERY RESULTS: ===
lts_resc :        /tempZone/home/rods/stickers_thumbnails/stickers_thumbnail_1024x1024.jpg
lts_resc :        /tempZone/home/rods/stickers_thumbnails/stickers_thumbnail_128x128.jpg
lts_resc :        /tempZone/home/rods/stickers_thumbnails/stickers_thumbnail_256x256.jpg
lts_resc :        /tempZone/home/rods/stickers_thumbnails/stickers_thumbnail_512x512.jpg

Don't forget to restore the old python rule-set when moving on to another exercise:

irods@icat:~$ cd /etc/irods
irods@icat:~$ cp -p core.py.SAVE core.py

We're done!

Extending iRODS with the Rule Engine

  • All rules should be created and tested in user space before being installed as a rule base

  • Rules may be refactored into a microservice plugin

  • Rules may be refactored into a C++ rule engine plugin

  • Rules may be refactored into an API plugin