Taking Data To Compute
June 5-7, 2018
iRODS User Group Meeting 2018
Durham, NC
Daniel Moore
Applications Engineer, iRODS Consortium
Integrating iRODS with a compute environment
In order of increasing complexity and integration...
iRODS as a compute orchestrator
iRODS as part of a compute job script
iRODS as part of the compute application
The Data to Compute Use Case
Focus on the right side of the picture
iRODS is out of the data path for computation
Goal - Develop generic interface concept for compute
Goal - Develop a thumbnailing service for iRODS
Interface is through iRODS and SLURM (compute job scheduler)
Components of the System
System Component: Implementation
Job Scheduler: SLURM
Job Launching Script: bash
Tools to Execute: ImageMagick convert
Job Endpoint: iRODS Rule Base (user extension of the iRODS API) and SLURM prolog / epilog
Getting Started
Installing ImageMagick
sudo apt-get update
sudo apt-get -y install imagemagick
Installing the python rule engine plugin
sudo apt-get -y install irods-rule-engine-plugin-python
Installing the PRC (Python iRODS Client) module
sudo apt-get -y install python-pip
sudo pip install python-irodsclient
Getting Started
Get the irods_training repository
cd
git clone https://github.com/irods/irods_training
sudo apt-get -y install irods-externals-* irods-dev
export PATH=/opt/irods-externals/cmake3.5.2-0/bin/:$PATH
Build and Install MUNGE and SLURM (job scheduler)
cd ~/irods_training/advanced/hpc_data_to_compute/
./ubuntu_16/install_munge_and_slurm.sh
Build and Install the Data to Compute package
cd
mkdir build_data_to_compute
cd build_data_to_compute
cmake ../irods_training/advanced/hpc_data_to_compute/
make package
sudo dpkg -i ./irods-hpc-data-to-compute-example_4.2.3~xenial_amd64.deb
Package Contents
$ dpkg -c ./irods-hpc-data-to-compute-example_4.2.3~xenial_amd64.deb
drwxrwxr-x root/root 0 2018-06-05 04:46 ./etc/
drwxrwxr-x root/root 0 2018-06-05 04:46 ./etc/irods/
-r--r--r-- root/root 1213 2018-06-04 15:39 ./etc/irods/core.py.data_to_compute
-r--r--r-- root/root 1144 2018-06-04 21:28 ./etc/irods/data_to_compute.re
[...some directories...]
drwxrwxr-x root/root 0 2018-06-05 04:46 ./var/lib/irods/compute/
-r--r--r-- root/root 0 2018-06-04 15:37 ./var/lib/irods/compute/__init__.py
-rw------- root/root 59 2018-06-04 15:39 ./var/lib/irods/compute/admin_as_rodsuser.json
-r--r--r-- root/root 11253 2018-06-04 15:39 ./var/lib/irods/compute/common.py
-r--r--r-- root/root 565 2018-06-04 15:37 ./var/lib/irods/compute/job_params.json
-r--r--r-- root/root 2150 2018-06-04 22:14 ./var/lib/irods/compute/util.py
-r--r--r-- root/root 1301 2018-06-05 04:30 ./var/lib/irods/detect_thumbnails.py
drwxrwxr-x root/root 0 2018-06-05 04:46 ./var/lib/irods/msiExecCmd_bin/
-r-xr-xr-x root/root 571 2018-06-04 15:36 ./var/lib/irods/msiExecCmd_bin/convert.SLURM
-r-xr-xr-x root/root 343 2018-06-04 15:36 ./var/lib/irods/msiExecCmd_bin/submit_thumbnail_job.sh
-r--r--r-- root/root 788 2018-06-04 15:39 ./var/lib/irods/rescName_from_kvpair.r
-r--r--r-- root/root 1494 2018-06-05 03:03 ./var/lib/irods/spawn_remote_slurm_jobs.r
Configure the rule engine
As the irods user, add an additional rule base to /etc/irods/server_config.json:
"rule_engines": [
...
"re_rulebase_set": [
"data_to_compute",
"core"
],
...
]
(Remember that order matters!)
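The ordering requirement can be sketched in Python against an in-memory copy of the configuration. The nesting used here (plugin_configuration, then rule_engines, then plugin_specific_configuration) and the helper name prepend_rulebase are assumptions made for illustration; check them against your own server_config.json before relying on them.

```python
import json


def prepend_rulebase(config, rulebase, before="core"):
    """Insert `rulebase` ahead of `before` in each re_rulebase_set found.

    Hypothetical helper: the key layout is an assumption about the
    structure of server_config.json, not something stated on the slide.
    """
    engines = config.get("plugin_configuration", {}).get("rule_engines", [])
    for engine in engines:
        names = engine.get("plugin_specific_configuration", {}).get("re_rulebase_set")
        if names is None or rulebase in names:
            continue  # engine has no rule base list, or is already configured
        position = names.index(before) if before in names else 0
        names.insert(position, rulebase)
    return config


# Minimal stand-in for the relevant part of the configuration file.
sample = {
    "plugin_configuration": {
        "rule_engines": [
            {
                "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
                "plugin_specific_configuration": {"re_rulebase_set": ["core"]},
            }
        ]
    }
}

updated = prepend_rulebase(sample, "data_to_compute")
rule_language_engine = updated["plugin_configuration"]["rule_engines"][0]
print(json.dumps(rule_language_engine["plugin_specific_configuration"]["re_rulebase_set"]))
```

Listing data_to_compute ahead of core means its rules are consulted first, which is why the order matters.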
Add in a small python rule-base:
Also as the irods user, move a new python rule file into place:
irods@icat:~$ cd /etc/irods
irods@icat:~$ test -f core.py || touch core.py
irods@icat:~$ cp -p core.py core.py.SAVE
irods@icat:~$ cp core.py.data_to_compute core.py
Python Rule Engine Configuration (re-ordering)
Edit rule engine order in /etc/irods/server_config.json :
"rule_engines": [
{ "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
...
},
{
"instance_name" : "irods_rule_engine_plugin-python-instance",
"plugin_name" : "irods_rule_engine_plugin-python",
"plugin_specific_configuration" : {}
},
Configure the LTS and Image Processing Resources
As the irods user:
Make two unix file system resources
iadmin mkresc lts_resc unixfilesystem `hostname`:/tmp/irods/lts_resc
iadmin mkresc img_resc unixfilesystem `hostname`:/tmp/irods/img_resc
Annotate them with appropriate metadata given their roles (defined in the configuration as part of the contract)
imeta add -R lts_resc COMPUTE_RESOURCE_ROLE LONG_TERM_STORAGE
imeta add -R img_resc COMPUTE_RESOURCE_ROLE IMAGE_PROCESSING
As the ubuntu user:
Stage data and destination directory for thumbnail creation
cp ~/irods_training/stickers.jpg /tmp
sudo mkdir -p /tmp/irods/thumbnails
sudo chown -R irods:irods /tmp/irods
The configuration interface
Define interfaces for any necessary conventions
Metadata attributes and values
Metadata values for implemented roles
Interface to job scheduler for launching compute
Single Point of Truth - allows for the use of the same 'end-points' for various metadata standards and naming conventions
Users may utilize metadata conventions to provide inputs to a given compute job
The configuration interface
For the thumbnail service we will need to
Get the metadata attribute string that holds the role
Get the tag for an Image Compute resource
Get the tag for a Long Term Storage resource
Get the logical collection name for thumbnails
Get the physical path for a thumbnail
Get the name of a thumbnail
Get a list of desired thumbnail sizes
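A minimal sketch of how such a configuration interface might look in Python follows. Only the key names phys_dir_for_output and output_collection appear elsewhere in this talk; every other key, the function names, and the JSON layout are assumptions made for illustration and are not the contents of the packaged job_params.json.

```python
import json
import os
import posixpath
import tempfile

# Illustrative values only. Keys other than "phys_dir_for_output" and
# "output_collection" are hypothetical names chosen for this sketch.
EXAMPLE_PARAMS = {
    "role_attribute": "COMPUTE_RESOURCE_ROLE",
    "image_compute_role": "IMAGE_PROCESSING",
    "long_term_storage_role": "LONG_TERM_STORAGE",
    "output_collection": "/tempZone/home/rods/stickers_thumbnails",
    "phys_dir_for_output": "/tmp/irods/thumbnails",
    "thumbnail_sizes": ["128x128", "256x256", "512x512", "1024x1024"],
}


def load_params(path):
    """Read the shared job parameters from a JSON file."""
    with open(path) as handle:
        return json.load(handle)


def thumbnail_name(base, size, ext):
    """Build the thumbnail file name for one size."""
    return "{0}_thumbnail_{1}{2}".format(base, size, ext)


def thumbnail_phys_path(params, base, size, ext):
    """Physical path where the compute job writes the thumbnail."""
    return posixpath.join(params["phys_dir_for_output"], thumbnail_name(base, size, ext))


def thumbnail_logical_path(params, base, size, ext):
    """Logical iRODS path under the thumbnail collection."""
    return posixpath.join(params["output_collection"], thumbnail_name(base, size, ext))


# Round-trip the example through a temporary file to mimic reading the config.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
    json.dump(EXAMPLE_PARAMS, handle)
params = load_params(handle.name)
os.remove(handle.name)

for size in params["thumbnail_sizes"]:
    print(thumbnail_phys_path(params, "stickers", size, ".jpg"))
```

Keeping every convention behind one small module means rules and client scripts ask the same functions for names and paths rather than each hard-coding them.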
Python rule engine allows a cleaner system design
Writing rules in Python means easy access to functionality and configuration data, both from the iRODS rule base:
import sys
sys.path.insert(0, "/var/lib/irods")
from compute.common import jobParams

def some_python_rule(rule_args, callback, rei):
    # ...
    dest_dir = jobParams()['phys_dir_for_output']
    rule_args[0] = dest_dir
    # ...
and from python iRODS client scripts/modules:
def register_replicate_and_trim_thumbnail(size_string):
    # ...
    c = get_collection(jobParams()['output_collection'])
Python rule engine allows a cleaner system design
We can author useful python functions and insert them into the iRODS rule-base. This one is useful for parsing metadata 'KEY=VALUE' style specifications:
def pyParseRoleSpec(rule_args, callback, rei):
    compute_resc_spec = rule_args[0]
    rule_args[1:3] = map(lambda x: x.strip(), (compute_resc_spec.split('=') + [''])[:2])
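The parsing step inside that rule can be tried outside iRODS. The sketch below isolates the same split-pad-truncate idea in a plain function; parse_role_spec is a hypothetical name used only for this illustration.

```python
def parse_role_spec(spec):
    """Split a 'KEY=VALUE' string into a (key, value) pair.

    Appending an empty string before slicing guarantees two fields, so a
    bare 'KEY' yields an empty value and anything after a second '=' is dropped.
    """
    key, value = [part.strip() for part in (spec.split("=") + [""])[:2]]
    return key, value


print(parse_role_spec("COMPUTE_RESOURCE_ROLE = IMAGE_PROCESSING"))
print(parse_role_spec("COMPUTE_RESOURCE_ROLE"))
```

In the rule itself the two results are written back into rule_args[1] and rule_args[2], which is how a Python rule returns values to its caller.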
The configuration interface
iRODS Rules file provides interface for job submission
testRule {
    *thumbnail_sizes = "128x128,256x256,512x512,1024x1024"
    *host = "" ; *resc_name = "" ; *key = "COMPUTE_RESOURCE_ROLE"; *val = "IMAGE_PROCESSING"
    get_host_and_resource_name_by_role(*host, *resc_name, *key, *val)
    *input_file_name = "stickers" ; *input_file_ext = ".jpg"
    *input_file = "*input_file_name" ++ "*input_file_ext"
    if ("*host" == "") {
        writeLine("stdout", "Host for job launch was not found.")
    } else {
        foreach (*x in select DATA_PATH where COLL_NAME = '/tempZone/home/rods' and DATA_NAME = '*input_file' and RESC_NAME = '*resc_name') {
            *src_phy_path = *x.DATA_PATH
        }
        remote(*host, "") {
            foreach (*size in split(*thumbnail_sizes, ",")) {
                *dst_phy_path = "/tmp/irods/thumbnails/" ++ "*input_file_name" ++ "_thumbnail_" ++ "*size" ++ "*input_file_ext"
                *cmd_opts = "/var/lib/irods/msiExecCmd_bin/convert.SLURM -thumbnail *size *src_phy_path *dst_phy_path"
                msiExecCmd("submit_thumbnail_job.sh", "*cmd_opts", "null", "null", "null", *OUT)
            }
        }
    }
}
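To make the inner loop easier to follow, here is a rough Python rendering of how the rule builds one option string per thumbnail size. It is a sketch, not the rule's implementation: the function name is hypothetical, the source path is an assumed example of where img_resc might store the file, and shlex.split only approximates how the option string is broken into positional arguments for the launch script.

```python
import shlex

CONVERT_WRAPPER = "/var/lib/irods/msiExecCmd_bin/convert.SLURM"
OUTPUT_DIR = "/tmp/irods/thumbnails/"


def thumbnail_job_options(src_phy_path, name, ext, sizes):
    """Return one option string per requested size, mirroring *cmd_opts."""
    options = []
    for size in sizes.split(","):
        dst_phy_path = OUTPUT_DIR + name + "_thumbnail_" + size + ext
        options.append(
            "{0} -thumbnail {1} {2} {3}".format(
                CONVERT_WRAPPER, size, src_phy_path, dst_phy_path
            )
        )
    return options


# Assumed physical path, for illustration only.
jobs = thumbnail_job_options(
    "/tmp/irods/img_resc/home/rods/stickers.jpg",
    "stickers",
    ".jpg",
    "128x128,256x256,512x512,1024x1024",
)
for job in jobs:
    # Each list is roughly what the launch script sees as $1 through $5.
    print(shlex.split(job))
```

One job is submitted per size, so four sizes produce four independent scheduler jobs.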
The configuration interface
Abstraction of job submission via shell script
#!/bin/bash
# $1 - executable
# $2 - thumbnail option
# $3 - sizing string
# $4 - source physical path
# $5 - destination physical path
SBATCH_OPTIONS="-o /tmp/slurm-%j.out"
SCRIPT="$1" # assume full path to executable
/usr/local/bin/sbatch $SBATCH_OPTIONS "$SCRIPT" \
${2+"$2"} \
${3+"$3"} \
${4+"$4"} \
${5+"$5"} \
>/dev/null 2>&1
Thumbnail Service - testing
As the irods user, position the input data and start the thumbnail jobs:
irods@icat:~$ find_compute_resc() { iquest %s " select RESC_NAME where META_RESC_ATTR_NAME = \
  'COMPUTE_RESOURCE_ROLE' and META_RESC_ATTR_VALUE = '$1' "; }
irods@icat:~$ iput -R $(find_compute_resc IMAGE_PROCESSING) /tmp/stickers.jpg
irods@icat:~$ ils -l
/tempZone/home/rods:
  rods 0 img_resc 2157087 2018-06-05.08:31 & stickers.jpg
irods@icat:~$ irule -F spawn_remote_slurm_jobs.r
irods@icat:~$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2 debug convert. irods R 0:05 1 icat
3 debug convert. irods R 0:05 1 icat
4 debug convert. irods R 0:05 1 icat
5 debug convert. irods R 0:05 1 icat
(Wait for the SLURM job queue to be empty:)
irods@icat:~$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
irods@icat:~$ ./detect_thumbnails.py
=== QUERY RESULTS: ===
lts_resc : /tempZone/home/rods/stickers_thumbnails/stickers_thumbnail_1024x1024.jpg
lts_resc : /tempZone/home/rods/stickers_thumbnails/stickers_thumbnail_128x128.jpg
lts_resc : /tempZone/home/rods/stickers_thumbnails/stickers_thumbnail_256x256.jpg
lts_resc : /tempZone/home/rods/stickers_thumbnails/stickers_thumbnail_512x512.jpg
Don't forget to restore the old python rule-set when moving on to another exercise:
irods@icat:~$ cd /etc/irods
irods@icat:~$ cp -p core.py.SAVE core.py
We're done!
Extending iRODS with the Rule Engine
All rules should be created and tested in user space before being installed as a rule base
Rules may be refactored into a microservice plugin
Rules may be refactored into a C++ rule engine plugin
Rules may be refactored into an API plugin