Policy Training

Taking Compute to Data

Jason M. Coposky

@jason_coposky

Executive Director, iRODS Consortium

June 25-28, 2019

iRODS User Group Meeting

University of Utrecht, NL

The Compute to Data Use Case

Data is assumed to already be routed to an appropriate storage resource

Goals - Develop generic interface concept for compute

  • Develop a metadata driven interface for labeling resources which provide computational capabilities
    • ultimately relies upon convention
  • Separate configuration from implementation
    • isolate deployment specific concepts
  • Consider a rule base as an extension of iRODS
    • rules are not just data management policy

"Compute To Data" Pattern - Salient Features

Implemented as an iRODS rulebase: following the Template Method pattern

  1. If necessary, replicate input data to an appropriate resource
  2. Check permissions
  3. Launch compute container (Docker):
    1. Process input data via Jupyter notebook
    2. Save results
  4. Register the resultant directory into iRODS
  5. Apply metadata to newly registered results
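The Template Method idea behind these steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and method names below are hypothetical and are not taken from the training rule base. The point is that the order of the steps is fixed in one place, while each step can be supplied separately.

```python
class ComputeToDataTask:
    """Hypothetical skeleton: run() fixes the order; subclasses supply the steps."""

    def run(self):
        # The template method: the sequence of steps is defined once, here.
        self.replicate_input_if_needed()
        self.check_permissions()
        self.launch_container()
        self.register_results()
        self.apply_metadata()

    # Each hook is a placeholder to be overridden by a concrete deployment.
    def replicate_input_if_needed(self): raise NotImplementedError
    def check_permissions(self): raise NotImplementedError
    def launch_container(self): raise NotImplementedError
    def register_results(self): raise NotImplementedError
    def apply_metadata(self): raise NotImplementedError


class LoggingTask(ComputeToDataTask):
    """Illustrative subclass that only records which step ran."""

    def __init__(self):
        self.log = []

    def replicate_input_if_needed(self): self.log.append("replicate")
    def check_permissions(self): self.log.append("permissions")
    def launch_container(self): self.log.append("container")
    def register_results(self): self.log.append("register")
    def apply_metadata(self): self.log.append("metadata")


task = LoggingTask()
task.run()
print(task.log)
```

A deployment changes what a step does by overriding a hook, never by reordering the sequence in run().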

Components of the System

System Component         Implementation

Job Initialization       iRODS Rule Base

Container Technology     Docker

User Provided Compute    Jupyter Notebook

Getting Started

git clone https://github.com/irods/irods_training
sudo apt-get -y install \
   irods-externals-cmake3.5.2-0 \
   irods-externals-clang3.8-0 \
   irods-externals-qpid-with-proton0.34-0 \
   irods-dev
export PATH=/opt/irods-externals/cmake3.5.2-0/bin:$PATH

Clone irods_training repository and configure build tools

As the ubuntu user (if necessary)

Getting Started

cd
mkdir build_compute_to_data
cd build_compute_to_data
cmake ../irods_training/advanced/hpc_compute_to_data
make package
sudo dpkg -i irods-hpc-compute-to-data-example_4.2.6~xenial_amd64.deb

cd
mkdir build_register_microservice
cd build_register_microservice
cmake ../irods_training/advanced/hpc_compute_to_data/msvc__msiregister_as_admin/
make package
sudo dpkg -i irods-microservice-register_as_admin-4.2.6-ubuntu16-x86_64.deb

Build and Install packages for the compute-to-data example

cd /home/ubuntu/irods_training/advanced/hpc_compute_to_data/jupyter_notebook

docker build -t testimages/jupyter-digital-filter .

Build Docker image for processing

Getting Started

Make sure the Python rule engine plugin is installed.

sudo apt-get -y install irods-rule-engine-plugin-python

Install Python's pip package

sudo apt-get -y install python-pip

Add irods user to the docker group

sudo usermod -aG docker irods

Getting Started

As the irods user - install the Python Docker API

pip install docker --user

Restart the irods server

sudo service irods restart

or

sudo su irods -c '~/irodsctl restart'

Further Setup and Configuration

Edit /etc/irods/server_config.json

"rule_engines": [
    {
      "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
      "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
                 ...
      "shared_memory_instance": "irods_rule_language_rule_engine"
    },
    {
      "instance_name": "irods_rule_engine_plugin-python-instance",
      "plugin_name": "irods_rule_engine_plugin-python",
      "plugin_specific_configuration": {}
    },

         . . .

Create /etc/irods/core.py with the following import:

from compute_to_data import *

Configure Python Rule Engine Plugin
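A quick sanity check of the edited configuration can be sketched in Python. The helper name below is hypothetical, and the sample list only mirrors the two entries shown in the fragment above (elided fields are left out); it is not a complete server_config.json.

```python
import json

PYTHON_PLUGIN = "irods_rule_engine_plugin-python"


def has_python_rule_engine(rule_engines):
    """Return True if any entry in the rule_engines list names the Python plugin."""
    return any(entry.get("plugin_name") == PYTHON_PLUGIN for entry in rule_engines)


# Sample data shaped like the "rule_engines" array shown above (assumption:
# only the fields visible in the slide are included here).
sample_rule_engines = json.loads("""
[
    {
      "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
      "plugin_name": "irods_rule_engine_plugin-irods_rule_language"
    },
    {
      "instance_name": "irods_rule_engine_plugin-python-instance",
      "plugin_name": "irods_rule_engine_plugin-python",
      "plugin_specific_configuration": {}
    }
]
""")

print(has_python_rule_engine(sample_rule_engines))
```

To check a real server, load /etc/irods/server_config.json with json.load and pass its rule_engines list to the same function.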

Getting Started

iadmin mkuser alice rodsuser
iadmin moduser alice password apass 

This demonstration will be run as rodsuser 'alice'

iadmin mkresc lts_resc unixfilesystem `hostname`:/tmp/irods/lts_resc
iadmin mkresc dsp_resc unixfilesystem `hostname`:/tmp/irods/dsp_resc

imeta add -R lts_resc COMPUTE_RESOURCE_ROLE LONG_TERM_STORAGE
imeta add -R dsp_resc COMPUTE_RESOURCE_ROLE SIGNAL_PROCESSING

Create two unixfilesystem resources

Annotate them with appropriate metadata given their roles

This is defined in the configuration as part of the contract
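The metadata convention amounts to a lookup from role to resource. As a minimal sketch, assuming the resource metadata has already been fetched into (resource, attribute, value) tuples, the lookup could look like this; the function name is hypothetical and no iRODS API is called.

```python
ROLE_ATTRIBUTE = "COMPUTE_RESOURCE_ROLE"


def resources_for_role(resource_avus, role):
    """Return the names of resources whose role attribute equals the given role."""
    return sorted(
        name
        for name, attribute, value in resource_avus
        if attribute == ROLE_ATTRIBUTE and value == role
    )


# The two annotations created with imeta above, as plain tuples.
avus = [
    ("lts_resc", "COMPUTE_RESOURCE_ROLE", "LONG_TERM_STORAGE"),
    ("dsp_resc", "COMPUTE_RESOURCE_ROLE", "SIGNAL_PROCESSING"),
]

print(resources_for_role(avus, "SIGNAL_PROCESSING"))
```

Because the attribute name and role values are plain strings, the contract holds only as long as every deployment uses the same spelling, which is why the convention is defined in configuration.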

Finally ...

ubuntu$ iinit
 ERROR: environment_properties::capture: missing environment file. should be at [/home/ubuntu/.irods/irods_environment.json]
One or more fields in your iRODS environment file (irods_environment.json) are
missing; please enter them.
Enter the host name (DNS) of the server to connect to: localhost
Enter the port number: 1247
Enter your irods user name: alice
Enter your irods zone: tempZone
Those values will be added to your environment file (for use by
other iCommands) if the login succeeds.

Enter your current iRODS password:
ubuntu$ ils
/tempZone/home/alice:
ubuntu$

Remember to log in as 'alice' in the ubuntu training account, as in the session above.

The configuration interface

Define interfaces for any necessary conventions

  • Metadata attributes and values
  • Metadata values for implemented roles

 

Single Point of Truth - Template Method Pattern

  • execute defined preconditions
  • run user's requested container

 

Users may utilize metadata conventions within a rule to provide inputs to the generalized container service.

Reminder ...

Implemented as an iRODS rulebase: following the Template Method pattern

  1. If necessary, replicate input data to an appropriate resource
  2. Check permissions
  3. Launch compute container (Docker):
    1. Process input data via Jupyter notebook
    2. Save results
  4. Register the resultant directory into iRODS
  5. Apply metadata to newly registered results (but not today...)

The iRODS Rule Language Rule File

main {
  container_dispatch("containers.run","/tempZone/home/alice/task_config.json","dsp_resc","","")
}
INPUT null
OUTPUT ruleExecOut

irule provides a user-land entry point for invoking the Compute to Data policy

/home/ubuntu/spawn_remote_containers.r

Task Configuration

{
    "container": {
        "type": "docker",
        "image": "testimages/jupyter-digital-filter",
        "command": [ "jupyter", "nbconvert",
                    "--execute",
                    "--to", "html",
                    "--output", "/outputs/lowpass_filter_processing.html",
                    "/home/jovyan/work/lpfilter.ipynb"
         ],
        "environment": {
            "INPUT_FILE_PATH" : "/inputs/%(INPUT_FILE_BASENAME)s",
            "CUTOFF_FREQUENCY_INDEX" : "0",
            "OUTPUT_FILE_PATH" : "/outputs/lowpass_filtered_%(INPUT_FILE_BASENAME)s"
        }
    },
    "external": {
        "src_collection" : "/tempZone/home/alice/notebook_input",
        "dst_collection" : "/tempZone/home/alice/notebook_output"
    },
    "internal": {
        "src_directory": "/inputs",
        "dst_directory": "/outputs"
    }
}

Task Configuration

INPUT_FILE_BASENAME : internally computed value derived from first input file found in input collection

type : 'docker' or 'singularity'

image : name of the container image to run (repository reference)

command : the command to run inside the container and its arguments, given as a list

environment : configuration passed through to container

external : logical iRODS source and destination paths for data

internal : paths mapped into docker to local storage from iRODS physical paths on the target resource
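How these fields might come together can be sketched in Python. This is an assumption-laden illustration, not the rule base's actual code: the function name and its two path parameters are hypothetical. It expands the %(INPUT_FILE_BASENAME)s placeholders with Python's mapping-style string formatting and builds a volumes dictionary in the shape accepted by the Docker SDK's containers.run().

```python
import os


def build_run_kwargs(task_config, input_physical_path, output_physical_dir):
    """Turn a task configuration into keyword arguments for a container run."""
    container = task_config["container"]
    internal = task_config["internal"]

    # INPUT_FILE_BASENAME is derived from the input file's name.
    substitutions = {"INPUT_FILE_BASENAME": os.path.basename(input_physical_path)}
    environment = {
        key: value % substitutions
        for key, value in container["environment"].items()
    }

    # Map physical directories on the resource to the container-internal paths.
    volumes = {
        os.path.dirname(input_physical_path): {"bind": internal["src_directory"], "mode": "ro"},
        output_physical_dir: {"bind": internal["dst_directory"], "mode": "rw"},
    }
    return {
        "image": container["image"],
        "command": container["command"],
        "environment": environment,
        "volumes": volumes,
    }


config = {
    "container": {
        "image": "testimages/jupyter-digital-filter",
        "command": ["jupyter", "nbconvert", "--execute", "--to", "html",
                    "--output", "/outputs/lowpass_filter_processing.html",
                    "/home/jovyan/work/lpfilter.ipynb"],
        "environment": {
            "INPUT_FILE_PATH": "/inputs/%(INPUT_FILE_BASENAME)s",
            "CUTOFF_FREQUENCY_INDEX": "0",
            "OUTPUT_FILE_PATH": "/outputs/lowpass_filtered_%(INPUT_FILE_BASENAME)s",
        },
    },
    "internal": {"src_directory": "/inputs", "dst_directory": "/outputs"},
}

kwargs = build_run_kwargs(
    config,
    "/tmp/irods/dsp_resc/home/alice/notebook_input/input.dat",
    "/tmp/irods/dsp_resc/home/alice/notebook_output",
)
print(kwargs["environment"]["INPUT_FILE_PATH"])
```

With the docker package installed, these keyword arguments could be passed as docker.from_env().containers.run(**kwargs); that call is left out here so the sketch runs without a Docker daemon.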

The Digital Signal Processing container

FROM jupyter/base-notebook
# GID for the irods group inside the image; override with --build-arg irods_gid=<gid>
ARG  irods_gid=999
ENV  IRODS_GID ${irods_gid}
USER root
RUN apt-get update && apt-get install -y vim less
RUN groupadd -g $IRODS_GID irods && usermod -aG irods jovyan
# Remap the jovyan user to UID/GID 999 (hard-coded here, independent of irods_gid)
RUN sed -i "s/jovyan:x:[0-9]*:[0-9]*\(.*\)/jovyan:x:999:999\1/" /etc/passwd
ADD lpfilter.ipynb /home/jovyan/work/.
COPY mymodule/ /home/jovyan/work/mymodule/
RUN chown jovyan:users /home/jovyan/work/lpfilter.ipynb
RUN chown -R jovyan:users /home/jovyan/work/mymodule
RUN chown -R 999:999 /home/jovyan && chown -R 999:999 /opt/conda
USER jovyan
RUN conda init
RUN conda install -y -c conda-forge matplotlib numpy
RUN jupyter trust /home/jovyan/work/lpfilter.ipynb
# Exec form is parsed as a JSON array, so it needs double quotes
CMD [ "/bin/bash" ]

~/irods_training/advanced/hpc_compute_to_data/jupyter_notebook/Dockerfile

The Jupyter Notebook

Located in the training repository at:

/home/ubuntu/irods_training/advanced/hpc_compute_to_data/jupyter_notebook/lpfilter.ipynb

The notebook:

  • Loads an input waveform
  • Applies a digital low pass filter
  • Plots 3 graphs of the results
  • Saves the graphs and filtered data
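To make the filtering step concrete, here is a minimal frequency-domain low-pass sketch in pure Python. It is not the notebook's code: the notebook's actual algorithm is not shown in these slides, and treating the cutoff as a DFT bin index is an assumption suggested only by the name CUTOFF_FREQUENCY_INDEX. A naive DFT is used to keep the example dependency-free.

```python
import cmath


def dft(samples):
    """Naive discrete Fourier transform, O(n^2), adequate for short inputs."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def lowpass(samples, cutoff_index):
    """Zero every frequency bin above cutoff_index, then transform back."""
    n = len(samples)
    spectrum = dft(samples)
    # Bin k and bin n-k carry the same frequency, so keep or drop them together.
    kept = [
        value if min(k, n - k) <= cutoff_index else 0
        for k, value in enumerate(spectrum)
    ]
    # Inverse DFT; the input is real, so only the real part is returned.
    return [
        sum(kept[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


# Two periods of the same sawtooth used for input.dat in the test run.
samples = [x % 24 for x in range(1, 49)]
filtered = lowpass(samples, 0)
print(round(filtered[0], 6))
```

With a cutoff index of 0 only the constant component survives, so every output sample equals the mean of the input; larger cutoffs let progressively more of the sawtooth's shape through.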

Compute to Data - Digital Filter Testing

ubuntu $ icd ; imkdir notebook_input notebook_output
ubuntu $ cd ; iput task_config.json
ubuntu $ for x in {1..512}; do echo $((x%24)) ; done >input.dat
ubuntu $ iput input.dat notebook_input
ubuntu $ ils -lr
/tempZone/home/alice:
alice             0 demoResc          853 2019-06-21.16:05 & task_config.json
C- /tempZone/home/alice/notebook_input
/tempZone/home/alice/notebook_input:
alice             0 demoResc         1318 2019-06-21.16:05 & input.dat
C- /tempZone/home/alice/notebook_output
/tempZone/home/alice/notebook_output:
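The shell loop above writes a sawtooth waveform, one sample per line. A Python equivalent, given here only as a sketch with a hypothetical function name, makes the shape of the test data explicit and reproduces the 1318-byte size reported for input.dat in the listing.

```python
def sawtooth_lines(count=512, period=24):
    """Return count samples of x % period for x = 1..count, one per line."""
    return "".join("%d\n" % (x % period) for x in range(1, count + 1))


data = sawtooth_lines()
print(len(data))  # 1318, matching the size of input.dat shown by ils
```

Writing data to a file named input.dat gives the same content as the shell loop.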

Compute to Data - Digital Filter Testing

ubuntu $ irule -F spawn_remote_containers.r
ubuntu $ ils -lr
/tempZone/home/alice:
alice             0 demoResc          853 2019-06-21.16:05 & task_config.json
C- /tempZone/home/alice/notebook_input
/tempZone/home/alice/notebook_input:
alice             0 demoResc         1318 2019-06-21.16:05 & input.dat
alice             1 dsp_resc         1318 2019-06-21.16:06 & input.dat
C- /tempZone/home/alice/notebook_output
/tempZone/home/alice/notebook_output:
alice             0 dsp_resc            0 2019-06-21.16:06 & .8d63a286-943e-11e9-8013-12cc2f55e24c
C- /tempZone/home/alice/notebook_output/8d63a286-943e-11e9-8013-12cc2f55e24c
/tempZone/home/alice/notebook_output/8d63a286-943e-11e9-8013-12cc2f55e24c:
alice             0 dsp_resc            0 2019-06-21.16:06 & .8d63a286-943e-11e9-8013-12cc2f55e24c
alice             0 dsp_resc         3200 2019-06-21.16:06 & lowpass_filtered_input.dat
alice             0 dsp_resc       359430 2019-06-21.16:06 & lowpass_filter_processing.html

Compute to Data - Digital Filter Results

sudo su - irods

cd /tmp/irods/dsp_resc/home/alice/notebook_output

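python -m SimpleHTTPServer 8080

(SimpleHTTPServer is the Python 2 module; under Python 3 the equivalent is python3 -m http.server 8080.)

In a browser, open port 8080 on the server and navigate to the HTML file under notebook_output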

Compute to Data - Digital Filter Results

[Figure: digital filter results]

Thank you

Any Questions?
