Policy Training
Taking Compute to Data
Jason M. Coposky
@jason_coposky
Executive Director, iRODS Consortium
June 25-28, 2019
iRODS User Group Meeting
University of Utrecht, NL
The Compute to Data Use Case
Data is assumed to already be routed to an appropriate storage resource
Goals
- Develop a generic interface concept for compute
- Develop a metadata-driven interface for labeling resources which provide computational capabilities
  - ultimately relies upon convention
- Separate configuration from implementation
  - isolate deployment-specific concepts
- Consider a rule base as an extension of iRODS
  - rules are not just data management policy
"Compute To Data" Pattern - Salient Features
Implemented as an iRODS rulebase: following the Template Method pattern
- If necessary, replicate input data to an appropriate resource
- Check permissions
- Launch compute container (Docker):
- Process input data via Jupyter notebook
- Save results
- Register the resultant directory into iRODS
- Apply metadata to newly registered results
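The Template Method pattern named above can be sketched in a few lines of Python. This is only an illustration of the pattern, not the rulebase shipped in the training repository: the class names and the step names returned by each hook are hypothetical. The base class fixes the order of the steps, and a subclass supplies only the deployment-specific pieces.

```python
class ComputeToDataTemplate(object):
    """Hypothetical template: run() fixes the order, subclasses fill in steps."""

    def run(self):
        steps = []
        # Replicate only if the input is not already on a compute resource.
        if not self.input_on_compute_resource():
            steps.append(self.replicate_input())
        steps.append(self.check_permissions())
        steps.append(self.launch_container())
        steps.append(self.register_results())
        steps.append(self.apply_metadata())
        return steps

    # Hooks that a concrete deployment must provide.
    def input_on_compute_resource(self):
        raise NotImplementedError

    def replicate_input(self):
        raise NotImplementedError

    def check_permissions(self):
        raise NotImplementedError

    def launch_container(self):
        raise NotImplementedError

    def register_results(self):
        raise NotImplementedError

    def apply_metadata(self):
        raise NotImplementedError


class DemoTask(ComputeToDataTemplate):
    """Stand-in implementation that only reports which step ran."""

    def __init__(self, already_replicated):
        self.already_replicated = already_replicated

    def input_on_compute_resource(self):
        return self.already_replicated

    def replicate_input(self):
        return "replicate"

    def check_permissions(self):
        return "permissions"

    def launch_container(self):
        return "container"

    def register_results(self):
        return "register"

    def apply_metadata(self):
        return "metadata"


if __name__ == "__main__":
    print(DemoTask(already_replicated=False).run())
```

The point of the design is that the sequence lives in exactly one place (run), so a site can change how a step is performed without touching the order in which steps happen.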
Components of the System
System Component         Implementation
Job Initialization       iRODS Rule Base
Container Technology     Docker
User Provided Compute    Jupyter Notebook
Getting Started
As the ubuntu user (if necessary)
Clone the irods_training repository and configure the build tools:
git clone https://github.com/irods/irods_training
sudo apt-get -y install \
irods-externals-cmake3.5.2-0 \
irods-externals-clang3.8-0 \
irods-externals-qpid-with-proton0.34-0 \
irods-dev
export PATH=/opt/irods-externals/cmake3.5.2-0/bin:$PATH
Getting Started
Build and install the packages for the compute-to-data example:
cd
mkdir build_compute_to_data
cd build_compute_to_data
cmake ../irods_training/advanced/hpc_compute_to_data
make package
sudo dpkg -i irods-hpc-compute-to-data-example_4.2.6~xenial_amd64.deb
cd
mkdir build_register_microservice
cd build_register_microservice
cmake ../irods_training/advanced/hpc_compute_to_data/msvc__msiregister_as_admin/
make package
sudo dpkg -i irods-microservice-register_as_admin-4.2.6-ubuntu16-x86_64.deb
Build the Docker image for processing:
cd /home/ubuntu/irods_training/advanced/hpc_compute_to_data/jupyter_notebook
docker build -t testimages/jupyter-digital-filter .
Getting Started
Make sure the Python rule engine plugin is installed:
sudo apt-get -y install irods-rule-engine-plugin-python
Install Python's pip package:
sudo apt-get -y install python-pip
Add the irods user to the docker group:
sudo usermod -aG docker irods
Getting Started
As the irods user, install the Python Docker API:
pip install docker --user
Restart the iRODS server:
sudo service irods restart
or
sudo su irods -c '~/irodsctl restart'
Further Setup and Configuration
Configure Python Rule Engine Plugin
Edit /etc/irods/server_config.json:
"rule_engines": [
    {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
        ...
        "shared_memory_instance": "irods_rule_language_rule_engine"
    },
    {
        "instance_name": "irods_rule_engine_plugin-python-instance",
        "plugin_name": "irods_rule_engine_plugin-python",
        "plugin_specific_configuration": {}
    },
    . . .
Create /etc/irods/core.py with the following import:
from compute_to_data import *
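For orientation, a rule made available through core.py is an ordinary Python function that the Python rule engine plugin calls with three arguments: the list of rule arguments, a callback object for invoking microservices and other rules, and the rule execution info. The sketch below shows that shape only. The rule name example_dispatch and the FakeCallback stand-in are hypothetical and are not part of compute_to_data; the stand-in exists so the function can be exercised outside a running server.

```python
def example_dispatch(rule_args, callback, rei):
    """Hypothetical rule: log the first three arguments it was called with."""
    entry_point, config_path, resource = rule_args[0], rule_args[1], rule_args[2]
    message = "dispatch %s using %s on %s" % (entry_point, config_path, resource)
    # writeLine is the usual way to send a message to the server log from a rule.
    callback.writeLine("serverLog", message)


class FakeCallback(object):
    """Test stand-in for the callback object the server would supply."""

    def __init__(self):
        self.lines = []

    def writeLine(self, stream, message):
        self.lines.append((stream, message))


if __name__ == "__main__":
    cb = FakeCallback()
    example_dispatch(
        ["containers.run", "/tempZone/home/alice/task_config.json", "dsp_resc", "", ""],
        cb,
        None,
    )
    print(cb.lines)
```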
Getting Started
This demonstration will be run as rodsuser 'alice':
iadmin mkuser alice rodsuser
iadmin moduser alice password apass
Create two unixfilesystem resources:
iadmin mkresc lts_resc unixfilesystem `hostname`:/tmp/irods/lts_resc
iadmin mkresc dsp_resc unixfilesystem `hostname`:/tmp/irods/dsp_resc
Annotate them with appropriate metadata given their roles:
imeta add -R lts_resc COMPUTE_RESOURCE_ROLE LONG_TERM_STORAGE
imeta add -R dsp_resc COMPUTE_RESOURCE_ROLE SIGNAL_PROCESSING
The attribute name and role values are defined in the configuration as part of the contract
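To make the metadata convention concrete, here is a minimal sketch of how a policy could pick a resource by role. It assumes the resource metadata has already been fetched into a plain dictionary; in a real deployment that information would come from a catalog query, and the function name resources_with_role is hypothetical.

```python
# Attribute name taken from the imeta commands above.
ROLE_ATTRIBUTE = "COMPUTE_RESOURCE_ROLE"


def resources_with_role(resource_metadata, role):
    """Return the sorted names of resources annotated with the given role.

    resource_metadata maps a resource name to a set of (attribute, value)
    pairs; this in-memory shape is an assumption made for the example.
    """
    return sorted(
        name
        for name, avus in resource_metadata.items()
        if (ROLE_ATTRIBUTE, role) in avus
    )


# Mirrors the resources created and annotated above.
catalog = {
    "demoResc": set(),
    "lts_resc": {("COMPUTE_RESOURCE_ROLE", "LONG_TERM_STORAGE")},
    "dsp_resc": {("COMPUTE_RESOURCE_ROLE", "SIGNAL_PROCESSING")},
}

if __name__ == "__main__":
    print(resources_with_role(catalog, "SIGNAL_PROCESSING"))
```

Because the rule only asks for a role, a site can rename or add resources without editing the policy, as long as the metadata convention is kept.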
Finally ...
Remember to log in as 'alice' in the ubuntu training account:
ubuntu$ iinit
ERROR: environment_properties::capture: missing environment file. should be at [/home/ubuntu/.irods/irods_environment.json]
One or more fields in your iRODS environment file (irods_environment.json) are missing; please enter them.
Enter the host name (DNS) of the server to connect to: localhost
Enter the port number: 1247
Enter your irods user name: alice
Enter your irods zone: tempZone
Those values will be added to your environment file (for use by other iCommands) if the login succeeds.
Enter your current iRODS password:
ubuntu$ ils
/tempZone/home/alice:
ubuntu$
The configuration interface
Define interfaces for any necessary conventions
- Metadata attributes and values
- Metadata values for implemented roles
Single Point of Truth - Template Method Pattern
- execute defined preconditions
- run user's requested container
Users may utilize metadata conventions within a rule to provide inputs to the generalized container service.
Reminder ...
Implemented as an iRODS rulebase: following the Template Method pattern
- If necessary, replicate input data to an appropriate resource
- Check permissions
- Launch compute container (Docker):
- Process input data via Jupyter notebook
- Save results
- Register the resultant directory into iRODS
- Apply metadata to newly registered results (but not today...)
The iRODS Rule Language Rule File
/home/ubuntu/spawn_remote_containers.r
main {
    container_dispatch("containers.run","/tempZone/home/alice/task_config.json","dsp_resc","","")
}
INPUT null
OUTPUT ruleExecOut
irule provides a user-land entry point for invoking the Compute to Data policy
Task Configuration
{
"container": {
"type": "docker",
"image": "testimages/jupyter-digital-filter",
"command": [ "jupyter", "nbconvert",
"--execute",
"--to", "html",
"--output", "/outputs/lowpass_filter_processing.html",
"/home/jovyan/work/lpfilter.ipynb"
],
"environment": {
"INPUT_FILE_PATH" : "/inputs/%(INPUT_FILE_BASENAME)s",
"CUTOFF_FREQUENCY_INDEX" : "0",
"OUTPUT_FILE_PATH" : "/outputs/lowpass_filtered_%(INPUT_FILE_BASENAME)s"
}
},
"external": {
"src_collection" : "/tempZone/home/alice/notebook_input",
"dst_collection" : "/tempZone/home/alice/notebook_output"
},
"internal": {
"src_directory": "/inputs",
"dst_directory": "/outputs"
}
}
Task Configuration
type : container technology, 'docker' or 'singularity'
image : repository reference name of the container image
command : the command to run in the container, followed by its arguments
environment : configuration passed through to the container as environment variables
external : logical iRODS source and destination paths for data
internal : paths inside the container, mapped from the iRODS physical paths in local storage on the target resource
INPUT_FILE_BASENAME : internally computed value, derived from the first input file found in the input collection
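The %(INPUT_FILE_BASENAME)s placeholders in the environment section use the same syntax as Python's %-style string formatting with a dictionary. The sketch below shows one way the values could be expanded and turned into keyword arguments for the Docker SDK's containers.run call. It is an illustration under assumptions, not the rulebase's actual code: the function names, the two host directory arguments, and the read-only mount for inputs are all hypothetical choices.

```python
import os


def expand_environment(environment, input_logical_path):
    """Fill %(INPUT_FILE_BASENAME)s placeholders using the input file's basename."""
    values = {"INPUT_FILE_BASENAME": os.path.basename(input_logical_path)}
    # Strings without placeholders (such as "0") pass through unchanged.
    return {key: template % values for key, template in environment.items()}


def build_run_arguments(config, input_logical_path, host_input_dir, host_output_dir):
    """Assemble keyword arguments in the shape docker-py's containers.run accepts."""
    container = config["container"]
    internal = config["internal"]
    return {
        "image": container["image"],
        "command": container["command"],
        "environment": expand_environment(container["environment"], input_logical_path),
        "volumes": {
            host_input_dir: {"bind": internal["src_directory"], "mode": "ro"},
            host_output_dir: {"bind": internal["dst_directory"], "mode": "rw"},
        },
    }


# Trimmed copy of the task configuration shown earlier.
task_config = {
    "container": {
        "type": "docker",
        "image": "testimages/jupyter-digital-filter",
        "command": ["jupyter", "nbconvert", "--execute", "--to", "html",
                    "--output", "/outputs/lowpass_filter_processing.html",
                    "/home/jovyan/work/lpfilter.ipynb"],
        "environment": {
            "INPUT_FILE_PATH": "/inputs/%(INPUT_FILE_BASENAME)s",
            "CUTOFF_FREQUENCY_INDEX": "0",
            "OUTPUT_FILE_PATH": "/outputs/lowpass_filtered_%(INPUT_FILE_BASENAME)s",
        },
    },
    "internal": {"src_directory": "/inputs", "dst_directory": "/outputs"},
}

# The host directories below are placeholders for the resource's physical paths.
run_arguments = build_run_arguments(
    task_config,
    "/tempZone/home/alice/notebook_input/input.dat",
    "/tmp/example/host_inputs",
    "/tmp/example/host_outputs",
)

if __name__ == "__main__":
    print(run_arguments["environment"])
    # With the docker package installed, the call would look like:
    #   import docker
    #   docker.from_env().containers.run(**run_arguments)
```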
The Digital Signal Processing container
~/irods_training/advanced/hpc_compute_to_data/jupyter_notebook/Dockerfile
FROM jupyter/base-notebook
ARG irods_gid=999
ENV IRODS_GID ${irods_gid}
USER root
RUN apt-get update && apt-get install -y vim less
RUN groupadd -g $IRODS_GID irods && usermod -aG irods jovyan
RUN sed -i "s/jovyan:x:[0-9]*:[0-9]*\(.*\)/jovyan:x:999:999\1/" /etc/passwd
ADD lpfilter.ipynb /home/jovyan/work/.
COPY mymodule/ /home/jovyan/work/mymodule/
RUN chown jovyan.users /home/jovyan/work/lpfilter.ipynb
COPY mymodule/ /home/jovyan/work/mymodule
RUN chown -R jovyan.users /home/jovyan/work/mymodule
RUN chown -R 999:999 /home/jovyan && chown -R 999:999 /opt/conda
USER jovyan
RUN conda init
RUN conda install -y -c conda-forge matplotlib numpy
RUN jupyter trust /home/jovyan/work/lpfilter.ipynb
# exec form is a JSON array, so it needs double quotes
CMD [ "/bin/bash" ]
The Jupyter Notebook
Located in the training repository at:
/home/ubuntu/irods_training/advanced/hpc_compute_to_data/jupyter_notebook/lpfilter.ipynb
The notebook:
- Loads an input waveform
- Applies a digital low pass filter
- Plots 3 graphs of the results
- Saves the graphs and filtered data
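As a rough idea of what a digital low-pass filter does to the test waveform used later in this module, the sketch below applies a trailing moving average, which is one of the simplest low-pass filters. This is a swapped-in technique chosen for brevity: the notebook's actual filter is not reproduced here, and the function name and window size are assumptions.

```python
def moving_average(samples, window):
    """Trailing moving average; early outputs average the samples seen so far."""
    if window < 1:
        raise ValueError("window must be at least 1")
    filtered = []
    running = 0.0
    for i, sample in enumerate(samples):
        running += sample
        if i >= window:
            # Drop the sample that just left the window.
            running -= samples[i - window]
        filtered.append(running / min(i + 1, window))
    return filtered


# Same sawtooth that the test slides write to input.dat: x % 24 for x in 1..512.
waveform = [x % 24 for x in range(1, 513)]
filtered = moving_average(waveform, 24)

if __name__ == "__main__":
    # Once the window spans a full 24-sample period the output settles at 11.5.
    print(filtered[23], filtered[-1])
```

A window equal to the sawtooth's period removes the oscillation entirely, leaving only the mean level, which makes the smoothing effect easy to see.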
Compute to Data - Digital Filter Testing
ubuntu $ icd ; imkdir notebook_input notebook_output
ubuntu $ cd ; iput task_config.json
ubuntu $ for x in {1..512}; do echo $((x%24)) ; done >input.dat
ubuntu $ iput input.dat notebook_input
ubuntu $ ils -lr
/tempZone/home/alice:
alice 0 demoResc 853 2019-06-21.16:05 & task_config.json
C- /tempZone/home/alice/notebook_input
/tempZone/home/alice/notebook_input:
alice 0 demoResc 1318 2019-06-21.16:05 & input.dat
C- /tempZone/home/alice/notebook_output
/tempZone/home/alice/notebook_output:
Compute to Data - Digital Filter Testing
ubuntu $ irule -F spawn_remote_containers.r
ubuntu $ ils -lr
/tempZone/home/alice:
alice 0 demoResc 853 2019-06-21.16:05 & task_config.json
C- /tempZone/home/alice/notebook_input
/tempZone/home/alice/notebook_input:
alice 0 demoResc 1318 2019-06-21.16:05 & input.dat
alice 1 dsp_resc 1318 2019-06-21.16:06 & input.dat
C- /tempZone/home/alice/notebook_output
/tempZone/home/alice/notebook_output:
alice 0 dsp_resc 0 2019-06-21.16:06 & .8d63a286-943e-11e9-8013-12cc2f55e24c
C- /tempZone/home/alice/notebook_output/8d63a286-943e-11e9-8013-12cc2f55e24c
/tempZone/home/alice/notebook_output/8d63a286-943e-11e9-8013-12cc2f55e24c:
alice 0 dsp_resc 0 2019-06-21.16:06 & .8d63a286-943e-11e9-8013-12cc2f55e24c
alice 0 dsp_resc 3200 2019-06-21.16:06 & lowpass_filtered_input.dat
alice 0 dsp_resc 359430 2019-06-21.16:06 & lowpass_filter_processing.html
Compute to Data - Digital Filter Results
sudo su - irods
cd /tmp/irods/dsp_resc/home/alice/notebook_output
python -m SimpleHTTPServer 8080
In a browser, navigate to the HTML file under notebook_output (served on port 8080)
Compute to Data - Digital Filter Results
[Figure: rendered notebook results]
Thank you
Any Questions?