Compute to Data
January 14-16 2020
CINES
Montpellier, France
Jason Coposky
@jason_coposky
Executive Director, iRODS Consortium
The Compute to Data Use Case
Data is assumed to already be routed to an appropriate storage resource
Goals - Develop generic interface concept for compute
"Compute To Data" Pattern - Salient Features
Implemented as an iRODS rulebase, following the Template Method pattern
Components of the System
System Component         Implementation
Job Initialization       iRODS Rule Base
Container Technology     Docker
User Provided Compute    Jupyter Notebook
Getting Started
Clone irods_training repository and configure build tools
git clone https://github.com/irods/irods_training
If necessary:
sudo apt-get -y install \
    irods-externals-cmake3.5.2-0 \
    irods-externals-clang3.8-0 \
    irods-externals-qpid-with-proton0.34-0 \
    irods-dev
export PATH=/opt/irods-externals/cmake3.5.2-0/bin:$PATH
Getting Started
Install packages for the compute-to-data example
cd
mkdir build_compute_to_data
cd build_compute_to_data
cmake ../irods_training/advanced/hpc_compute_to_data
make package
sudo dpkg -i irods-hpc-compute-to-data-example_4.2.6~xenial_amd64.deb
cd
mkdir build_register_microservice
cd build_register_microservice
cmake ../irods_training/advanced/hpc_compute_to_data/msvc__msiregister_as_admin/
make package
sudo dpkg -i irods-microservice-register_as_admin-4.2.6-ubuntu16-x86_64.deb
Build Docker image for processing
cd /home/ubuntu/irods_training/advanced/hpc_compute_to_data/jupyter_notebook
docker build -t testimages/jupyter-digital-filter .
Getting Started - Python extensions to iRODS
Make sure the Python rule engine plugin is installed:
sudo apt-get -y install irods-rule-engine-plugin-python
Also install Python's pip package:
sudo apt-get -y install python-pip
As the service account user irods, install the Python Docker API:
pip install docker --user
Add the irods user to the docker group:
sudo usermod -aG docker irods
(You might have to restart the iRODS server.)
$ sudo service irods restart
or:
$ sudo su irods -c '~/irodsctl restart'
Further Setup and Configuration
Place Python Rule Engine stanza after native RE stanza:
sudo nano /etc/irods/server_config.json
"rule_engines": [
    {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
        ...
        "shared_memory_instance": "irods_rule_language_rule_engine"
    },
    {
        "instance_name": "irods_rule_engine_plugin-python-instance",
        "plugin_name": "irods_rule_engine_plugin-python",
        "plugin_specific_configuration": {}
    },
    . . .
Create /etc/irods/core.py with the following import:
from compute_to_data import *
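The ordering requirement (Python stanza after the native rule language stanza) can be checked mechanically. The following is a minimal sketch, not part of the training material: the helper name `python_follows_native` and the abbreviated `SAMPLE` list are hypothetical, and the function takes the `rule_engines` list directly so it makes no assumption about where that list sits inside server_config.json.

```python
import json

NATIVE_PLUGIN = "irods_rule_engine_plugin-irods_rule_language"
PYTHON_PLUGIN = "irods_rule_engine_plugin-python"


def python_follows_native(rule_engines):
    """Return True if the Python stanza appears after the native stanza."""
    names = [stanza.get("plugin_name") for stanza in rule_engines]
    if NATIVE_PLUGIN not in names or PYTHON_PLUGIN not in names:
        return False
    return names.index(PYTHON_PLUGIN) > names.index(NATIVE_PLUGIN)


# Abbreviated, hypothetical sample that mirrors the two stanzas shown above.
SAMPLE = json.loads("""
[
  {"instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
   "plugin_name": "irods_rule_engine_plugin-irods_rule_language"},
  {"instance_name": "irods_rule_engine_plugin-python-instance",
   "plugin_name": "irods_rule_engine_plugin-python",
   "plugin_specific_configuration": {}}
]
""")
```

To use it against a real server, load /etc/irods/server_config.json with `json.load` and pass in whichever list holds the rule engine stanzas.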
Continued... Compute-to-Data Set-up / Configuration
This demonstration will be run as rodsuser 'alice'
iadmin mkuser alice rodsuser
iadmin moduser alice password apass
Configure the Tagged Resources - if necessary
Make two unix file system resources
iadmin mkresc lts_resc unixfilesystem `hostname`:/tmp/irods/lts_resc
iadmin mkresc dsp_resc unixfilesystem `hostname`:/tmp/irods/dsp_resc
Annotate them with appropriate metadata given their roles, as defined in the configuration as part of the contract
As the irods service account:
imeta add -R lts_resc COMPUTE_RESOURCE_ROLE LONG_TERM_STORAGE
imeta add -R dsp_resc COMPUTE_RESOURCE_ROLE SIGNAL_PROCESSING
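The role tags act as a lookup contract: code asks for a role and gets back a resource name. Below is a minimal sketch of that lookup over in-memory data. The triples copy the imeta commands above; the helper `resources_with_role` is hypothetical, and a real rule would query the catalog instead of a Python list.

```python
ROLE_ATTRIBUTE = "COMPUTE_RESOURCE_ROLE"

# (resource name, attribute, value) triples, as set by the imeta commands above.
RESOURCE_METADATA = [
    ("lts_resc", ROLE_ATTRIBUTE, "LONG_TERM_STORAGE"),
    ("dsp_resc", ROLE_ATTRIBUTE, "SIGNAL_PROCESSING"),
]


def resources_with_role(metadata, role):
    """Return the names of all resources tagged with the given role."""
    return [
        name
        for name, attribute, value in metadata
        if attribute == ROLE_ATTRIBUTE and value == role
    ]
```

Calling `resources_with_role(RESOURCE_METADATA, "SIGNAL_PROCESSING")` selects the resource on which the container work is expected to run.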
Finally ...
Remember to log in as 'alice' in the ubuntu training account:
ubuntu$ iinit
ERROR: environment_properties::capture: missing environment file. should be at [/home/ubuntu/.irods/irods_environment.json]
One or more fields in your iRODS environment file (irods_environment.json) are missing; please enter them.
Enter the host name (DNS) of the server to connect to: localhost
Enter the port number: 1247
Enter your irods user name: alice
Enter your irods zone: tempZone
Those values will be added to your environment file (for use by other iCommands) if the login succeeds.
Enter your current iRODS password:
ubuntu$ ils
/tempZone/home/alice:
ubuntu$
The configuration interface
Define interfaces for any necessary conventions
Single Point of Truth - Template Method Pattern
Users may utilize metadata conventions within a rule to provide inputs to the generalized container service.
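For readers unfamiliar with the Template Method pattern, here is a minimal, generic sketch. It is not the code in compute_to_data.py: the class names, the three hook names, and the configuration keys are all hypothetical, chosen only to show how a fixed sequence of steps can live in one place while the details vary.

```python
class ContainerJob(object):
    """Skeleton of a job; run() fixes the order of the steps."""

    def run(self, config):
        # Template method: the sequence is defined here and nowhere else.
        self.stage_input(config)
        result = self.launch_container(config)
        self.register_output(config, result)
        return result

    # Hooks that a concrete job overrides.
    def stage_input(self, config):
        raise NotImplementedError

    def launch_container(self, config):
        raise NotImplementedError

    def register_output(self, config, result):
        raise NotImplementedError


class RecordingJob(ContainerJob):
    """Toy subclass that only records which hook ran."""

    def __init__(self):
        self.steps = []

    def stage_input(self, config):
        self.steps.append(("stage_input", config["input_collection"]))

    def launch_container(self, config):
        self.steps.append(("launch_container", config["image"]))
        return 0  # pretend exit status

    def register_output(self, config, result):
        self.steps.append(("register_output", config["output_collection"]))


# Hypothetical configuration keys; the real task_config.json is not shown here.
EXAMPLE_CONFIG = {
    "image": "testimages/jupyter-digital-filter",
    "input_collection": "/tempZone/home/alice/notebook_input",
    "output_collection": "/tempZone/home/alice/notebook_output",
}
```

Because the order lives only in `run()`, a new kind of job changes the hooks and never the sequence, which is what gives a single point of truth.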
Reminder ...
Implemented as an iRODS rulebase, following the Template Method pattern
Separation of Concerns
iRODS Rule -> Python Rule -> Docker Container -> Jupyter Notebook
The iRODS Rule Language Rule File
Contents of /home/ubuntu/spawn_remote_containers.r
main {
    container_dispatch("containers.run","/tempZone/home/alice/task_config.json","dsp_resc","","")
}
INPUT null
OUTPUT ruleExecOut
Note - add a delay() directive for asynchronous behavior.
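On the Python side, a rule is a function that receives `rule_args`, `callback`, and `rei`. The sketch below shows how the first three arguments of the call above might be unpacked and logged. It is an illustration under assumptions: `container_dispatch_sketch` and `FakeCallback` are hypothetical names, the interpretation of the arguments is inferred from their values, and the two empty trailing arguments are left uninterpreted because their meaning is not stated in the slides.

```python
class FakeCallback(object):
    """Stand-in for the callback object the rule engine passes to a rule."""

    def __init__(self):
        self.lines = []

    def writeLine(self, stream, message):
        # The real callback forwards to the writeLine microservice.
        self.lines.append((stream, message))


def container_dispatch_sketch(rule_args, callback, rei):
    # Python rules receive their arguments as a list of strings.
    docker_method, config_path, resource_name = rule_args[:3]
    callback.writeLine(
        "serverLog",
        "dispatch %s with config %s on resource %s"
        % (docker_method, config_path, resource_name),
    )
    return {
        "docker_method": docker_method,
        "config_path": config_path,
        "resource_name": resource_name,
    }


# The same arguments as in spawn_remote_containers.r.
EXAMPLE_ARGS = [
    "containers.run",
    "/tempZone/home/alice/task_config.json",
    "dsp_resc",
    "",
    "",
]
```

Outside the server, the function can be exercised with `container_dispatch_sketch(EXAMPLE_ARGS, FakeCallback(), None)`.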
The Python Rulebase
Located at:
/home/ubuntu/irods_training/advanced/hpc_compute_to_data/compute_to_data.py
The Digital Signal Processing container
FROM jupyter/base-notebook
ARG irods_gid=999
ENV IRODS_GID ${irods_gid}
USER root
RUN apt-get update && apt-get install -y vim less
RUN groupadd -g $IRODS_GID irods && usermod -aG irods jovyan
RUN sed -i "s/jovyan:x:[0-9]*:[0-9]*\(.*\)/jovyan:x:999:999\1/" /etc/passwd
ADD lpfilter.ipynb /home/jovyan/work/.
COPY mymodule/ /home/jovyan/work/mymodule/
RUN chown jovyan.users /home/jovyan/work/lpfilter.ipynb
COPY mymodule/ /home/jovyan/work/mymodule
RUN chown -R jovyan.users /home/jovyan/work/mymodule
RUN chown -R 999:999 /home/jovyan && chown -R 999:999 /opt/conda
USER jovyan
RUN conda init
RUN conda install -y -c conda-forge matplotlib numpy
RUN jupyter trust /home/jovyan/work/lpfilter.ipynb
# The exec form is parsed as JSON, so it needs double quotes.
CMD ["/bin/bash"]
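The "containers.run" string in the rule file matches the `client.containers.run()` call of the Docker SDK for Python. The sketch below shows one way such a call could be assembled for this image. Several details are assumptions and not taken from the training material: the helper names, the `/inputs` and `/outputs` mount points, and the `jupyter nbconvert` command line. The 999:999 user matches the uid and gid set in the Dockerfile.

```python
def build_run_kwargs(host_input_dir, host_output_dir, uid=999, gid=999):
    """Assemble keyword arguments for docker's client.containers.run()."""
    return {
        "image": "testimages/jupyter-digital-filter",
        # Assumed command: execute the notebook headlessly and render HTML.
        "command": [
            "jupyter", "nbconvert", "--to", "html", "--execute",
            "/home/jovyan/work/lpfilter.ipynb",
            "--output-dir", "/outputs",
        ],
        # Run with the same uid:gid the Dockerfile assigns to jovyan.
        "user": "%d:%d" % (uid, gid),
        # Hypothetical mount points inside the container.
        "volumes": {
            host_input_dir: {"bind": "/inputs", "mode": "ro"},
            host_output_dir: {"bind": "/outputs", "mode": "rw"},
        },
        # Remove the container once it exits.
        "remove": True,
    }


def run_container(kwargs):
    """Run the container; needs `pip install docker` and a reachable daemon."""
    import docker  # imported lazily so the sketch loads without the SDK

    client = docker.from_env()
    return client.containers.run(**kwargs)
```

Keeping the argument assembly separate from the call makes it possible to inspect or log exactly what will be sent to Docker before anything runs.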
The Jupyter Notebook
Located at: /home/ubuntu/irods_training/advanced/hpc_compute_to_data/jupyter_notebook/lpfilter.ipynb
The notebook:
- loads an input waveform
- applies a digital lowpass filter
- plots 3 graphs of the results
- saves graphs and filtered data
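To make the filtering step concrete, here is a small pure-Python sketch. It does not reproduce the notebook's filter, whose design is not shown in these slides; it substitutes a simple causal moving average, which is one basic kind of lowpass filter. The function names are hypothetical, and the sawtooth mirrors the test input used later in the demonstration (512 samples, period 24).

```python
def sawtooth(n=512, period=24):
    """Sawtooth test waveform: x % period for x = 1..n."""
    return [x % period for x in range(1, n + 1)]


def moving_average(samples, window=8):
    """Causal moving-average lowpass filter.

    Each output is the mean of the most recent `window` inputs
    (fewer at the start, while the window is still filling).
    """
    filtered = []
    running = 0.0
    for i, sample in enumerate(samples):
        running += sample
        if i >= window:
            # Drop the sample that just left the window.
            running -= samples[i - window]
        filtered.append(running / min(i + 1, window))
    return filtered
```

A window equal to the sawtooth period averages over exactly one cycle, so once the window has filled the output is flat; shorter windows smooth the ramp while keeping its overall shape.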
Compute to Data - Digital Filter Testing
ubuntu $ icd ; imkdir notebook_input notebook_output
ubuntu $ cd ; iput task_config.json
ubuntu $ for x in {1..512}; do echo $((x%24)) ; done >input.dat
ubuntu $ iput input.dat notebook_input
ubuntu $ ils -lr
/tempZone/home/alice:
  alice 0 demoResc 853 2019-06-21.16:05 & task_config.json
  C- /tempZone/home/alice/notebook_input
/tempZone/home/alice/notebook_input:
  alice 0 demoResc 1318 2019-06-21.16:05 & input.dat
  C- /tempZone/home/alice/notebook_output
/tempZone/home/alice/notebook_output:
ubuntu $ irule -F spawn_remote_containers.r
ubuntu $ ils -lr
/tempZone/home/alice:
  alice 0 demoResc 853 2019-06-21.16:05 & task_config.json
  C- /tempZone/home/alice/notebook_input
/tempZone/home/alice/notebook_input:
  alice 0 demoResc 1318 2019-06-21.16:05 & input.dat
  alice 1 dsp_resc 1318 2019-06-21.16:06 & input.dat
  C- /tempZone/home/alice/notebook_output
/tempZone/home/alice/notebook_output:
  alice 0 dsp_resc 0 2019-06-21.16:06 & .8d63a286-943e-11e9-8013-12cc2f55e24c
  C- /tempZone/home/alice/notebook_output/8d63a286-943e-11e9-8013-12cc2f55e24c
/tempZone/home/alice/notebook_output/8d63a286-943e-11e9-8013-12cc2f55e24c:
  alice 0 dsp_resc 0 2019-06-21.16:06 & .8d63a286-943e-11e9-8013-12cc2f55e24c
  alice 0 dsp_resc 3200 2019-06-21.16:06 & lowpass_filtered_input.dat
  alice 0 dsp_resc 359430 2019-06-21.16:06 & lowpass_filter_processing.html
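The shell loop that creates input.dat can be reproduced in Python, which also gives a quick sanity check on the 1318-byte size shown in the listing. This is a minimal sketch; the function name `input_dat_text` is hypothetical.

```python
def input_dat_text(n=512, period=24):
    """Same content as: for x in {1..512}; do echo $((x%24)); done"""
    return "".join("%d\n" % (x % period) for x in range(1, n + 1))


def write_input_dat(path="input.dat"):
    """Write the test waveform to a file and return its size in bytes."""
    text = input_dat_text()
    with open(path, "w") as handle:
        handle.write(text)
    return len(text.encode("ascii"))
```

The size follows from the data: each value below 10 takes two bytes including the newline and each value from 10 to 23 takes three, which adds up to 1318 bytes for 512 samples.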
Compute to Data - Digital Filter Results
Start a simple http server to view the output
sudo su - irods
cd /tmp/irods/dsp_resc/home/alice/notebook_output
python -m SimpleHTTPServer 8080
Navigate to notebook_output
View the html file
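The `SimpleHTTPServer` module exists only in Python 2. On a host where only Python 3 is available, the equivalent is the `http.server` module. The sketch below wraps it in a small helper; `make_server` is a hypothetical name, and the `directory` argument to the handler requires Python 3.7 or later.

```python
import functools
import http.server


def make_server(directory, host="0.0.0.0", port=8080):
    """Build an HTTP server that serves files from `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.HTTPServer((host, port), handler)


# Usage (blocks until interrupted with Ctrl-C):
#   server = make_server("/tmp/irods/dsp_resc/home/alice/notebook_output")
#   server.serve_forever()
```

Binding to 0.0.0.0 matches the default behavior of the command-line server and exposes the directory to the network, so the server should be stopped once the results have been viewed.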
Thank you
Any Questions?