Advanced Training:

Compute to Data

May 28-31, 2024

iRODS User Group Meeting 2024

Amsterdam, Netherlands

Alan King, Senior Software Developer

Martin Flores, Software Developer

iRODS Consortium

The Compute to Data Use Case

Data is assumed to already be routed to an appropriate storage resource

Goals - Develop a generic interface concept for compute

  • Develop a metadata-driven interface for labeling resources which provide computational capabilities
    • ultimately relies upon convention
  • Separate configuration from implementation
    • isolate deployment specific concepts
  • Consider a rule base as an extension of iRODS
    • rules are not just data management policy

"Compute To Data" Pattern - Salient Features

Implemented as an iRODS rulebase -

    following the Template Method pattern

  1. If necessary, replicate input data to an appropriate resource
  2. Check permissions
  3. Launch compute container (Docker):
    1. Process input data via Jupyter notebook
    2. Save results
  4. Register the resultant directory into iRODS
  5. Apply metadata to newly registered results
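
The five steps above map naturally onto the Template Method pattern: one method fixes the order of the steps, and subclasses supply each step. The following is a minimal sketch only, not the training rulebase. The class names, method names, and the in-memory stand-in are all hypothetical and chosen for illustration.

```python
from abc import ABC, abstractmethod


class ComputeToDataJob(ABC):
    """Template Method: run() fixes the step order; subclasses supply the steps."""

    def run(self, input_path, resource):
        steps = []
        # 1. Replicate only if the data is not already on the compute resource.
        if not self.is_on_resource(input_path, resource):
            self.replicate(input_path, resource)
            steps.append("replicate")
        # 2. Refuse to continue without access to the input.
        if not self.has_permission(input_path):
            raise PermissionError(f"no access to {input_path}")
        steps.append("check_permissions")
        # 3. Launch the container, which processes the input and saves results.
        output_dir = self.launch_container(input_path, resource)
        steps.append("launch_container")
        # 4. and 5. Register the results, then annotate them.
        self.register(output_dir)
        steps.append("register")
        self.apply_metadata(output_dir)
        steps.append("apply_metadata")
        return steps

    @abstractmethod
    def is_on_resource(self, input_path, resource): ...

    @abstractmethod
    def replicate(self, input_path, resource): ...

    @abstractmethod
    def has_permission(self, input_path): ...

    @abstractmethod
    def launch_container(self, input_path, resource): ...

    @abstractmethod
    def register(self, output_dir): ...

    @abstractmethod
    def apply_metadata(self, output_dir): ...


class InMemoryJob(ComputeToDataJob):
    """Hypothetical stand-in that records state in plain Python containers."""

    def __init__(self):
        self.replicas = {}    # logical path -> set of resource names
        self.registered = []  # output directories "registered" so far
        self.metadata = {}    # output directory -> metadata dict

    def is_on_resource(self, input_path, resource):
        return resource in self.replicas.get(input_path, set())

    def replicate(self, input_path, resource):
        self.replicas.setdefault(input_path, set()).add(resource)

    def has_permission(self, input_path):
        return True

    def launch_container(self, input_path, resource):
        # Pretend the container wrote its results next to the input.
        return input_path.rsplit("/", 1)[0] + "/output"

    def register(self, output_dir):
        self.registered.append(output_dir)

    def apply_metadata(self, output_dir):
        self.metadata[output_dir] = {"status": "complete"}


job = InMemoryJob()
path = "/tempZone/home/alice/notebook_input/input.dat"
first_run = job.run(path, "dsp_resc")
second_run = job.run(path, "dsp_resc")
```

On the second run the replication step is skipped because the stand-in already holds a replica on the target resource, which mirrors the "if necessary" wording of step 1.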

Components of the System

System Component         Implementation

Job Initialization       iRODS Rule Base
Container Technology     Docker
User Provided Compute    Jupyter Notebook

Getting Started

git clone https://github.com/irods/irods_training
sudo apt-get -y install \
   irods-externals-cmake3.21.4-0 \
   irods-externals-clang13.0.1-0 \
   irods-externals-qpid-proton0.36.0-2 \
   irods-externals-fmt8.1.1-1 \
   irods-dev
export PATH=/opt/irods-externals/cmake3.21.4-0/bin:$PATH

Clone irods_training repository and configure build tools

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce
sudo usermod -aG docker ${USER}

Install Docker

Getting Started

cd
mkdir build_compute_to_data
cd build_compute_to_data
cmake ../irods_training/advanced/hpc_compute_to_data
make package
sudo dpkg -i irods-hpc-compute-to-data-example.deb

cd
mkdir build_register_microservice
cd build_register_microservice
cmake ../irods_training/advanced/hpc_compute_to_data/msvc__msiregister_as_admin/
make package
sudo dpkg -i irods-microservice-register_as_admin-4.3.2-ubuntu22-x86_64.deb

Install packages for the compute-to-data example

docker pull irods/irods-training-jupyter-digital-filter

Pull Docker image for processing

Getting Started - Python extensions to iRODS

Make sure the Python rule engine plugin is installed.

 sudo apt-get -y install irods-rule-engine-plugin-python

Also, install Python's pip package:

 sudo apt-get -y install python3-pip

As service account user irods, install the Python Docker API. Manually downgrade requests due to a bug in a recent release.

 python3 -m pip install docker==7.0.0 --user
 python3 -m pip install requests==2.31.0 --user

Add irods user to the docker group:

 sudo usermod -aG docker irods

Restart the iRODS server:

 sudo su irods -c '~/irodsctl restart'

Further Setup and Configuration

Create /etc/irods/core.py with the following import:

from compute_to_data import *

Place the Python rule engine stanza after the iRODS Rule Language plugin stanza in /etc/irods/server_config.json:

"rule_engines": [
    {
      "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
      "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
                 ...
      "shared_memory_instance": "irods_rule_language_rule_engine"
    },
    {
      "instance_name": "irods_rule_engine_plugin-python-instance",
      "plugin_name": "irods_rule_engine_plugin-python",
      "plugin_specific_configuration": {}
    },
                 ...
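
The ordering of the stanzas can be checked mechanically. The sketch below is an illustration, not part of the training material: the function name is hypothetical, and it assumes the rule_engines list has already been parsed from the server configuration into Python dictionaries. The elided keys are left out of the sample data.

```python
NATIVE_PLUGIN = "irods_rule_engine_plugin-irods_rule_language"
PYTHON_PLUGIN = "irods_rule_engine_plugin-python"


def python_plugin_follows_rule_language(rule_engines):
    """Return True if both plugins are listed and the Python one comes later."""
    names = [engine.get("plugin_name") for engine in rule_engines]
    if NATIVE_PLUGIN not in names or PYTHON_PLUGIN not in names:
        return False
    return names.index(PYTHON_PLUGIN) > names.index(NATIVE_PLUGIN)


# Sample data mirroring the two stanzas shown above (elided keys omitted).
rule_engines = [
    {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        "plugin_name": NATIVE_PLUGIN,
    },
    {
        "instance_name": "irods_rule_engine_plugin-python-instance",
        "plugin_name": PYTHON_PLUGIN,
        "plugin_specific_configuration": {},
    },
]

order_ok = python_plugin_follows_rule_language(rule_engines)
reversed_ok = python_plugin_follows_rule_language(list(reversed(rule_engines)))
```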

Continued... Compute-to-Data Set-up / Configuration

iadmin mkuser alice rodsuser 
iadmin moduser alice password apass 

This demonstration will be run as rodsuser 'alice'

Configure the Tagged Resources - if necessary

Make a unixfilesystem resource for the SIGNAL_PROCESSING role.

iadmin mkresc dsp_resc unixfilesystem $(hostname):/tmp/irods/dsp_resc

Annotate it with appropriate metadata given its role

  - defined in the configuration as part of the contract

As the irods service account

imeta add -R dsp_resc COMPUTE_RESOURCE_ROLE SIGNAL_PROCESSING
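
The metadata tag is what lets a rule find a suitable resource by role rather than by name. As a rough sketch of that lookup, assuming the resource metadata has already been fetched into a plain dictionary (the function name and the data layout here are hypothetical, not the rulebase's own):

```python
def resources_for_role(resource_avus, role, attribute="COMPUTE_RESOURCE_ROLE"):
    """Return the names of resources tagged with the given role, sorted."""
    return sorted(
        name
        for name, avus in resource_avus.items()
        if (attribute, role) in avus
    )


# Hypothetical snapshot: resource name -> set of (attribute, value) pairs.
resource_avus = {
    "demoResc": set(),
    "dsp_resc": {("COMPUTE_RESOURCE_ROLE", "SIGNAL_PROCESSING")},
}

dsp_candidates = resources_for_role(resource_avus, "SIGNAL_PROCESSING")
other_candidates = resources_for_role(resource_avus, "IMAGE_PROCESSING")
```

Because the lookup depends only on the attribute and value, adding a second signal processing resource needs no change to the rule, only another imeta annotation.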

Finally ...

Remember to log in as 'alice' in the ubuntu training account:

$ iinit
 ERROR: environment_properties::capture: missing environment file. should be at [/home/ubuntu/.irods/irods_environment.json]
Enter the host name (DNS) of the server to connect to: localhost
Enter the port number [1247]: 1247
Enter your irods user name: alice
Enter your irods zone: tempZone
Enter your current iRODS password:
$ ils
/tempZone/home/alice:

The configuration interface

Define interfaces for any necessary conventions

  • Metadata attributes and values
  • Metadata values for implemented roles

 

Single Point of Truth - Template Method Pattern

  • execute defined preconditions
  • run user's requested container

 

Users may utilize metadata conventions within a rule to provide inputs to the generalized container service.

Reminder ...

Implemented as an iRODS rulebase -

    following the Template Method pattern

  1. If necessary, replicate input data to an appropriate resource
  2. Check permissions
  3. Launch compute container (Docker):
    1. Process input data via Jupyter notebook
    2. Save results
  4. Register the resultant directory into iRODS
  5. Apply metadata to newly registered results (but not today...)

Separation of Concerns

iRODS Rule
     -> Python Rule
          -> Docker Container
               -> Jupyter Notebook

The iRODS Rule Language Rule File

main {
  container_dispatch("containers.run","/tempZone/home/alice/task_config.json","dsp_resc","","")
}
INPUT null
OUTPUT ruleExecOut

Note - add a delay() directive for asynchronous behavior.

Contents of /home/ubuntu/spawn_remote_containers.r
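
The first argument to container_dispatch, "containers.run", has the same shape as the containers.run method of a Docker SDK for Python client. How the rulebase actually interprets that string is not shown here. The sketch below assumes it names a method to look up on a client object, and uses a hypothetical fake client so that it runs without Docker.

```python
from functools import reduce


def resolve_dotted(root, dotted_name):
    """Follow a dotted attribute path such as 'containers.run' from root."""
    return reduce(getattr, dotted_name.split("."), root)


class FakeContainers:
    """Hypothetical stand-in for the containers collection of a Docker client."""

    def run(self, image, **kwargs):
        return f"started {image}"


class FakeClient:
    """Hypothetical stand-in for a Docker client object."""

    def __init__(self):
        self.containers = FakeContainers()


run = resolve_dotted(FakeClient(), "containers.run")
result = run("irods/irods-training-jupyter-digital-filter")
```

Passing the method name as data keeps the rule file free of Python specifics. A typo in the name would surface as an AttributeError at dispatch time.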

The Python Rulebase

 

Located at:

/home/ubuntu/irods_training/advanced/hpc_compute_to_data/compute_to_data.py

 

The rulebase:

  - performs pre-flight checks on the input data

  - launches the Docker container (Jupyter notebook)

  - registers the results in iRODS

The Digital Signal Processing container

FROM jupyter/base-notebook
USER root
COPY lpfilter.ipynb /home/jovyan/work/.
COPY mymodule/ /home/jovyan/work/mymodule/
USER jovyan
RUN conda init
RUN conda install -y -c conda-forge matplotlib numpy
RUN jupyter trust /home/jovyan/work/lpfilter.ipynb
RUN mkfifo /tmp/fifo
CMD cat /tmp/fifo

Dockerfile can be found at:

~/irods_training/advanced/hpc_compute_to_data/jupyter_notebook/Dockerfile

The Jupyter Notebook

 

Located at:

/home/ubuntu/irods_training/advanced/hpc_compute_to_data/jupyter_notebook/lpfilter.ipynb

 

The notebook:

  - loads an input waveform

  - applies a digital lowpass filter

  - plots 3 graphs of the results

  - saves graphs and filtered data
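
To make the lowpass step concrete, here is a small sketch that builds the same sawtooth used as test input in this training (x % 24 for x from 1 to 512) and smooths it. The notebook's actual filter design is not shown in these slides, so a plain moving average is swapped in as the simplest lowpass filter. The function names are hypothetical.

```python
def make_input(n=512, period=24):
    """Sawtooth samples: x % period for x = 1..n."""
    return [x % period for x in range(1, n + 1)]


def moving_average(samples, window):
    """Causal moving average; early outputs average only the samples seen so far."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    running = 0.0
    for i, sample in enumerate(samples):
        running += sample
        if i >= window:
            # Drop the sample that just left the window.
            running -= samples[i - window]
        out.append(running / min(i + 1, window))
    return out


signal = make_input()
filtered = moving_average(signal, window=24)
```

With the window equal to the sawtooth period, every full window covers one complete cycle, so the filtered signal settles at the cycle mean of 11.5 and the periodic component is removed entirely.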

Compute to Data - Digital Filter Testing

ubuntu $ icd ; imkdir notebook_input notebook_output
ubuntu $ cd ; iput task_config.json
ubuntu $ for x in {1..512}; do echo $((x%24)) ; done >input.dat
ubuntu $ iput input.dat notebook_input
ubuntu $ ils -lr
/tempZone/home/alice:
  alice             0 demoResc          853 2024-06-08.20:27 & task_config.json
  C- /tempZone/home/alice/notebook_input  
/tempZone/home/alice/notebook_input:
  alice             0 demoResc         1318 2024-06-08.20:27 & input.dat
  C- /tempZone/home/alice/notebook_output  
/tempZone/home/alice/notebook_output:
ubuntu $ irule -r irods_rule_engine_plugin-irods_rule_language-instance -F spawn_remote_containers.r
ubuntu $ ils -lr
/tempZone/home/alice:
  alice             0 demoResc          853 2024-06-08.20:27 & task_config.json
  C- /tempZone/home/alice/notebook_input  
/tempZone/home/alice/notebook_input:
  alice             0 demoResc         1318 2024-06-08.20:27 & input.dat
  alice             1 dsp_resc         1318 2024-06-08.20:28 & input.dat
  C- /tempZone/home/alice/notebook_output  
/tempZone/home/alice/notebook_output:
  alice             0 dsp_resc            0 2024-06-08.20:28 & .fe6cc526-063a-11ee-912b-377adb02dba0
  C- /tempZone/home/alice/notebook_output/fe6cc526-063a-11ee-912b-377adb02dba0  
/tempZone/home/alice/notebook_output/fe6cc526-063a-11ee-912b-377adb02dba0:
  alice             0 dsp_resc            0 2024-06-08.20:28 & .fe6cc526-063a-11ee-912b-377adb02dba0
  alice             0 dsp_resc       746805 2024-06-08.20:28 & lowpass_filter_processing.html
  alice             0 dsp_resc         3200 2024-06-08.20:28 & lowpass_filtered_input.dat

Compute to Data - Digital Filter Results

Start a simple http server to view the output

sudo su - irods
cd /tmp/irods/dsp_resc/home/alice/notebook_output
python3 -m http.server 8888

Open in web browser

View the html file

Questions?