Advanced Training:
Compute to Data
June 17-20, 2025
iRODS User Group Meeting 2025
Durham, NC
The Compute to Data Use Case
Data is assumed to already be routed to an appropriate storage resource
Goals - Develop a generic interface concept for compute
"Compute To Data" Pattern - Salient Features
Implemented as an iRODS rulebase -
following the Template Method pattern
Components of the System
System Component      | Implementation
Job Initialization    | iRODS Rule Base
Container Technology  | Docker
User Provided Compute | Jupyter Notebook
Getting Started
git clone https://github.com/irods/irods_training
sudo apt install -y irods-dev cmake
Clone the irods_training repository and install the build tools.
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt update
sudo apt install -y docker-ce
sudo usermod -aG docker ${USER}
Install Docker and add the ubuntu user to the docker group.
Log out and back in (or start a new login shell) so that the new group membership takes effect.
Getting Started
cd
mkdir build_compute_to_data
cd build_compute_to_data
cmake ../irods_training/advanced/hpc_compute_to_data
make package
sudo dpkg -i irods-hpc-compute-to-data-example.deb
cd
mkdir build_register_microservice
cd build_register_microservice
cmake ../irods_training/advanced/hpc_compute_to_data/msvc__msiregister_as_admin/
make package
sudo dpkg -i irods-microservice-register_as_admin*.deb
Install packages for the compute-to-data example
docker pull irods/irods-training-jupyter-digital-filter
Pull Docker image for processing
Getting Started - Python extensions to iRODS
Make sure the Python rule engine plugin is installed:
sudo apt install -y irods-rule-engine-plugin-python
Also, install Python's pip package:
sudo apt install -y python3-pip
Configure the system to allow system-wide pip installations, then install the Python Docker API system-wide so that the irods service account can import it:
sudo mv /usr/lib/python3.12/EXTERNALLY-MANAGED \
    /usr/lib/python3.12/EXTERNALLY-MANAGED.moved
sudo python3 -m pip install docker==7.0.0
Add irods user to the docker group:
sudo usermod -aG docker irods
Restart the iRODS server:
sudo su - irods -c 'kill $(cat /var/run/irods/irods-server.pid)'
sudo su - irods -c 'irodsServer -d'
Getting Started - Python extensions to iRODS
Make sure the Python rule engine plugin is installed:
sudo apt install -y irods-rule-engine-plugin-python
Also, install Python's pip package:
sudo apt install -y python3-pip
Install the Python Docker API as the irods service account. On Python 3.12, pip refuses this unless system-wide pip installations have first been allowed (for example by moving the EXTERNALLY-MANAGED marker as shown above).
sudo su - irods -c "python3 -m pip install docker==7.1.0"
Add irods user to the docker group:
sudo usermod -aG docker irods
Restart the iRODS server:
sudo su - irods -c 'kill $(cat /var/run/irods/irods-server.pid)'
sudo su - irods -c 'irodsServer -d'
Further Setup and Configuration
As the irods user, create /etc/irods/core.py with the following import:
from compute_to_data import *
In /etc/irods/server_config.json, place the Python rule engine stanza after the iRODS Rule Language stanza:
"rule_engines": [
    {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
        ...
        "shared_memory_instance": "irods_rule_language_rule_engine"
    },
    {
        "instance_name": "irods_rule_engine_plugin-python-instance",
        "plugin_name": "irods_rule_engine_plugin-python",
        "plugin_specific_configuration": {}
    },
    ...
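Editing server_config.json by hand is all the slide asks for. For scripted setups, a small helper can make the same change repeatably. This is a minimal sketch, not part of the training material; it assumes the standard layout in which the `rule_engines` array sits under `plugin_configuration`, and the function names are hypothetical.

```python
import copy
import json

# Instance names copied from the stanzas shown above.
RULE_LANGUAGE_INSTANCE = "irods_rule_engine_plugin-irods_rule_language-instance"
PYTHON_STANZA = {
    "instance_name": "irods_rule_engine_plugin-python-instance",
    "plugin_name": "irods_rule_engine_plugin-python",
    "plugin_specific_configuration": {},
}


def add_python_rule_engine(config: dict) -> dict:
    """Insert the Python stanza directly after the Rule Language stanza.

    Does nothing if a Python instance is already present, so it is safe to rerun.
    Raises ValueError if the Rule Language instance is missing.
    """
    engines = config["plugin_configuration"]["rule_engines"]
    names = [engine.get("instance_name") for engine in engines]
    if PYTHON_STANZA["instance_name"] in names:
        return config
    position = names.index(RULE_LANGUAGE_INSTANCE)  # ValueError if absent
    engines.insert(position + 1, copy.deepcopy(PYTHON_STANZA))
    return config


def update_server_config(path: str = "/etc/irods/server_config.json") -> None:
    """Rewrite server_config.json in place (run as the irods user; keep a backup)."""
    with open(path) as f:
        config = json.load(f)
    add_python_rule_engine(config)
    with open(path, "w") as f:
        json.dump(config, f, indent=4)
```

Keeping the logic in `add_python_rule_engine`, separate from file I/O, makes the ordering rule easy to check on a plain dict before touching the real file. The server still has to reload its configuration afterwards.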
Continued... Compute-to-Data Set-up / Configuration
As the irods user:
iadmin mkuser alice rodsuser
iadmin moduser alice password apass
This next demonstration will be run as the newly created rodsuser 'alice'.
Configure the Tagged Resources - if necessary
As the irods service account, make a unixfilesystem resource for the SIGNAL_PROCESSING role:
iadmin mkresc dsp_resc unixfilesystem $(hostname -f):/tmp/irods/dsp_resc
Annotate it with metadata appropriate to its role, as defined in the configuration (this tag is part of the contract):
imeta add -R dsp_resc COMPUTE_RESOURCE_ROLE SIGNAL_PROCESSING
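The `imeta add` command attaches one AVU (attribute COMPUTE_RESOURCE_ROLE, value SIGNAL_PROCESSING, empty units) to dsp_resc. The slides do not show how the rulebase consults this tag; the sketch below only illustrates the kind of role check the contract enables. It assumes the AVUs have already been fetched (for example from `imeta ls -R dsp_resc` or a catalog query) into a plain dict, and both function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# An AVU as stored by `imeta add`: (attribute, value, units).
Avu = Tuple[str, str, str]

ROLE_ATTRIBUTE = "COMPUTE_RESOURCE_ROLE"


def has_role(avus: Iterable[Avu], role: str) -> bool:
    """True if any AVU tags the resource with the given compute role."""
    return any(attr == ROLE_ATTRIBUTE and value == role for attr, value, _units in avus)


def resources_for_role(resource_avus: Dict[str, List[Avu]], role: str) -> List[str]:
    """Names of all resources whose metadata advertises the role, sorted."""
    return sorted(name for name, avus in resource_avus.items() if has_role(avus, role))
```

For instance, `resources_for_role({"dsp_resc": [("COMPUTE_RESOURCE_ROLE", "SIGNAL_PROCESSING", "")], "demoResc": []}, "SIGNAL_PROCESSING")` selects only dsp_resc.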
Finally ...
As the irods user, reload the iRODS server configuration:
kill -HUP $(cat /var/run/irods/irods-server.pid)
Authenticate as iRODS user "alice" with the ubuntu Linux user:
$ iinit
ERROR: environment_properties::capture: missing environment file. should be at [/home/ubuntu/.irods/irods_environment.json]
Enter the host name (DNS) of the server to connect to: localhost
Enter the port number [1247]: 1247
Enter your irods user name: alice
Enter your irods zone: tempZone
Connecting as alice#tempZone to localhost:1247 ...
Enter your current iRODS password:
$ ils
/tempZone/home/alice:
The configuration interface
Define interfaces for any necessary conventions
Single Point of Truth - Template Method Pattern
Users can apply metadata conventions within a rule to supply inputs to the generalized container service.
Reminder ...
Implemented as an iRODS rulebase -
following the Template Method pattern
Separation of Concerns
iRODS Rule -> Python Rule -> Docker Container -> Jupyter Notebook
The iRODS Rule Language Rule File
Contents of /home/ubuntu/spawn_remote_containers.r:
main {
    container_dispatch("containers.run","/tempZone/home/alice/task_config.json","dsp_resc","","")
}
INPUT null
OUTPUT ruleExecOut
Note - add a delay() directive for asynchronous behavior.
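The first argument to `container_dispatch`, "containers.run", reads like a dotted method name on a Docker SDK client (`client.containers.run`). How compute_to_data.py actually interprets it is not shown in these slides; the sketch below is one hypothetical way a Python rulebase could turn such a string into a call. The allowlist is an added assumption, motivated by the fact that the string comes from a user-invokable rule.

```python
from functools import reduce
from typing import Any, Callable, FrozenSet

# Hypothetical allowlist: only explicitly permitted client methods are reachable.
ALLOWED_METHODS: FrozenSet[str] = frozenset({"containers.run"})


def resolve_method(client: Any, dotted_name: str,
                   allowed: FrozenSet[str] = ALLOWED_METHODS) -> Callable:
    """Turn a name such as "containers.run" into client.containers.run."""
    if dotted_name not in allowed:
        raise ValueError(f"method not permitted: {dotted_name!r}")
    # reduce(getattr, ["containers", "run"], client) == client.containers.run
    return reduce(getattr, dotted_name.split("."), client)
```

With the Docker SDK installed, `resolve_method(docker.from_env(), "containers.run")` returns the SDK's `containers.run`, which could then be called as, for example, `run("irods/irods-training-jupyter-digital-filter", detach=True)`.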
The Python Rulebase
Located at:
/home/ubuntu/irods_training/advanced/hpc_compute_to_data/compute_to_data.py
The rulebase:
- performs pre-flight checks on the input data
- launches the Docker container (Jupyter notebook)
- registers the results in iRODS
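The three duties listed above map naturally onto the Template Method pattern: a base class fixes the order (pre-flight, launch, register) once, and each kind of compute task fills in the steps. This is an illustrative skeleton under that reading, not the contents of compute_to_data.py. The class names and the "input" config key are hypothetical, and the toy subclass makes no iRODS or Docker calls.

```python
from abc import ABC, abstractmethod
from typing import List


class ComputeToDataTask(ABC):
    """Hypothetical skeleton illustrating the Template Method pattern."""

    def dispatch(self, config: dict) -> List[str]:
        # Template method: the order of steps is fixed here, once, for every task type.
        self.preflight(config)
        outputs = self.launch(config)
        return self.register(config, outputs)

    @abstractmethod
    def preflight(self, config: dict) -> None:
        """Validate inputs (e.g. data present on the tagged resource)."""

    @abstractmethod
    def launch(self, config: dict) -> List[str]:
        """Run the compute step and return the paths it produced."""

    @abstractmethod
    def register(self, config: dict, outputs: List[str]) -> List[str]:
        """Make the outputs visible in iRODS and return their logical paths."""


class DryRunTask(ComputeToDataTask):
    """Toy subclass that only records which step ran."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def preflight(self, config: dict) -> None:
        if "input" not in config:  # hypothetical config key
            raise ValueError("task config has no 'input' entry")
        self.log.append("preflight")

    def launch(self, config: dict) -> List[str]:
        self.log.append("launch")
        return [config["input"] + ".filtered"]

    def register(self, config: dict, outputs: List[str]) -> List[str]:
        self.log.append("register")
        return outputs
```

Because `dispatch` is never overridden, a failed pre-flight check stops the task before any container is launched, whichever subclass is in use.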
The Digital Signal Processing container
FROM jupyter/base-notebook
USER root
COPY lpfilter.ipynb /home/jovyan/work/.
COPY mymodule/ /home/jovyan/work/mymodule/
USER jovyan
RUN conda init
RUN conda install -y -c conda-forge matplotlib numpy
RUN jupyter trust /home/jovyan/work/lpfilter.ipynb
RUN mkfifo /tmp/fifo
CMD cat /tmp/fifo
Dockerfile can be found at:
~/irods_training/advanced/hpc_compute_to_data/jupyter_notebook/Dockerfile
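The last two Dockerfile lines make the container's main process `cat /tmp/fifo`. Opening a FIFO for reading blocks until another process opens it for writing, and `cat` exits only when that writer closes its end. The container therefore presumably stays alive until something writes to the FIFO, for example via `docker exec`. The slides do not show that handshake. The sketch below reproduces only the blocking behavior with Python's `os.mkfifo` (POSIX only), with a thread standing in for the container.

```python
import os
import tempfile
import threading
import time


def blocking_reader(path: str, result: list) -> None:
    # Like `cat /tmp/fifo`: open() blocks until a writer opens the FIFO,
    # then read() returns once the writer closes its end (EOF).
    with open(path) as fifo:
        result.append(fifo.read())


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fifo")
        os.mkfifo(path)  # like `RUN mkfifo /tmp/fifo`
        result: list = []
        reader = threading.Thread(target=blocking_reader, args=(path, result))
        reader.start()
        time.sleep(0.2)
        still_waiting = reader.is_alive()  # parked, like the idle container
        with open(path, "w") as fifo:      # like `echo done > /tmp/fifo`
            fifo.write("done\n")
        reader.join(timeout=5)
        return still_waiting, reader.is_alive(), result
```

Calling `demo()` shows the reader parked until the write, then finished with the written text.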
The Jupyter Notebook
Located at:
/home/ubuntu/irods_training/advanced/hpc_compute_to_data/jupyter_notebook/lpfilter.ipynb
The notebook:
- loads an input waveform
- applies a digital lowpass filter
- plots 3 graphs of the results
- saves graphs and filtered data
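The slides do not say which lowpass design lpfilter.ipynb uses. To show what the filtering step does to data in input.dat's one-sample-per-line format, the sketch below swaps in the simplest digital lowpass, a first-order IIR filter (exponential smoothing), using only the standard library. The function names and the `alpha` parameter are illustrative assumptions, not the notebook's code.

```python
from typing import List, Sequence


def read_waveform(text: str) -> List[float]:
    """Parse input.dat-style text: one numeric sample per line."""
    return [float(token) for token in text.split()]


def lowpass(samples: Sequence[float], alpha: float = 0.1) -> List[float]:
    """First-order IIR lowpass: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).

    Smaller alpha means heavier smoothing; alpha = 1 passes the input through.
    The state starts at the first sample, so a constant input is unchanged.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not samples:
        return []
    y = float(samples[0])
    filtered = []
    for x in samples:
        y += alpha * (x - y)
        filtered.append(y)
    return filtered
```

Applied to a sawtooth, the output lags the input and its peak-to-peak swing shrinks, which is the attenuation of high-frequency content a lowpass filter is meant to produce.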
Compute to Data - Digital Filter Testing
ubuntu $ icd ; imkdir notebook_input notebook_output
ubuntu $ cd ; iput task_config.json
ubuntu $ for x in {1..512}; do echo $((x%24)) ; done >input.dat
ubuntu $ iput input.dat notebook_input
ubuntu $ ils -lr
/tempZone/home/alice:
  alice 0 demoResc 1665 2025-05-21.20:41 & task_config.json
  C- /tempZone/home/alice/notebook_input
/tempZone/home/alice/notebook_input:
  alice 0 demoResc 1318 2025-05-21.20:41 & input.dat
  C- /tempZone/home/alice/notebook_output
/tempZone/home/alice/notebook_output:
ubuntu $ irule -r irods_rule_engine_plugin-irods_rule_language-instance -F spawn_remote_containers.r
ubuntu $ ils -lr
/tempZone/home/alice:
  alice 0 demoResc 853 2025-05-21.20:41 & task_config.json
  C- /tempZone/home/alice/notebook_input
/tempZone/home/alice/notebook_input:
  alice 0 demoResc 1318 2025-05-21.20:41 & input.dat
  alice 1 dsp_resc 1318 2025-05-21.20:42 & input.dat
  C- /tempZone/home/alice/notebook_output
/tempZone/home/alice/notebook_output:
  alice 0 dsp_resc 0 2025-05-21.20:42 & .fe6cc526-063a-11ee-912b-377adb02dba0
  C- /tempZone/home/alice/notebook_output/fe6cc526-063a-11ee-912b-377adb02dba0
/tempZone/home/alice/notebook_output/fe6cc526-063a-11ee-912b-377adb02dba0:
  alice 0 dsp_resc 0 2025-05-21.20:42 & .fe6cc526-063a-11ee-912b-377adb02dba0
  alice 0 dsp_resc 746805 2025-05-21.20:42 & lowpass_filter_processing.html
  alice 0 dsp_resc 3200 2025-05-21.20:42 & lowpass_filtered_input.dat
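The shell loop above writes a sawtooth test waveform: values ramp from 1 to 23, drop to 0, and repeat every 24 samples, one value per line. For anyone preparing the input from Python instead, this sketch produces byte-for-byte the same text; the resulting file is 1318 bytes, matching the size of input.dat in the listing. The function name is hypothetical.

```python
def make_test_input(count: int = 512, period: int = 24) -> str:
    """Python equivalent of: for x in {1..512}; do echo $((x%24)) ; done"""
    return "".join(f"{x % period}\n" for x in range(1, count + 1))
```

Writing it out is then `open("input.dat", "w").write(make_test_input())`, after which `iput input.dat notebook_input` proceeds as above.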
Compute to Data - Digital Filter Results
Start a simple HTTP server to view the output:
sudo su - irods
cd /tmp/irods/dsp_resc/home/alice/notebook_output
python3 -m http.server 8888
Open port 8888 on this host in a web browser, navigate into the run's subdirectory, and view lowpass_filter_processing.html.
Questions?