Advanced Training:
Compute to Data




June 17-20, 2025
iRODS User Group Meeting 2025
Durham, NC

The Compute to Data Use Case

Data is assumed to already be routed to an appropriate storage resource

Goals
- Develop a generic interface concept for compute
- Develop a metadata-driven interface for labeling resources which provide computational capabilities
  - ultimately relies upon convention
- Separate configuration from implementation
  - isolate deployment-specific concepts
- Consider a rule base as an extension of iRODS
  - rules are not just data management policy

"Compute To Data" Pattern - Salient Features
Implemented as an iRODS rulebase, following the Template Method pattern
- If necessary, replicate input data to an appropriate resource
- Check permissions
- Launch compute container (Docker):
  - Process input data via Jupyter notebook
  - Save results
- Register the resultant directory into iRODS
- Apply metadata to newly registered results
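
The step order listed above can be sketched as a Template Method in plain Python. This is only an illustration of the pattern, not the code in the training rulebase: the class names, method names, and the in-memory stand-ins for replication and registration are all hypothetical.

```python
from abc import ABC, abstractmethod


class ComputeToDataJob(ABC):
    """Template Method: run() fixes the step order; subclasses supply the steps."""

    def run(self, data_object, resource):
        steps = []
        # Replicate only when the input is not already on the target resource.
        if not self.is_on_resource(data_object, resource):
            self.replicate(data_object, resource)
            steps.append("replicate")
        self.check_permissions(data_object)
        steps.append("check_permissions")
        output_dir = self.launch_container(data_object)
        steps.append("launch_container")
        self.register_results(output_dir)
        steps.append("register_results")
        return steps

    @abstractmethod
    def is_on_resource(self, data_object, resource): ...

    @abstractmethod
    def replicate(self, data_object, resource): ...

    @abstractmethod
    def check_permissions(self, data_object): ...

    @abstractmethod
    def launch_container(self, data_object): ...

    @abstractmethod
    def register_results(self, output_dir): ...


class InMemoryJob(ComputeToDataJob):
    """Hypothetical stand-in that records actions instead of calling iRODS or Docker."""

    def __init__(self, replicas, readable):
        self.replicas = replicas    # data object path -> set of resource names
        self.readable = readable    # paths the caller is allowed to read
        self.registered = []

    def is_on_resource(self, data_object, resource):
        return resource in self.replicas.get(data_object, set())

    def replicate(self, data_object, resource):
        self.replicas.setdefault(data_object, set()).add(resource)

    def check_permissions(self, data_object):
        if data_object not in self.readable:
            raise PermissionError(data_object)

    def launch_container(self, data_object):
        return data_object + ".results"

    def register_results(self, output_dir):
        self.registered.append(output_dir)
```

The point of the design is that the sequence lives in one place (`run`), while each deployment can swap in its own implementation of the individual steps.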

Components of the System
System Component        Implementation
Job Initialization      iRODS Rule Base
Container Technology    Docker
User Provided Compute   Jupyter Notebook

Getting Started
Clone the irods_training repository and install the build tools:

git clone https://github.com/irods/irods_training
sudo apt install -y irods-dev cmake

Install Docker and add the ubuntu user to the docker group:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt update
sudo apt install -y docker-ce
sudo usermod -aG docker ${USER}

You may need to exit and re-enter the shell for the group change to take effect.

Getting Started
Install packages for the compute-to-data example:

cd
mkdir build_compute_to_data
cd build_compute_to_data
cmake ../irods_training/advanced/hpc_compute_to_data
make package
sudo dpkg -i irods-hpc-compute-to-data-example.deb
cd
mkdir build_register_microservice
cd build_register_microservice
cmake ../irods_training/advanced/hpc_compute_to_data/msvc__msiregister_as_admin/
make package
sudo dpkg -i irods-microservice-register_as_admin*.deb

Pull the Docker image for processing:

docker pull irods/irods-training-jupyter-digital-filter

Getting Started - Python extensions to iRODS
Make sure the Python rule engine plugin is installed:

sudo apt install -y irods-rule-engine-plugin-python

Also, install Python's pip package:

sudo apt install -y python3-pip

Configure the system to allow system-wide pip installations and install the Python Docker API:

sudo mv /usr/lib/python3.12/EXTERNALLY-MANAGED \
    /usr/lib/python3.12/EXTERNALLY-MANAGED.moved
python3 -m pip install docker==7.0.0

Add the irods user to the docker group:

sudo usermod -aG docker irods

Restart the iRODS server:

sudo su - irods -c 'kill $(cat /var/run/irods/irods-server.pid)'
sudo su - irods -c 'irodsServer -d'

The install step can also be run as the irods user (note the different pinned version):

sudo su - irods -c "python3 -m pip install docker==7.1.0"

Further Setup and Configuration
As the irods user, create /etc/irods/core.py with the following import:

from compute_to_data import *

In /etc/irods/server_config.json, place the Python Rule Engine stanza after the iRODS Rule Language stanza:

"rule_engines": [
    {
        "instance_name": "irods_rule_engine_plugin-irods_rule_language-instance",
        "plugin_name": "irods_rule_engine_plugin-irods_rule_language",
        ...
        "shared_memory_instance": "irods_rule_language_rule_engine"
    },
    {
        "instance_name": "irods_rule_engine_plugin-python-instance",
        "plugin_name": "irods_rule_engine_plugin-python",
        "plugin_specific_configuration": {}
    },
    ...
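
The stanza placement described above can be sketched as a small helper that works on the rule_engines list once it has been loaded into Python. This is a minimal sketch under stated assumptions: the function name add_python_stanza is hypothetical, and reading and writing the actual configuration file is deliberately left out.

```python
import copy

RULE_LANGUAGE_PLUGIN = "irods_rule_engine_plugin-irods_rule_language"

PYTHON_STANZA = {
    "instance_name": "irods_rule_engine_plugin-python-instance",
    "plugin_name": "irods_rule_engine_plugin-python",
    "plugin_specific_configuration": {},
}


def add_python_stanza(rule_engines):
    """Return a new list with the Python stanza right after the rule language stanza."""
    python_name = PYTHON_STANZA["plugin_name"]
    if any(engine.get("plugin_name") == python_name for engine in rule_engines):
        # Already configured: return a copy and leave the order alone.
        return list(rule_engines)
    for index, engine in enumerate(rule_engines):
        if engine.get("plugin_name") == RULE_LANGUAGE_PLUGIN:
            stanza = copy.deepcopy(PYTHON_STANZA)
            return rule_engines[: index + 1] + [stanza] + rule_engines[index + 1 :]
    raise ValueError("iRODS Rule Language stanza not found")
```

The helper never mutates its argument, so the original list can be kept for comparison before anything is written back.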
Continued... Compute-to-Data Setup / Configuration
As the irods user:

iadmin mkuser alice rodsuser
iadmin moduser alice password apass

The next demonstration will be run as the newly created rodsuser 'alice'.

Configure the Tagged Resources - if necessary
Make a unixfilesystem resource for the SIGNAL_PROCESSING role:

iadmin mkresc dsp_resc unixfilesystem $(hostname -f):/tmp/irods/dsp_resc

Annotate it with appropriate metadata given its role
- defined in the configuration as part of the contract

As the irods service account:

imeta add -R dsp_resc COMPUTE_RESOURCE_ROLE SIGNAL_PROCESSING
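
To illustrate how a role label like the one above can drive resource selection, here is a minimal sketch that filters resources by an attribute/value pair. It uses an in-memory dictionary as a stand-in for the catalog; the function name and the data layout are assumptions made for the example, not the rulebase's actual lookup.

```python
ROLE_ATTRIBUTE = "COMPUTE_RESOURCE_ROLE"


def resources_with_role(resource_metadata, role, attribute=ROLE_ATTRIBUTE):
    """Return the sorted names of resources tagged with (attribute, role).

    resource_metadata maps a resource name to a list of (attribute, value)
    pairs, a simplified stand-in for the metadata stored in the catalog.
    """
    return sorted(
        name
        for name, pairs in resource_metadata.items()
        if (attribute, role) in pairs
    )


# Hypothetical snapshot matching the commands above.
catalog = {
    "demoResc": [],
    "dsp_resc": [("COMPUTE_RESOURCE_ROLE", "SIGNAL_PROCESSING")],
}
candidates = resources_with_role(catalog, "SIGNAL_PROCESSING")
```

Because the attribute name and role value are plain data, a deployment can change them without touching the selection logic, which is the separation of configuration from implementation the goals call for.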

Finally ...
Authenticate as iRODS user "alice" with the ubuntu Linux user.

$ iinit
ERROR: environment_properties::capture: missing environment file. should be at [/home/ubuntu/.irods/irods_environment.json]
Enter the host name (DNS) of the server to connect to: localhost
Enter the port number [1247]: 1247
Enter your irods user name: alice
Enter your irods zone: tempZone
Connecting as alice#tempZone to localhost:1247 ...
Enter your current iRODS password:
$ ils
/tempZone/home/alice:

As the irods user, reload the iRODS server configuration.

kill -HUP $(cat /var/run/irods/irods-server.pid)

The configuration interface
Define interfaces for any necessary conventions
- Metadata attributes and values
- Metadata values for implemented roles
Single Point of Truth - Template Method Pattern
- Execute defined preconditions
- Run user's requested container
Users may utilize metadata conventions within a rule to provide inputs to the generalized container service.

Reminder ...
Implemented as an iRODS rulebase, following the Template Method pattern
- If necessary, replicate input data to an appropriate resource
- Check permissions
- Launch compute container (Docker):
  - Process input data via Jupyter notebook
  - Save results
- Register the resultant directory into iRODS
- Apply metadata to newly registered results (but not today...)

Separation of Concerns
iRODS Rule -> Python Rule -> Docker Container -> Jupyter Notebook

The iRODS Rule Language Rule File
Contents of /home/ubuntu/spawn_remote_containers.r

main {
    container_dispatch("containers.run","/tempZone/home/alice/task_config.json","dsp_resc","","")
}
INPUT null
OUTPUT ruleExecOut

Note - add a delay() directive for asynchronous behavior.
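
The first argument of the rule, "containers.run", is a dotted name. One plausible reading, which the slides do not confirm, is that it names an operation to look up on a client object. The sketch below shows that lookup technique only; resolve_dotted, FakeClient, and FakeContainers are hypothetical names invented for the example.

```python
from functools import reduce


def resolve_dotted(root, dotted_name):
    """Walk attribute names such as 'containers.run', starting from root."""
    return reduce(getattr, dotted_name.split("."), root)


class FakeContainers:
    """Hypothetical stand-in for a container API namespace."""

    def run(self, image):
        return "ran " + image


class FakeClient:
    """Hypothetical stand-in for a container client object."""

    containers = FakeContainers()


# Look up the operation by name, then call it like any other function.
operation = resolve_dotted(FakeClient(), "containers.run")
outcome = operation("irods/irods-training-jupyter-digital-filter")
```

Passing the operation as a string keeps the rule file declarative: the rule names what to do, and the rulebase decides how to reach it.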

The Python Rulebase
Located at:
/home/ubuntu/irods_training/advanced/hpc_compute_to_data/compute_to_data.py

The rulebase:
- performs pre-flight checks on the input data
- launches the Docker container (Jupyter notebook)
- registers the results in iRODS
The Digital Signal Processing container
Dockerfile can be found at:
~/irods_training/advanced/hpc_compute_to_data/jupyter_notebook/Dockerfile

FROM jupyter/base-notebook
USER root
COPY lpfilter.ipynb /home/jovyan/work/.
COPY mymodule/ /home/jovyan/work/mymodule/
USER jovyan
RUN conda init
RUN conda install -y -c conda-forge matplotlib numpy
RUN jupyter trust /home/jovyan/work/lpfilter.ipynb
RUN mkfifo /tmp/fifo
CMD cat /tmp/fifo

The Jupyter Notebook
Located at:
/home/ubuntu/irods_training/advanced/hpc_compute_to_data/jupyter_notebook/lpfilter.ipynb
The notebook:
- loads an input waveform
- applies a digital lowpass filter
- plots 3 graphs of the results
- saves graphs and filtered data
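
For readers who have not met a digital lowpass filter before, the idea can be sketched with a moving average in plain Python. This is a stand-in chosen for brevity, not the filter used in lpfilter.ipynb; the function name and the window size are assumptions made for the example.

```python
def moving_average(samples, window=8):
    """Simple FIR lowpass: each output is the mean of the last `window` inputs.

    During start-up, fewer than `window` samples exist, so the mean is taken
    over the samples seen so far.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    filtered = []
    running_sum = 0.0
    for index, value in enumerate(samples):
        running_sum += value
        if index >= window:
            # Drop the sample that just slid out of the window.
            running_sum -= samples[index - window]
        filtered.append(running_sum / min(index + 1, window))
    return filtered


# A sawtooth with period 24, the same shape as the demo's input waveform.
waveform = [x % 24 for x in range(1, 513)]
smoothed = moving_average(waveform, window=8)
```

Averaging neighbouring samples attenuates the sharp drop at the end of each sawtooth period, which is the high-frequency content a lowpass filter is meant to suppress.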

Compute to Data - Digital Filter Testing
ubuntu $ icd ; imkdir notebook_input notebook_output
ubuntu $ cd ; iput task_config.json
ubuntu $ for x in {1..512}; do echo $((x%24)) ; done >input.dat
ubuntu $ iput input.dat notebook_input
ubuntu $ ils -lr
/tempZone/home/alice:
  alice 0 demoResc 1665 2025-05-21.20:41 & task_config.json
  C- /tempZone/home/alice/notebook_input
/tempZone/home/alice/notebook_input:
  alice 0 demoResc 1318 2025-05-21.20:41 & input.dat
  C- /tempZone/home/alice/notebook_output
/tempZone/home/alice/notebook_output:
ubuntu $ irule -r irods_rule_engine_plugin-irods_rule_language-instance -F spawn_remote_containers.r
ubuntu $ ils -lr
/tempZone/home/alice:
  alice 0 demoResc 853 2025-05-21.20:41 & task_config.json
  C- /tempZone/home/alice/notebook_input
/tempZone/home/alice/notebook_input:
  alice 0 demoResc 1318 2025-05-21.20:41 & input.dat
  alice 1 dsp_resc 1318 2025-05-21.20:42 & input.dat
  C- /tempZone/home/alice/notebook_output
/tempZone/home/alice/notebook_output:
  alice 0 dsp_resc 0 2025-05-21.20:42 & .fe6cc526-063a-11ee-912b-377adb02dba0
  C- /tempZone/home/alice/notebook_output/fe6cc526-063a-11ee-912b-377adb02dba0
/tempZone/home/alice/notebook_output/fe6cc526-063a-11ee-912b-377adb02dba0:
  alice 0 dsp_resc 0 2025-05-21.20:42 & .fe6cc526-063a-11ee-912b-377adb02dba0
  alice 0 dsp_resc 746805 2025-05-21.20:42 & lowpass_filter_processing.html
  alice 0 dsp_resc 3200 2025-05-21.20:42 & lowpass_filtered_input.dat
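
The shell loop that writes input.dat can be mirrored in Python, which makes it easy to check the result against the listing. The sketch below builds the same text in memory; the function name make_input_text is hypothetical, and nothing is written to disk.

```python
def make_input_text(count=512, period=24):
    """Build the text produced by: for x in {1..count}; do echo $((x%period)); done"""
    return "".join(f"{x % period}\n" for x in range(1, count + 1))


payload = make_input_text()
size_in_bytes = len(payload.encode("ascii"))
```

With the default arguments, size_in_bytes comes to 1318, the size reported for input.dat in the ils -lr listing.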

Compute to Data - Digital Filter Results
Start a simple http server to view the output:

sudo su - irods
cd /tmp/irods/dsp_resc/home/alice/notebook_output
python3 -m http.server 8888

Open in web browser
View the html file

Questions?

UGM 2025 - Compute to Data
By iRODS Consortium
iRODS User Group Meeting 2025 - Advanced Training Module