Kory Draughn
Chief Technologist
iRODS Consortium
November 12-17, 2023
Supercomputing 2023
Denver, CO
iRODS HTTP API:
Presenting iRODS as HTTP
Overview
What is the iRODS HTTP API?
An experimental redesign of the iRODS C++ REST API.
Goals of the project ...
Why is this necessary?
The iRODS C++ REST API proves that presenting iRODS as HTTP is possible; however, using the project over time has uncovered some challenges.
Challenges ...
The iRODS HTTP API aims to resolve these issues by taking a different approach, based on what we've learned from the community and from the iRODS S3 API.
The original document that kick-started this effort is available online.
Design - Early Decisions
Design - API URLs
URLs are named after concepts and entities defined in iRODS.
Operations are specified via parameters. This keeps URLs simple (i.e., no nesting required) and lets new and existing developers guess which URL exposes the behavior they are interested in.
For example, to modify a user, look at /users-groups. To write data to a data object, use /data-objects.
/authenticate
/collections
/data-objects
/info
/query
/resources
/rules
/tickets
/users-groups
/zones
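Because every endpoint sits directly under one versioned base path, mapping an iRODS concept to its URL is a flat lookup. The sketch below assumes the base URL used by the examples later in this deck (`http://localhost:9000/irods-http-api/0.1.0`); the `endpoint_url` helper is hypothetical and not part of the project.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an iRODS concept to its HTTP API endpoint URL.
# The base URL matches the examples in this deck; adjust it for your deployment.
base_url="${base_url:-http://localhost:9000/irods-http-api/0.1.0}"

endpoint_url() {
    local name="${1:-}"
    case "$name" in
        # Endpoints are flat: one URL per iRODS concept, no nested paths.
        authenticate|collections|data-objects|info|query|resources|rules|tickets|users-groups|zones)
            printf '%s/%s\n' "$base_url" "$name"
            ;;
        *)
            printf 'error: unknown endpoint "%s"\n' "$name" >&2
            return 1
            ;;
    esac
}
```

For example, `endpoint_url data-objects` prints the same URL that the parallel write example assigns to `http_api_url`.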
Design - API Parameters
All URLs, except /authenticate and /info, accept an op parameter.
Common parameters used throughout the API ...
Parameter names are not final and may change in the future.
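The `op` rule can be checked on the client before any request is sent. The sketch below builds the path and query string for a call; the `build_query` helper is hypothetical, and because parameter names are not final, `op` here is simply the name used in this deck's examples.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the path and query string for an API call.
# Every endpoint except /authenticate and /info expects an "op" parameter.

build_query() {
    local endpoint="${1:-}" op="${2:-}"
    case "$endpoint" in
        authenticate|info)
            # These two endpoints do not take an op parameter.
            printf '/%s\n' "$endpoint"
            ;;
        *)
            if [ -z "$op" ]; then
                printf 'error: /%s requires an op parameter\n' "$endpoint" >&2
                return 1
            fi
            printf '/%s?op=%s\n' "$endpoint" "$op"
            ;;
    esac
}
```

For example, `build_query rules list_rule_engines` prints `/rules?op=list_rule_engines`, the same request the rule engine example in this deck sends via `curl -G`.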
Configuration - Top Level
{
// Defines HTTP options that affect how the
// client-facing component of the server behaves.
"http_server": {
// ...
},
// Defines iRODS connection information.
"irods_client": {
// ...
}
}
The configuration defines two sections to help administrators understand the options and how they relate to each other.
The layout is modeled after NFSRODS.
Configuration - http_server
"http_server": {
"host": "0.0.0.0",
"port": 9000,
"log_level": "warn",
"authentication": {
"eviction_check_interval_in_seconds": 60,
"basic": {
"timeout_in_seconds": 3600
},
"oidc": { /* ... options ... */ }
},
"requests": {
"threads": 3,
"max_rbuffer_size_in_bytes": 8388608,
"timeout_in_seconds": 30
},
"background_io": {
"threads": 3
}
}
Configuration - irods_client
"irods_client": {
"host": "<string>",
"port": 1247,
"zone": "<string>",
"proxy_admin_account": {
"username": "<string>",
"password": "<string>"
},
"tls": { /* ... options ... */ },
"enable_4_2_compatibility": false,
"connection_pool": {
"size": 6,
"refresh_when_resource_changes_detected": true
},
"max_number_of_bytes_per_read_operation": 8192,
"buffer_size_in_bytes_for_write_operations": 8192,
"max_number_of_rows_per_catalog_query": 15
}
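Assuming `max_number_of_bytes_per_read_operation` caps how many bytes a single read request may ask for, a client fetching a larger data object has to issue several reads. The sketch below only plans the offset/count pairs; it assumes the client already knows the data object's size (for example, from a stat), and the `plan_reads` helper and `max_bytes_per_read` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a read of a data object into requests that each
# stay within max_number_of_bytes_per_read_operation (8192 in the config above).
# Prints one "offset count" pair per read request.

max_bytes_per_read=8192

plan_reads() {
    local size="$1" limit="${2:-$max_bytes_per_read}"
    local offset=0 remaining count
    while [ "$offset" -lt "$size" ]; do
        remaining=$(( size - offset ))
        # Ask for the full limit, or whatever is left for the final read.
        count=$(( remaining < limit ? remaining : limit ))
        printf '%d %d\n' "$offset" "$count"
        offset=$(( offset + count ))
    done
}
```

For a 20000-byte data object, `plan_reads 20000` yields three reads: two of 8192 bytes and a final one of 3616 bytes. Each pair would become one `read` request against /data-objects; `offset` is an assumed parameter name, not one shown in this deck.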
Connection Pooling
iRODS clients connect and disconnect frequently.
This kills performance!
This issue resulted in the following enhancements for iRODS 4.3.1 ...
With these facilities, the iRODS HTTP API can reuse existing iRODS connections to significantly boost performance.
Parallel Writes
iRODS does not allow a data object to be written to in parallel without coordination.
Clients wanting to upload data in parallel are required to do the following ...
Parallel Writes
Fully supported through the use of a Parallel Write Handle.
This means the iRODS HTTP API server maintains state on behalf of the client.
Performing a Parallel Write requires the use of two operations ...
Large files must use multipart/form-data as the content type. Failing to honor this rule will result in an error or corrupt data.
Parallel Writes - Example
http_api_url="http://localhost:9000/irods-http-api/0.1.0/data-objects"
# Assumes $bearer_token holds a token returned by /authenticate.
# Open 3 streams to the data object, file.bin.
transfer_handle=$(curl -H "Authorization: Bearer $bearer_token" "$http_api_url" \
--data-urlencode 'op=parallel_write_init' \
--data-urlencode "lpath=/tempZone/home/rods/file.bin" \
--data-urlencode 'stream-count=3' \
| jq -r .parallel_write_handle)
# Write "hello" (i.e., 5 bytes) to the data object.
# Notice we didn't specify which stream to use.
curl -H "Authorization: Bearer $bearer_token" "$http_api_url" \
-F 'op=write' \
-F "parallel-write-handle=$transfer_handle" \
-F 'count=5' \
-F 'bytes=hello;type=application/octet-stream' \
| jq
# Shut down all streams and update the catalog.
curl -H "Authorization: Bearer $bearer_token" "$http_api_url" \
--data-urlencode 'op=parallel_write_shutdown' \
--data-urlencode "parallel-write-handle=$transfer_handle" \
| jq
Demonstrates how to open 3 streams to a data object and write 5 bytes to it.
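The example above sends a few bytes in one call. To upload a real file, a client would typically split it into one contiguous byte range per stream and send the ranges concurrently before calling `parallel_write_shutdown`. The sketch below is a minimal illustration under stated assumptions: the `write` operation also accepts `offset` and `stream-index` parameters (neither appears in this deck), `$http_api_url` and `$bearer_token` come from the example above, and each range is sent as multipart/form-data per the rule for large files. `parallel_upload` is a hypothetical helper; set `DRY_RUN=1` to print the planned ranges instead of sending them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: upload a local file through an existing parallel write
# handle, running one background curl per stream.
# ASSUMPTIONS (not shown in this deck): the write operation accepts "offset"
# and "stream-index" parameters. Set DRY_RUN=1 to print the plan only.

http_api_url="${http_api_url:-http://localhost:9000/irods-http-api/0.1.0/data-objects}"

parallel_upload() {
    local file="$1" handle="$2" streams="$3"
    local size chunk count i=0 offset=0
    size=$(( $(wc -c < "$file") ))   # arithmetic strips padding some wc versions add
    chunk=$(( size / streams ))

    while [ "$i" -lt "$streams" ]; do
        count=$chunk
        # The last stream also takes any bytes left over by the integer division.
        if [ "$i" -eq $(( streams - 1 )) ]; then
            count=$(( size - offset ))
        fi

        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'stream-index=%d offset=%d count=%d\n' "$i" "$offset" "$count"
        else
            # Extract this stream's byte range and send it as a multipart body.
            tail -c +"$(( offset + 1 ))" "$file" | head -c "$count" |
                curl -s -H "Authorization: Bearer $bearer_token" "$http_api_url" \
                    -F 'op=write' \
                    -F "parallel-write-handle=$handle" \
                    -F "stream-index=$i" \
                    -F "offset=$offset" \
                    -F "count=$count" \
                    -F 'bytes=@-;type=application/octet-stream' &
        fi
        offset=$(( offset + count ))
        i=$(( i + 1 ))
    done

    # Wait for every stream to finish before calling parallel_write_shutdown.
    wait
}
```

A run such as `parallel_upload file.bin "$transfer_handle" 3`, followed by the `parallel_write_shutdown` call from the example above, would complete the upload. Giving each stream one contiguous range keeps every stream writing to a distinct region of the data object, which is what lets the writes proceed in parallel.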
Parallel Writes - Java application vs iput
Client | Time Elapsed |
---|---|
iput (uses high ports) | 50.113s |
Java application | 51.975s |
Performance is sensitive to buffer sizes and the number of threads used.
General Performance - Test Environment and Setup
General Performance - Test Results
Examples - Stat'ing a collection
base_url="http://localhost:9000/irods-http-api/0.1.0"
bearer_token=$(curl -sX POST --user 'rods:rods' "$base_url/authenticate")
curl -sG -H "Authorization: Bearer $bearer_token" \
"$base_url/collections" \
--data-urlencode 'op=stat' \
--data-urlencode 'lpath=/tempZone/home/rods' \
| jq
{
"inheritance_enabled": false,
"irods_response": {
"status_code": 0
},
"modified_at": 1686499669,
"permissions": [
{
"name": "rods",
"perm": "own",
"type": "rodsadmin",
"zone": "tempZone"
}
],
"registered": true,
"type": "collection"
}
Examples - Listing available Rule Engine Plugins
base_url="http://localhost:9000/irods-http-api/0.1.0"
bearer_token=$(curl -sX POST --user 'rods:rods' "$base_url/authenticate")
curl -sG -H "Authorization: Bearer $bearer_token" \
"$base_url/rules" \
--data-urlencode 'op=list_rule_engines' \
| jq
{
"irods_response": {
"status_code": 0
},
"rule_engine_plugin_instances": [
"irods_rule_engine_plugin-irods_rule_language-instance",
"irods_rule_engine_plugin-cpp_default_policy-instance"
]
}
Future Work
v0.1.0 is available today!
https://irods.org/2023/11/initial-release-of-the-irods-http-api
Help us make this project better for everyone.
Thank you!
Questions?