Kory Draughn
Chief Technologist
iRODS Consortium
iRODS HTTP API
June 13-16, 2023
iRODS User Group Meeting 2023
Chapel Hill, NC
Overview
What is the iRODS HTTP API?
An experimental redesign of the iRODS C++ REST API.
Goals of the project ...
Why is this necessary?
The iRODS C++ REST API proves that presenting iRODS as HTTP is possible. However, using the project over time has uncovered some challenges.
Challenges ...
The iRODS HTTP API aims to resolve these issues by taking a different approach based on what we've learned from the community and the iRODS S3 API.
To view the original document which kick-started this effort, click here.
Design - Early Decisions
Design - API URLs
URLs are named after concepts and entities defined in iRODS.
Operations are specified via parameters. This decision keeps URLs simple (i.e. no nesting required) and lets new and existing developers guess which URL exposes the behavior they are interested in.
For example, to modify a user, look at /users-groups. To write data to a data object, use /data-objects.
/authenticate | /info | /resources | /users-groups |
/collections | /metadata | /rules | /zones |
/data-objects | /query | /tickets |
Design - API Parameters
All URLs, except /authenticate, accept an op parameter.
Common parameters used throughout the API ...
Parameter names are not final and may change in the future.
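As a rough illustration of the op-parameter convention, the shell sketch below wraps curl so that the endpoint, the operation, and any extra parameters are passed as arguments. The function name irods_http_get is hypothetical. It assumes the operation is a read that can be sent as a GET query string, as in the stat example later in these slides. Operations that modify data are sent differently and are not covered by this helper.

```shell
# Hypothetical helper, not part of the iRODS HTTP API project.
# Usage: irods_http_get <base_url> <bearer_token> <endpoint> <op> [name=value ...]
# The body runs in a subshell "( ... )" so its variables do not leak into the caller.
irods_http_get() (
    base_url=$1
    token=$2
    endpoint=$3
    op=$4
    shift 4

    # Turn each remaining "name=value" argument into "--data-urlencode name=value".
    for kv in "$@"; do
        set -- "$@" --data-urlencode "$kv"
        shift
    done

    # -G sends the url-encoded parameters as a query string on a GET request.
    curl -sG -H "Authorization: Bearer ${token}" \
        "${base_url}/${endpoint}" \
        --data-urlencode "op=${op}" \
        "$@"
)
```

With such a helper, stat'ing a collection would read: irods_http_get "$base_url" "$bearer_token" collections stat 'lpath=/tempZone/home/rods' | jq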
Configuration - Top Level
{
    // Defines HTTP options that affect how the
    // client-facing component of the server behaves.
    "http_server": {
        // ...
    },
    // Defines iRODS connection information.
    "irods_client": {
        // ...
    }
}
The configuration file defines two top-level sections to help administrators understand the options and how they relate to each other.
Modeled after NFSRODS.
Configuration - http_server
"http_server": {
    "host": "0.0.0.0",
    "port": 9000,
    "log_level": "warn",
    "authentication": {
        "basic": {
            "timeout_in_seconds": 3600
        }
    },
    "requests": {
        "threads": 3,
        "max_rbuffer_size_in_bytes": 8388608,
        "timeout_in_seconds": 30
    },
    "background_io": {
        "threads": 3
    }
}
Configuration - irods_client
"irods_client": {
    "host": "<string>",
    "port": 1247,
    "zone": "<zone>",
    "proxy_rodsadmin": {
        "username": "<string>",
        "password": "<string>"
    },
    "connection_pool": {
        "size": 6,
        "refresh_timeout_in_seconds": 600
    },
    "max_rbuffer_size_in_bytes": 8192,
    "max_wbuffer_size_in_bytes": 8192,
    "max_number_of_rows_per_catalog_query": 15
}
Connection Pooling
iRODS clients connect and disconnect frequently.
This kills performance!
This issue resulted in the following enhancements for iRODS 4.3.1 ...
With these facilities, the iRODS HTTP API can reuse existing iRODS connections to significantly boost performance.
Connection Pooling - Implementation
// TODO May require the zone name be passed as well for federation?
auto get_connection(const std::string& _username)
    -> irods::connection_pool::connection_proxy
{
    namespace log = irods::http::log;

    // Take an existing connection from the shared pool.
    auto& cp = irods::http::globals::conn_pool;
    auto conn = cp->get_connection();

    const auto& zone = irods::http::globals::config->at("irods_client")
        .at("zone").get_ref<const std::string&>();

    log::trace("{}: Changing identity associated with connection to [{}].",
               __func__, _username);

    // Switch the pooled connection to the requesting user
    // instead of opening a new connection.
    auto* conn_ptr = static_cast<RcComm*>(conn);
    const auto ec = rc_switch_user(conn_ptr, _username.c_str(), zone.c_str());

    if (ec != 0) {
        log::error("{}: rc_switch_user error: {}", __func__, ec);
        THROW(SYS_INTERNAL_ERR, "rc_switch_user error.");
    }

    log::trace("{}: Successfully changed identity associated with connection to [{}].",
               __func__, _username);

    return conn;
} // get_connection
Parallel Writes
iRODS does not allow a data object to be written to in parallel without coordination.
Clients wanting to upload data in parallel are required to do the following ...
Parallel Writes
Parallel writes are fully supported through the use of a Parallel Write Handle.
This means the iRODS HTTP API server maintains state on behalf of the client.
Performing a Parallel Write requires the use of two operations ...
Large files must use multipart/form-data as the content type. Failing to honor this rule will result in an error or corrupt data.
Parallel Writes - Example
http_api_url="${base_url}/data-objects"
# Open 3 streams to the data object, file.bin.
transfer_handle=$(curl -H "Authorization: Bearer $bearer_token" "$http_api_url" \
    --data-urlencode 'op=parallel_write_init' \
    --data-urlencode "lpath=/tempZone/home/rods/file.bin" \
    --data-urlencode 'stream-count=3' \
    | jq -r .parallel_write_handle)
# Write "hello" (i.e. 5 bytes) to the data object.
# Notice we didn't specify which stream to use.
curl -H "Authorization: Bearer $bearer_token" "$http_api_url" \
    -F 'op=write' \
    -F "parallel-write-handle=$transfer_handle" \
    -F 'count=5' \
    -F 'bytes=hello;type=application/octet-stream' \
    | jq
# Shutdown all streams and update the catalog.
curl -H "Authorization: Bearer $bearer_token" "$http_api_url" \
    --data-urlencode 'op=parallel_write_shutdown' \
    --data-urlencode "parallel-write-handle=$transfer_handle" \
    | jq
Demonstrates how to open 3 streams to a data object and write 5 bytes to it.
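The example above writes a single 5-byte payload. A client that actually splits a file across its streams first has to decide which byte range each stream is responsible for. The sketch below shows one possible way to do that arithmetic in shell. The function name chunk_ranges and its output format are hypothetical and are not part of the iRODS HTTP API.

```shell
# Hypothetical helper: split <size> bytes across <streams> streams.
# Prints one line per stream: "<stream index> <byte offset> <byte count>".
# The body runs in a subshell "( ... )" so its variables do not leak into the caller.
chunk_ranges() (
    size=$1
    streams=$2

    # Refuse a stream count below 1 to avoid dividing by zero.
    [ "$streams" -ge 1 ] || return 1

    base=$((size / streams))   # bytes every stream gets
    rem=$((size % streams))    # leftover bytes, spread over the first streams
    offset=0
    i=0
    while [ "$i" -lt "$streams" ]; do
        count=$base
        if [ "$i" -lt "$rem" ]; then
            count=$((count + 1))
        fi
        echo "$i $offset $count"
        offset=$((offset + count))
        i=$((i + 1))
    done
)
```

For a 10-byte object and 3 streams, chunk_ranges 10 3 prints "0 0 4", "1 4 3", and "2 7 3" on separate lines. Each line could then drive one op=write request carrying the same parallel-write-handle. The parameters for selecting a stream and an offset are not shown in these slides, so they are left out here.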
Parallel Writes - Java application vs iput
Client | Time Elapsed |
---|---|
iput (uses high ports) | 50.113s |
Java application | 51.975s |
Performance is sensitive to buffer sizes and number of threads used.
General Performance - Test Environment and Setup
General Performance - Test Results
Examples - Stat'ing a collection
base_url="http://localhost:9000/irods-http/0.9.5"
bearer_token=$(curl -sX POST --user 'rods:rods' "${base_url}/authenticate")
curl -sG -H "Authorization: Bearer $bearer_token" \
    "${base_url}/collections" \
    --data-urlencode 'op=stat' \
    --data-urlencode 'lpath=/tempZone/home/rods' \
    | jq
{
    "inheritance_enabled": false,
    "irods_response": {
        "error_code": 0
    },
    "modified_at": 1686499669,
    "permissions": [
        {
            "name": "rods",
            "perm": "own",
            "type": "rodsadmin",
            "zone": "tempZone"
        }
    ],
    "registered": true,
    "type": "collection"
}
Examples - Listing available Rule Engine Plugins
base_url="http://localhost:9000/irods-http/0.9.5"
bearer_token=$(curl -sX POST --user 'rods:rods' "${base_url}/authenticate")
curl -sG -H "Authorization: Bearer $bearer_token" \
    "${base_url}/rules" \
    --data-urlencode 'op=list_rule_engines' \
    | jq
{
    "irods_response": {
        "error_code": 0
    },
    "rule_engine_plugin_instances": [
        "irods_rule_engine_plugin-irods_rule_language-instance",
        "irods_rule_engine_plugin-cpp_default_policy-instance"
    ]
}
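Both example responses carry an irods_response object with an error_code of 0. A script can use that field to decide whether a request succeeded. The sketch below shows one way to do so with jq. The function name irods_ok is hypothetical, and it assumes that any error_code other than 0 indicates a failure.

```shell
# Hypothetical helper: read a JSON response on stdin and succeed only if
# .irods_response.error_code equals 0.
# "jq -e" sets the exit status from the result: 0 for true, nonzero for
# false, null, or unparsable input.
irods_ok() {
    jq -e '.irods_response.error_code == 0' > /dev/null
}
```

A possible use is to guard later steps of a script: curl -sG -H "Authorization: Bearer $bearer_token" "${base_url}/rules" --data-urlencode 'op=list_rule_engines' | irods_ok || echo 'request failed' >&2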
Remaining Work
Future Plans
Thank you!
Questions?