Kory Draughn
Chief Technologist
iRODS Consortium
November 12-17, 2023
Supercomputing 2023
Denver, CO
iRODS HTTP API:
Presenting iRODS as HTTP
Overview
- What is the iRODS HTTP API?
- Why is this necessary?
- Design
- Configuration
- Connection Pooling
- Parallel Writes
- General Performance
- Examples
- Future Work
What is the iRODS HTTP API?
An experimental redesign of the iRODS C++ REST API.
Goals of the project ...
- Present a cohesive representation of the iRODS API over the HTTP protocol, effectively simplifying development of client-side iRODS applications for new developers
- Maintain performance close to the iCommands
- Remove behavioral differences between client-side iRODS libraries by building new libraries on top of the HTTP API
- C, C++, Java, Python, etc - all languages produce identical behavior and results
- To be absorbed by the iRODS server if adoption is significant
Why is this necessary?
The iRODS C++ REST API proves that presenting iRODS as HTTP is possible. However, usage of the project over time has uncovered some challenges.
Challenges ...
- Too many open ports raise security concerns
- Stability issues (e.g. crashing endpoints)
- Separation of endpoints increases complexity due to multiple layers
- e.g. Interns found it difficult to understand how things are composed
- Pistache HTTP library lacks completeness/maturity/adoption
- Names of existing endpoints are fairly general, which makes naming new endpoints difficult
The iRODS HTTP API is aimed at resolving these issues by taking a different approach based on what we've learned from the community and the iRODS S3 API.
To view the original document which kick-started this effort, click here.
Design - Early Decisions
- Single binary exposing one (or two) ports
- Boost.Beast
  - A C++ header-only library providing networking facilities for building high performance libraries and applications which need support for HTTP/1 and WebSockets
  - First used by the iRODS S3 API
- Fixed set of URLs
- Easy for users and developers to remember
- Renamed from REST to HTTP
- The rules of REST are not clear
- The rules of REST do not map well to the iRODS API
- iRODS is stateful
- Focus on designing the best API we can
Design - API URLs
Named based on concepts and entities defined in iRODS.
Operations are specified via parameters. This decision keeps URLs simple (i.e. no nesting required) and allows new/existing developers to guess which URL exposes the behavior they are interested in.
For example, if you want to modify a user, look at /users-groups. Or, perhaps you need to write data to a data object, then you'd use /data-objects.
/authenticate | /resources |
/collections | /rules |
/data-objects | /tickets |
/info | /users-groups |
/query | /zones |
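The "guess the URL from the concept" idea can be pictured as a small lookup. The sketch below is illustrative only: the helper name `endpoint_for` and the concept keywords are assumptions made for this example, while the URLs themselves come from the table above.

```shell
# Hypothetical helper: map an iRODS concept to the URL that exposes it.
# The keywords on the left are assumptions; the URLs are from the table.
endpoint_for() {
    case "$1" in
        user|group)  printf '%s\n' /users-groups ;;
        collection)  printf '%s\n' /collections ;;
        data-object) printf '%s\n' /data-objects ;;
        resource)    printf '%s\n' /resources ;;
        rule)        printf '%s\n' /rules ;;
        ticket)      printf '%s\n' /tickets ;;
        zone)        printf '%s\n' /zones ;;
        query)       printf '%s\n' /query ;;
        *)           printf 'unknown concept: %s\n' "$1" >&2; return 1 ;;
    esac
}

# Example: the full URL to look at when modifying a user.
base_url="http://localhost:9000/irods-http-api/0.1.0"
printf '%s%s\n' "$base_url" "$(endpoint_for user)"
```

Because every URL is flat, the lookup never needs more than one level.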
Design - API Parameters
All URLs, except /authenticate and /info, accept an op parameter.
- Mapped to a function responsible for executing the requested operation
- Shares common values where possible
- e.g. stat, list, create, remove, etc
Common parameters used throughout the API ...
- lpath
- replica-number
- src-resource
- dst-resource
- offset
- count
Parameter names are not final and may change in the future.
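To make the op-plus-parameters shape concrete, here is a minimal sketch of how a request's query string could be assembled by hand. The helper names `urlencode` and `build_query` are hypothetical and exist only for this example; in practice `curl --data-urlencode` does the encoding for you. The sketch assumes ASCII input.

```shell
# Percent-encode a string so it is safe inside a URL query component.
# Assumes single-byte (ASCII) input; unreserved characters pass through.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}        # everything after the first character
        c=${s%"$rest"}     # the first character
        s=$rest
        case "$c" in
            [A-Za-z0-9._~-]) out="$out$c" ;;
            *) out="$out$(printf '%%%02X' "'$c")" ;;
        esac
    done
    printf '%s\n' "$out"
}

# Build "op=<op>&key=value&..." from an op and key=value pairs.
build_query() {
    q="op=$(urlencode "$1")"
    shift
    for kv in "$@"; do
        q="$q&${kv%%=*}=$(urlencode "${kv#*=}")"
    done
    printf '%s\n' "$q"
}

build_query stat 'lpath=/tempZone/home/rods'
# prints: op=stat&lpath=%2FtempZone%2Fhome%2Frods
```

Other common parameters such as `offset` and `count` would be appended the same way, as additional key=value pairs.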
Configuration - Top Level
{
    // Defines HTTP options that affect how the
    // client-facing component of the server behaves.
    "http_server": {
        // ...
    },
    // Defines iRODS connection information.
    "irods_client": {
        // ...
    }
}
The configuration defines two top-level sections to help administrators understand the options and how they relate to each other.
Modeled after NFSRODS.
Configuration - http_server
"http_server": {
    "host": "0.0.0.0",
    "port": 9000,
    "log_level": "warn",
    "authentication": {
        "eviction_check_interval_in_seconds": 60,
        "basic": {
            "timeout_in_seconds": 3600
        },
        "oidc": { /* ... options ... */ }
    },
    "requests": {
        "threads": 3,
        "max_rbuffer_size_in_bytes": 8388608,
        "timeout_in_seconds": 30
    },
    "background_io": {
        "threads": 3
    }
}
Configuration - irods_client
"irods_client": {
    "host": "<string>",
    "port": 1247,
    "zone": "<string>",
    "proxy_admin_account": {
        "username": "<string>",
        "password": "<string>"
    },
    "tls": { /* ... options ... */ },
    "enable_4_2_compatibility": false,
    "connection_pool": {
        "size": 6,
        "refresh_when_resource_changes_detected": true
    },
    "max_number_of_bytes_per_read_operation": 8192,
    "buffer_size_in_bytes_for_write_operations": 8192,
    "max_number_of_rows_per_catalog_query": 15
}
Connection Pooling
iRODS clients connect and disconnect frequently.
This kills performance!
This issue resulted in the following enhancements for iRODS 4.3.1 ...
- Proxy user support for irods::connection_pool and irods::client_connection
- rc_check_auth_credentials
  - Allows native authentication credentials to be verified
- rc_switch_user
  - Allows the identity associated with an RcComm to be changed in real-time
With these facilities, the iRODS HTTP API can reuse existing iRODS connections to significantly boost performance.
Parallel Writes
iRODS does not allow a data object to be written to in parallel without coordination.
Clients wanting to upload data in parallel are required to do the following ...
1. Open a stream to the replica of interest.
2. Capture the Replica Access Token from the stream.
3. Open secondary streams.
   - Each stream must use its own connection
   - Each stream must target the same replica
   - Each stream must use the same open flags
   - Each stream must pass the Replica Access Token obtained from the stream in step (1)
4. Send bytes across streams.
5. Close secondary streams without updating the catalog.
6. Close the original stream normally.
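For the "send bytes across streams" step, each stream typically writes its own disjoint byte range. How the bytes are divided is up to the client; the sketch below shows one even split. The function name `chunk_ranges` is hypothetical and not part of iRODS.

```shell
# Split <size> bytes across <streams> streams as evenly as possible.
# Prints one "offset count" pair per stream.
chunk_ranges() {
    size=$1
    streams=$2
    base=$((size / streams))
    extra=$((size % streams))   # the first <extra> streams get one more byte
    offset=0
    i=0
    while [ "$i" -lt "$streams" ]; do
        count=$base
        if [ "$i" -lt "$extra" ]; then
            count=$((base + 1))
        fi
        printf '%s %s\n' "$offset" "$count"
        offset=$((offset + count))
        i=$((i + 1))
    done
}

# Example: a 100 MiB file split across 4 streams.
chunk_ranges $((100 * 1024 * 1024)) 4
```

For the 100 MiB example every stream receives 26214400 bytes, starting at offsets 0, 26214400, 52428800, and 78643200.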
Parallel Writes
Fully supported through the use of a Parallel Write Handle.
This ultimately means the iRODS HTTP API server maintains state on behalf of the client.
Performing a Parallel Write requires the use of two operations ...
- parallel_write_init
  - Instructs the server to allocate memory for managing the state of the upload
- parallel_write_shutdown
  - Instructs the server to deallocate memory obtained via a call to parallel_write_init
Large files must use multipart/form-data as the content type. Failing to honor this rule will result in an error or corrupt data.
Parallel Writes - Example
http_api_url="http://localhost:9000/irods-http-api/0.1.0/data-objects"
# Assumes $bearer_token holds a token obtained from /authenticate.
# Open 3 streams to the data object, file.bin.
transfer_handle=$(curl -s -H "Authorization: Bearer $bearer_token" "$http_api_url" \
    --data-urlencode 'op=parallel_write_init' \
    --data-urlencode "lpath=/tempZone/home/rods/file.bin" \
    --data-urlencode 'stream-count=3' \
    | jq -r .parallel_write_handle)
# Write "hello" (i.e. 5 bytes) to the data object.
# Notice we didn't specify which stream to use.
curl -s -H "Authorization: Bearer $bearer_token" "$http_api_url" \
    -F 'op=write' \
    -F "parallel-write-handle=$transfer_handle" \
    -F 'count=5' \
    -F 'bytes=hello;type=application/octet-stream' \
    | jq
# Shutdown all streams and update the catalog.
curl -s -H "Authorization: Bearer $bearer_token" "$http_api_url" \
    --data-urlencode 'op=parallel_write_shutdown' \
    --data-urlencode "parallel-write-handle=$transfer_handle" \
    | jq
Demonstrates how to open 3 streams to a data object and write 5 bytes to it.
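Uploading a real file means sending a byte range per request rather than a literal string. The following is a rough sketch, not a tested client: `extract_chunk` and `upload_chunk` are hypothetical helpers, and it is an assumption here that `op=write` honors the common `offset` parameter. It reuses `$http_api_url`, `$bearer_token`, and `$transfer_handle` from the example above, and relies on `head -c`, which is widely available but not strictly POSIX.

```shell
# Print <count> bytes of <file> starting at byte <offset> (0-based).
extract_chunk() {
    tail -c "+$(($2 + 1))" "$1" | head -c "$3"
}

# Hypothetical upload of one byte range: <file> <offset> <count>.
# Assumes $http_api_url, $bearer_token, and $transfer_handle are already set,
# and that op=write accepts "offset" (an assumption, not confirmed here).
upload_chunk() {
    extract_chunk "$1" "$2" "$3" |
        curl -s -H "Authorization: Bearer $bearer_token" "$http_api_url" \
            -F 'op=write' \
            -F "parallel-write-handle=$transfer_handle" \
            -F "offset=$2" \
            -F "count=$3" \
            -F 'bytes=<-;type=application/octet-stream'
}
```

A client would call `upload_chunk file.bin <offset> <count>` once per byte range, possibly from several background jobs, before issuing `parallel_write_shutdown`.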
Parallel Writes - Java application vs iput
- Testing was carried out using two machines in different locations
- Home network vs Office network
- Custom Java application built on top of the iRODS HTTP API
- Not optimized
- Each application used 4 threads to upload a 100 MiB file into iRODS
Client | Time Elapsed |
---|---|
iput (uses high ports) | 50.113s |
Java application | 51.975s |
Performance is sensitive to buffer sizes and number of threads used.
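The two timings can be turned into effective throughput and a relative difference with a little arithmetic. The helper names below are made up for this example; the inputs are the figures from the table.

```shell
# Effective throughput in MiB/s for <size_mib> transferred in <seconds>.
throughput() {
    awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", size / secs }'
}

# Relative slowdown of <b> versus <a>, as a percentage.
slowdown_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}

throughput 100 50.113        # iput
throughput 100 51.975        # Java application
slowdown_pct 50.113 51.975   # Java application relative to iput
```

This works out to roughly 2.00 MiB/s for iput and 1.92 MiB/s for the Java application, a difference of about 3.7%.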
General Performance - Test Environment and Setup
- Used ApacheBench to measure Requests Per Second (RPS)
  - Sent 2000 requests total
  - Maintained 500 concurrent requests at all times
- All testing was performed using a single machine
  - Development machine has 32 cores with 256 GiB of RAM
- iRODS HTTP API
  - Optimizations enabled
  - 8 threads for foreground processing
  - 56 threads for background processing
  - Connection pool containing 56 iRODS connections
General Performance - Test Results
- /authenticate - Authenticating a new user using Basic/Native authentication
  - 1389.47 RPS
  - 50% of requests took at least 333 ms to serve
- /resources - Stat'ing a resource
  - 2702.21 RPS
  - 50% of requests took at least 165 ms to serve
- /data-objects - Reading 8192 bytes
  - 802.53 RPS
  - 50% of requests took at least 585 ms to serve
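As a sanity check, throughput and latency can be related: if 500 requests are in flight at all times, the mean time per request is approximately concurrency divided by RPS. The sketch below applies that relation to the reported numbers; it assumes the concurrency really stayed at 500 for the whole run, and `mean_latency_ms` is a name invented for this example.

```shell
# Mean time per request in ms, assuming <concurrency> requests are always
# in flight: mean_ms = concurrency / rps * 1000.
mean_latency_ms() {
    awk -v c="$1" -v rps="$2" 'BEGIN { printf "%.0f\n", c / rps * 1000 }'
}

mean_latency_ms 500 1389.47   # /authenticate
mean_latency_ms 500 2702.21   # /resources
mean_latency_ms 500 802.53    # /data-objects
```

The estimates come out at about 360 ms, 185 ms, and 623 ms, each slightly above the corresponding 50% figure of 333 ms, 165 ms, and 585 ms.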
Examples - Stat'ing a collection
base_url="http://localhost:9000/irods-http-api/0.1.0"
bearer_token=$(curl -sX POST --user 'rods:rods' "$base_url/authenticate")
curl -sG -H "Authorization: Bearer $bearer_token" \
    "$base_url/collections" \
    --data-urlencode 'op=stat' \
    --data-urlencode 'lpath=/tempZone/home/rods' \
    | jq
{
    "inheritance_enabled": false,
    "irods_response": {
        "status_code": 0
    },
    "modified_at": 1686499669,
    "permissions": [
        {
            "name": "rods",
            "perm": "own",
            "type": "rodsadmin",
            "zone": "tempZone"
        }
    ],
    "registered": true,
    "type": "collection"
}
Examples - Listing available Rule Engine Plugins
base_url="http://localhost:9000/irods-http-api/0.1.0"
bearer_token=$(curl -sX POST --user 'rods:rods' "$base_url/authenticate")
curl -sG -H "Authorization: Bearer $bearer_token" \
    "$base_url/rules" \
    --data-urlencode 'op=list_rule_engines' \
    | jq
{
    "irods_response": {
        "status_code": 0
    },
    "rule_engine_plugin_instances": [
        "irods_rule_engine_plugin-irods_rule_language-instance",
        "irods_rule_engine_plugin-cpp_default_policy-instance"
    ]
}
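Both examples share the same shape: a GET request carrying a bearer token, an `op`, and zero or more parameters. That pattern could be wrapped in a small function. This is only a sketch: `irods_get` and the `CURL` override are hypothetical conveniences introduced here, not part of the project.

```shell
base_url="http://localhost:9000/irods-http-api/0.1.0"
CURL=${CURL:-curl}   # overridable, e.g. for a dry run

# Hypothetical helper: GET <endpoint> with op=<op> and extra key=value pairs.
# Assumes $bearer_token was obtained from /authenticate as shown above.
irods_get() {
    endpoint=$1
    op=$2
    shift 2
    n=$#
    for kv in "$@"; do
        set -- "$@" --data-urlencode "$kv"
    done
    shift "$n"   # drop the original pairs, keep the expanded curl arguments
    "$CURL" -sG -H "Authorization: Bearer $bearer_token" \
        "$base_url/$endpoint" \
        --data-urlencode "op=$op" "$@"
}
```

With this in place, the two requests above become `irods_get collections stat 'lpath=/tempZone/home/rods' | jq` and `irods_get rules list_rule_engines | jq`.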
Future Work
- Document the API in terms of OpenAPI
- Add support for remaining API endpoints provided by iRODS
- Bulk/Batch operations
- Archive file operations
- Validate the configuration on server startup
- Harden the implementation
- Improve performance
v0.1.0 is available today!
https://irods.org/2023/11/initial-release-of-the-irods-http-api
Help us make this project better for everyone.
Thank you!
Questions?
SC23 - iRODS HTTP API: Presenting iRODS as HTTP
By iRODS Consortium