iRODS S3 API
Alan King
Senior Software Developer
iRODS Consortium
November 17-22, 2024
Supercomputing 2024
Atlanta, GA
Overview
- Motivation / Goals
- History
- Status / Implementation
- Supported features / clients
- Configuration
- Multipart
- Future Work
Protocol Plumbing - Presenting iRODS as other Protocols
- WebDAV
- FUSE
- HTTP
- NFS
- SFTP
- K8s CSI
- S3
Over the last few years, the ecosystem around the iRODS server has continued to expand.
Integration with other types of systems is a valuable way to increase accessibility without teaching existing tools about the iRODS protocol or introducing new tools to users.
With some plumbing, existing tools get the benefit of visibility into an iRODS deployment.
S3 API - Motivation / Goals
- Present iRODS as the S3 protocol
- Multi-user / Multi-bucket
- Load Balancer friendly
- Maintainable
History
- iRODS S3 Working Group formed in Q3 2021
- 2023 - Violet White (intern) implemented many of the endpoints
- v0.1.0 released Nov. 2023
- v0.2.0 released March 2024
- v0.3.0 released October 2024
Status / Implementation - Architecture
- Single binary
- Single configuration file
- Multi-user
- Multi-bucket
- Requires rodsadmin credentials
- Tests passing with:
- AWS CLI Client
- Boto3 Python Library
- MinIO Python Client
- MinIO CLI Client
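As a rough illustration of how one of these clients might be pointed at the S3 API, the sketch below builds the arguments for `boto3.client`. The host, port, and credentials are placeholders. Plain HTTP on port 9000 and the use of the S3 username as the access key ID are assumptions for this example, not confirmed behavior.

```python
def s3_client_kwargs(host, port, access_key, secret_key, region="us-east-1"):
    """Build keyword arguments for boto3.client("s3", ...)."""
    return {
        # Assumption: the S3 API is reachable over plain HTTP.
        "endpoint_url": f"http://{host}:{port}",
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "region_name": region,
    }


def make_client(host, port, access_key, secret_key, region="us-east-1"):
    """Create a Boto3 S3 client aimed at the iRODS S3 API."""
    # Imported lazily so the helper above can be used without boto3 installed.
    import boto3

    return boto3.client(
        "s3", **s3_client_kwargs(host, port, access_key, secret_key, region)
    )
```

With a running server, `make_client("localhost", 9000, "alice", "alice-secret").list_buckets()` would then exercise the ListBuckets endpoint.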
Status / Implementation - Endpoints
- Investigating
- GetObjectAcl
- ListObjects(V1)
- ListMultipartUploads
- PutObjectAcl
- PutObjectTagging
- UploadPartCopy
- Implemented
- AbortMultipartUpload
- CopyObject
- CompleteMultipartUpload
- CreateMultipartUpload
- DeleteObject
- DeleteObjects
- GetBucketLocation
- GetObject
- GetObjectLockConfiguration (stub)
- GetObjectTagging (stub)
- HeadBucket
- HeadObject
- ListBuckets
- ListObjectsV2
- PutObject
- UploadPart
Implementation - Configuration
A single file defines two sections, which helps administrators understand the options and how they relate to each other.
Modeled after NFSRODS.
{
    // Defines S3 options that affect how the
    // client-facing component of the server behaves.
    "s3_server": {
        // ...
    },

    // Defines iRODS connection information.
    "irods_client": {
        // ...
    }
}
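A minimal sketch of checking the two-section layout before starting the server. The function name `load_config` is hypothetical, and the example document is written without comments because Python's `json` module does not accept them.

```python
import json

# The two top-level sections described above.
REQUIRED_SECTIONS = ("s3_server", "irods_client")


def load_config(text):
    """Parse a configuration document and check that both sections exist."""
    config = json.loads(text)
    missing = [name for name in REQUIRED_SECTIONS if name not in config]
    if missing:
        raise ValueError(f"missing configuration section(s): {', '.join(missing)}")
    return config


# Comment-free example document; the port values are illustrative only.
example = '{"s3_server": {"port": 9000}, "irods_client": {"port": 1247}}'
config = load_config(example)
```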
Implementation - Configuration
"s3_server": {
    "host": "0.0.0.0",
    "port": 9000,
    "log_level": "info",
    "plugins": {
        "static_bucket_resolver": {
            "name": "static_bucket_resolver",
            "mappings": {
                "<bucket_name>": "/path/to/collection",
                "<another_bucket>": "/path/to/another/collection"
            }
        },
        "static_authentication_resolver": {
            "name": "static_authentication_resolver",
            "users": {
                "<s3_username>": {
                    "username": "<string>",
                    "secret_key": "<string>"
                }
            }
        }
    },
    "region": "us-east-1",
    "multipart_upload_part_files_directory": "/tmp",
    "authentication": {
        "eviction_check_interval_in_seconds": 60,
        "basic": { "timeout_in_seconds": 3600 }
    },
    "requests": {
        "threads": 3,
        "max_size_of_request_body_in_bytes": 8388608,
        "timeout_in_seconds": 30
    },
    "background_io": { "threads": 6 }
}
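The static bucket resolver maps bucket names to iRODS collections. The sketch below shows one way such a mapping could turn a bucket and key into a logical path. The function name, the sample bucket, and the rule of appending the key to the collection path are assumptions for illustration only.

```python
import posixpath


def resolve_logical_path(mappings, bucket, key):
    """Map an S3 bucket and key to an iRODS logical path (assumed scheme)."""
    if bucket not in mappings:
        raise KeyError(f"no collection mapped to bucket {bucket!r}")
    # Assumption: the object key is appended to the mapped collection path.
    return posixpath.join(mappings[bucket], key.lstrip("/"))


# Hypothetical mapping in the shape of the "mappings" option.
mappings = {"lab-data": "/tempZone/home/alice/lab"}
path = resolve_logical_path(mappings, "lab-data", "run1/output.dat")
```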
Implementation - Configuration
"irods_client": {
    "host": "<string>",
    "port": 1247,
    "zone": "<string>",
    "tls": { /* ... options ... */ },
    "enable_4_2_compatibility": false,
    "proxy_admin_account": {
        "username": "<string>",
        "password": "<string>"
    },
    "connection_pool": {
        "size": 6,
        "refresh_timeout_in_seconds": 600,
        "max_retrievals_before_refresh": 16,
        "refresh_when_resource_changes_detected": true
    },
    "resource": "<string>",
    "put_object_buffer_size_in_bytes": 8192,
    "get_object_buffer_size_in_bytes": 8192
}
Implementation - Multipart Implementations Considered
A. Multiobject - Parts written as separate objects. On CompleteMultipartUpload, parts are concatenated on the iRODS server.
- Efficient
- Unintentional execution of policy for each part
- Pollutes iRODS namespace
- Would require a concatenate API plugin
B. Store-and-Forward - Write each part to the mid-tier, then forward to iRODS on CompleteMultipartUpload.
- No extra policy triggered
- Requires a large amount of scratch space in the mid-tier
- Non-trivial CompleteMultipartUpload
C. Efficient Store-and-Forward - Write down / hold non-contiguous parts in the mid-tier, then send contiguous parts to iRODS when ready.
- Complicated - parts are not necessarily sent in order and can be resent
- Part offsets are not known in advance, so a part can only be forwarded once all preceding parts have been written
- Worst case almost the entire object would still need to be stored in the mid-tier
D. Store-and-Register - Write to a file accessible to iRODS and register when complete.
- Still requires writing individual part files since we do not know the part offsets
- Requires shared visibility between iRODS and S3 API
Implementation - B. Store-and-Forward (v0.2.0)
How does it work?
1. CreateMultipartUpload
- Generate upload_id (UUID) and return the upload_id in the response
2. UploadPart
- Write bytes to a local file (location determined by configuration)
3. CompleteMultipartUpload
- Reminder: In pure S3, CompleteMultipartUpload is trivial
- Create the object in iRODS
- Determine the offset for each part
- Iterate through the parts and create background I/O tasks to write parts to the iRODS object
- When all parts are written, remove part files and send response to the client
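The three steps above can be modeled with a toy sketch in which a local file stands in for the iRODS object. The class and method names are hypothetical, and the parts are written sequentially here rather than by background I/O tasks.

```python
import os
import tempfile
import uuid


class StoreAndForward:
    """Toy model of store-and-forward; a local file stands in for the iRODS object."""

    def __init__(self, part_dir):
        self.part_dir = part_dir  # plays the role of the part files directory

    def create_multipart_upload(self):
        # Step 1: generate an upload_id (UUID) and hand it back to the client.
        return str(uuid.uuid4())

    def _part_path(self, upload_id, part_number):
        return os.path.join(self.part_dir, f"{upload_id}.{part_number}")

    def upload_part(self, upload_id, part_number, data):
        # Step 2: write the bytes of the part to a local file.
        with open(self._part_path(upload_id, part_number), "wb") as f:
            f.write(data)

    def complete_multipart_upload(self, upload_id, part_numbers, target_path):
        # Step 3: compute each part's offset from the sizes of the parts before it.
        offsets, position = {}, 0
        for n in sorted(part_numbers):
            offsets[n] = position
            position += os.path.getsize(self._part_path(upload_id, n))
        # Write every part at its offset; the order no longer matters.
        with open(target_path, "wb") as target:
            for n in part_numbers:
                with open(self._part_path(upload_id, n), "rb") as part:
                    target.seek(offsets[n])
                    target.write(part.read())
        # Remove the part files once everything has been written.
        for n in part_numbers:
            os.remove(self._part_path(upload_id, n))
        return offsets


with tempfile.TemporaryDirectory() as scratch:
    server = StoreAndForward(scratch)
    upload_id = server.create_multipart_upload()
    server.upload_part(upload_id, 2, b"world")   # parts may arrive out of order
    server.upload_part(upload_id, 1, b"hello ")
    target = os.path.join(scratch, "object")
    offsets = server.complete_multipart_upload(upload_id, [2, 1], target)
    with open(target, "rb") as f:
        result = f.read()
    leftovers = sorted(os.listdir(scratch))
```

Every byte is written twice in this model, once to the part file and once to the target, which mirrors the two read/write cycles of store-and-forward.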
Status / Implementation - v0.2.0 Performance Comparison
The following compares transfers to/from iRODS via the S3 API with transfers to/from a local MinIO server. The Boto S3 client was used for all cases.
Notes:
- The tests consisted of transfers of files from 200 MB to 1800 MB.
- The median of five runs is reported for each file size.
- Multipart uploads require two read/write cycles with store-and-forward.
- The S3 API was configured with 30 threads handling requests and 30 background threads.
- Performance degraded with large files when there was an insufficient number of background threads.
Status / Implementation - C. Efficient Store-and-Forward (v0.3.0)
- Reminder: The S3 protocol does not require parts to be sent in order or to be uniform in size
- Improvement: Track a map of part numbers to sizes for active upload IDs
- UploadPart can stream directly to iRODS object if we know all the preceding part sizes (offset can be calculated)
- CompleteMultipartUpload then only has to stream data for parts which did not stream directly to the iRODS object via UploadPart
- This improves on the original implementation by reducing the number of intermediate part files (worst case: every part still needs a part file)
- ~30% performance improvement for uploads versus v0.2.0
- Caveat: Parts should never be re-sent with a size different from what is in the part size map
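A minimal sketch of the part size map idea follows. The class name and methods are hypothetical, and the sketch assumes part numbers are consecutive and start at 1.

```python
class PartSizeMap:
    """Track part sizes per upload ID so that offsets can be computed early."""

    def __init__(self):
        self._sizes = {}  # upload_id -> {part_number: size_in_bytes}

    def record(self, upload_id, part_number, size):
        sizes = self._sizes.setdefault(upload_id, {})
        previous = sizes.get(part_number)
        # The caveat above: a re-sent part must keep its original size.
        if previous is not None and previous != size:
            raise ValueError("part re-sent with a different size")
        sizes[part_number] = size

    def offset(self, upload_id, part_number):
        """Return the byte offset of a part, or None if it cannot be known yet."""
        sizes = self._sizes.get(upload_id, {})
        total = 0
        # Assumption: parts are numbered consecutively starting at 1.
        for n in range(1, part_number):
            if n not in sizes:
                return None  # a preceding size is unknown: fall back to a part file
            total += sizes[n]
        return total


part_sizes = PartSizeMap()
part_sizes.record("upload-a", 1, 100)
early = part_sizes.offset("upload-a", 3)   # size of part 2 is still unknown
part_sizes.record("upload-a", 2, 50)
late = part_sizes.offset("upload-a", 3)    # sizes of parts 1 and 2 are known
```

When `offset` returns a number, the part could stream directly to the iRODS object. When it returns `None`, the part would be held in a part file until CompleteMultipartUpload.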
Status / Implementation - Multipart Performance Improvement
- Average 27% improvement over original implementation
- Default configurations used for S3 clients and S3 API server
- Results may vary depending on the ordered-ness of parts being sent
- Worst-case performance: All parts had intermediate part files
- Best-case performance: All parts sent in order
Future Work - D. Store-and-Register
- Consider: Store-and-Forward transmits every part twice
- Improvement: Reduce to once
- Write part files to storage visible to the iRODS server
- Concatenate into a single file
- Register the combined file in the iRODS catalog
- Challenges:
- iRODS policy would only execute for registration
- CompleteMultipartUpload still has to wait until the part files are combined before registering
- Multipart upload "mode" could become a configuration option: store-and-forward or store-and-register
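The concatenation step could look roughly like the sketch below. The function name is hypothetical, and the registration of the combined file in the iRODS catalog is deliberately left out.

```python
import os
import shutil
import tempfile


def concatenate_parts(part_paths, combined_path, chunk_size=1024 * 1024):
    """Append part files, in the order given, into one combined file."""
    with open(combined_path, "wb") as combined:
        for path in part_paths:
            with open(path, "rb") as part:
                # Copy in chunks so large parts are never held fully in memory.
                shutil.copyfileobj(part, combined, chunk_size)
    return os.path.getsize(combined_path)


with tempfile.TemporaryDirectory() as shared:
    # The temporary directory stands in for storage visible to the iRODS server.
    paths = []
    for number, data in enumerate([b"abc", b"defg"], start=1):
        path = os.path.join(shared, f"part.{number}")
        with open(path, "wb") as f:
            f.write(data)
        paths.append(path)
    combined = os.path.join(shared, "combined")
    size = concatenate_parts(paths, combined)
    with open(combined, "rb") as f:
        content = f.read()
```

The copy loop is the wait noted under Challenges: registration cannot start until the combined file is complete.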
Future Work
- Additional improvements for multipart
- Use SQLite for tracking upload information
- Implement the other approaches
- Optimize multipart downloads
- Additional endpoints
- Tagging
- ACLs
- Dynamic bucket mappings
- Dynamic user mappings
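For the SQLite item under multipart improvements, one possible shape of the tracking table is sketched below. The schema, table name, and function names are assumptions, not a design from the project, and part numbers are assumed to be consecutive starting at 1.

```python
import sqlite3


def open_tracker(path=":memory:"):
    """Open a database holding one row per uploaded part."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS parts ("
        " upload_id TEXT NOT NULL,"
        " part_number INTEGER NOT NULL,"
        " size INTEGER NOT NULL,"
        " PRIMARY KEY (upload_id, part_number))"
    )
    return db


def record_part(db, upload_id, part_number, size):
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO parts VALUES (?, ?, ?)",
            (upload_id, part_number, size),
        )


def known_offset(db, upload_id, part_number):
    """Return the offset of a part, or None if a preceding size is missing."""
    count, total = db.execute(
        "SELECT COUNT(*), COALESCE(SUM(size), 0) FROM parts"
        " WHERE upload_id = ? AND part_number < ?",
        (upload_id, part_number),
    ).fetchone()
    # Assumption: parts are numbered consecutively starting at 1.
    return total if count == part_number - 1 else None


db = open_tracker()
record_part(db, "upload-a", 1, 100)
```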
Thank you!
SC24 - iRODS S3 API (Booth Edition)
By Alan King