iRODS S3 API

Alan King

Senior Software Developer

iRODS Consortium

November 17-22, 2024

Supercomputing 2024

Atlanta, GA

Overview

  • Motivation / Goals
  • History
  • Status / Implementation
    • Supported features / clients
    • Configuration
    • Multipart
  • Future Work

Motivation / Goals

  • Present iRODS through the S3 protocol
  • Multi-user / Multi-bucket
  • Load-balancer friendly
  • Maintainable

History

Status / Implementation - Architecture

  • Single binary
  • Single configuration file
  • Multi-user
  • Multi-bucket
  • Requires rodsadmin credentials
  • Tests passing with:
    • AWS CLI Client
    • Boto3 Python Library
    • MinIO Python Client
    • MinIO CLI Client

Status / Implementation - Endpoints

  • Investigating
    • GetObjectAcl
    • ListObjects(V1)
    • ListMultipartUploads
    • PutObjectAcl
    • PutObjectTagging
    • UploadPartCopy
  • AbortMultipartUpload
  • CopyObject
  • CompleteMultipartUpload
  • CreateMultipartUpload
  • DeleteObject
  • DeleteObjects
  • GetBucketLocation
  • GetObject
  • GetObjectLockConfiguration (stub)
  • GetObjectTagging (stub)
  • HeadBucket
  • HeadObject
  • ListBuckets
  • ListObjectsV2
  • PutObject
  • UploadPart
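The Amazon S3 REST API selects each endpoint above from the HTTP method, whether the path names a bucket and a key, and marker query parameters or headers. The sketch below is simplified. It assumes path-style addressing (`/bucket/key`) and lowercase header names. `route` is a hypothetical helper, not code from the S3 API server. Bucket-level subresources that are not in the list above (for example, bucket ACLs) are not handled.

```python
from typing import Mapping, Optional


def route(method: str, bucket: Optional[str], key: Optional[str],
          query: Mapping[str, str], headers: Mapping[str, str]) -> str:
    """Return the S3 operation name for a path-style request (simplified sketch)."""
    if bucket is None:  # Service-level request, e.g. "GET /".
        return "ListBuckets" if method == "GET" else "Unsupported"

    if key is None:  # Bucket-level operations.
        if method == "HEAD":
            return "HeadBucket"
        if method == "POST" and "delete" in query:
            return "DeleteObjects"
        if method == "GET":
            if "location" in query:
                return "GetBucketLocation"
            if "object-lock" in query:
                return "GetObjectLockConfiguration"
            if "uploads" in query:
                return "ListMultipartUploads"
            return "ListObjectsV2" if query.get("list-type") == "2" else "ListObjects"
        return "Unsupported"

    # Object-level operations.
    copy = "x-amz-copy-source" in headers
    if method == "GET":
        if "acl" in query:
            return "GetObjectAcl"
        return "GetObjectTagging" if "tagging" in query else "GetObject"
    if method == "HEAD":
        return "HeadObject"
    if method == "PUT":
        if "tagging" in query:
            return "PutObjectTagging"
        if "acl" in query:
            return "PutObjectAcl"
        if "partNumber" in query and "uploadId" in query:
            return "UploadPartCopy" if copy else "UploadPart"
        return "CopyObject" if copy else "PutObject"
    if method == "POST":
        if "uploads" in query:
            return "CreateMultipartUpload"
        if "uploadId" in query:
            return "CompleteMultipartUpload"
    if method == "DELETE":
        return "AbortMultipartUpload" if "uploadId" in query else "DeleteObject"
    return "Unsupported"
```

Deciding whether an identified operation is actually supported (versus "Investigating") is kept separate from routing in this sketch.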

Implementation - Configuration

A single file that defines two sections, helping administrators understand the options and how they relate to each other.


Modeled after NFSRODS.

{
    // Defines S3 options that affect how the
    // client-facing component of the server behaves.
    "s3_server": {
        // ...
    },

    // Defines iRODS connection information.
    "irods_client": {
        // ...
    }
}

Implementation - Configuration

"s3_server": {
    "host": "0.0.0.0",
    "port": 9000,
    "log_level": "info",
    "plugins": {
        "static_bucket_resolver": {
            "name": "static_bucket_resolver",
            "mappings": {
                "<bucket_name>": "/path/to/collection",
                "<another_bucket>": "/path/to/another/collection"
            }
        },
        "static_authentication_resolver": {
            "name": "static_authentication_resolver",
            "users": {
                "<s3_username>": {
                    "username": "<string>",
                    "secret_key": "<string>"
                }
            }
        }
    },
    "region": "us-east-1",
    "multipart_upload_part_files_directory": "/tmp",
    "authentication": {
        "eviction_check_interval_in_seconds": 60,
        "basic": { "timeout_in_seconds": 3600 }
    },
    "requests": {
        "threads": 3,
        "max_size_of_request_body_in_bytes": 8388608,
        "timeout_in_seconds": 30
    },
    "background_io": { "threads": 6 }
}
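The two static plugins map S3-facing names onto iRODS. The sketch below is minimal and rests on three assumptions. First, the `mappings` and `users` objects are plain dictionaries, as shown above. Second, an object key is appended to the mapped collection path. Third, `username` names the iRODS user that requests for that S3 user act as. The bucket, user, and key values are placeholders, and `resolve_bucket` and `resolve_user` are hypothetical helpers, not the server's own functions.

```python
from typing import Optional

# Hypothetical excerpt of the "plugins" section above, with placeholder values.
S3_SERVER_PLUGINS = {
    "static_bucket_resolver": {
        "name": "static_bucket_resolver",
        "mappings": {"my-bucket": "/tempZone/home/alice/data"},
    },
    "static_authentication_resolver": {
        "name": "static_authentication_resolver",
        "users": {"AKIAEXAMPLE": {"username": "alice", "secret_key": "s3cr3t"}},
    },
}


def resolve_bucket(plugins: dict, bucket: str, key: str) -> Optional[str]:
    """Map an S3 bucket/key pair to an iRODS logical path (assumed joining rule)."""
    collection = plugins["static_bucket_resolver"]["mappings"].get(bucket)
    if collection is None:
        return None  # Unknown bucket; a server would report NoSuchBucket.
    return collection.rstrip("/") + "/" + key.lstrip("/")


def resolve_user(plugins: dict, s3_username: str) -> Optional[tuple[str, str]]:
    """Return (iRODS username, secret key) for a configured S3 user, if any."""
    entry = plugins["static_authentication_resolver"]["users"].get(s3_username)
    if entry is None:
        return None
    return entry["username"], entry["secret_key"]
```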

Implementation - Configuration

"irods_client": {
    "host": "<string>",
    "port": 1247,
    "zone": "<string>",

    "tls": { /* ... options ... */ },

    "enable_4_2_compatibility": false,

    "proxy_admin_account": {
        "username": "<string>",
        "password": "<string>"
    },
 
    "connection_pool": {
        "size": 6,
        "refresh_timeout_in_seconds": 600,
        "max_retrievals_before_refresh": 16,
        "refresh_when_resource_changes_detected": true
    },

    "resource": "<string>",
    "put_object_buffer_size_in_bytes": 8192,
    "get_object_buffer_size_in_bytes": 8192
}
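`put_object_buffer_size_in_bytes` and `get_object_buffer_size_in_bytes` suggest that data moves between the client and the iRODS object in fixed-size chunks. The minimal sketch below assumes the buffer size caps the number of bytes moved per read. In-memory streams stand in for the HTTP body and the iRODS data object. `stream_in_chunks` is hypothetical.

```python
from typing import BinaryIO


def stream_in_chunks(src: BinaryIO, dst: BinaryIO, buffer_size: int = 8192) -> int:
    """Copy src to dst, reading at most buffer_size bytes at a time; return bytes copied."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    total = 0
    while True:
        chunk = src.read(buffer_size)
        if not chunk:  # End of the request body (or object).
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

For example, with the default 8192-byte buffer, a 20,000-byte body moves in three non-empty reads of 8192, 8192, and 3616 bytes.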

Implementation - Multipart Implementations Considered

A.  Multiobject - Parts written as separate objects.  On CompleteMultipartUpload, parts are concatenated on the iRODS server.

  • Efficient
  • Unintentional execution of policy for each part
  • Pollutes iRODS namespace
  • Would require a concatenate API plugin

B.  Store-and-Forward - Write each part to the mid-tier, then forward to iRODS on CompleteMultipartUpload.

  • No extra policy triggered
  • Requires a large amount of scratch space in the mid-tier
  • Non-trivial CompleteMultipartUpload

C.  Efficient Store-and-Forward - Write or hold non-contiguous parts in the mid-tier, then send contiguous parts to iRODS when ready.

  • Complicated - parts are not necessarily sent in order and can be resent
  • Part offsets are not known, so a part can only be forwarded once all preceding parts have been written
  • In the worst case, almost the entire object would still need to be stored in the mid-tier

D.  Store-and-Register - Write to a file accessible to iRODS and register when complete.

  • Still requires writing individual part files since we do not know the part offsets
  • Requires shared visibility between iRODS and S3 API

Implementation - B. Store-and-Forward (v0.2.0)

How does it work?

1. CreateMultipartUpload

  • Generate upload_id (UUID) and return the upload_id in the response

2. UploadPart

  • Write bytes to a local file (location determined by configuration)
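Steps 1 and 2 can be sketched with the standard library. The sketch assumes part files are named `<upload_id>.<part_number>` inside the directory given by `multipart_upload_part_files_directory`. That naming scheme and all function names are hypothetical.

```python
import os
import uuid


def create_multipart_upload() -> str:
    """Step 1: generate an upload ID (a UUID) to return to the client."""
    return str(uuid.uuid4())


def part_file_path(part_dir: str, upload_id: str, part_number: int) -> str:
    """Hypothetical naming scheme for one part file."""
    return os.path.join(part_dir, f"{upload_id}.{part_number}")


def upload_part(part_dir: str, upload_id: str, part_number: int, data: bytes) -> str:
    """Step 2: write the part's bytes to a local file; a resent part overwrites it."""
    path = part_file_path(part_dir, upload_id, part_number)
    with open(path, "wb") as f:
        f.write(data)
    return path
```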


Implementation - B. Store-and-Forward (v0.2.0)

3. CompleteMultipartUpload

  • Reminder: In pure S3, CompleteMultipartUpload is trivial
  • Create the object in iRODS
  • Determine the offset for each part
  • Iterate through the parts and create background I/O tasks to write parts to the iRODS object
  • When all parts are written, remove part files and send response to the client
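Step 3 can be sketched as follows. The sketch assumes the part files are passed in ascending part-number order. A local file stands in for the iRODS data object, and a thread pool stands in for the background I/O threads (`background_io.threads`). Each part's offset is the sum of the sizes of all preceding parts, which is only computable once every part is on disk. All function names are hypothetical.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def compute_offsets(sizes: list[int]) -> list[int]:
    """The offset of each part is the sum of the sizes of all parts before it."""
    offsets, running = [], 0
    for size in sizes:
        offsets.append(running)
        running += size
    return offsets


def write_part(src_path: str, dst_path: str, offset: int) -> None:
    """Copy one part file into the destination object at the given offset."""
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        dst.seek(offset)
        shutil.copyfileobj(src, dst)


def complete_multipart_upload(part_paths: list[str], dst_path: str, threads: int = 6) -> None:
    """Create the object, write all parts concurrently, then remove the part files."""
    offsets = compute_offsets([os.path.getsize(p) for p in part_paths])
    with open(dst_path, "wb"):
        pass  # Create (or truncate) the destination object.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(write_part, p, dst_path, off)
                   for p, off in zip(part_paths, offsets)]
        for fut in futures:
            fut.result()  # Re-raise any I/O error before responding to the client.
    for p in part_paths:
        os.remove(p)
```

Because each task opens its own handle and writes a disjoint byte range, the parts can be written in any order.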

Status / Implementation - v0.2.0 Performance Comparison

The following compares transfers to/from iRODS via the S3 API with transfers to/from a local MinIO server.  The Boto3 client was used in all cases.

Notes:

  • The tests transferred files ranging from 200 MB to 1800 MB.
  • The median of five runs is reported for each file size.
  • Multipart uploads require two read/write cycles with store-and-forward.
  • The S3 API was configured with 30 threads handling requests and 30 background threads.
  • Performance degraded for large files when too few background threads were configured.

Status / Implementation - C. Efficient Store-and-Forward (v0.3.0)

  • Reminder: The S3 protocol requires neither that parts be sent in order nor that they be uniform in size
  • Improvement: Track a map of part numbers to sizes for active upload IDs
    • UploadPart can stream directly to iRODS object if we know all the preceding part sizes (offset can be calculated)
    • CompleteMultipartUpload then only has to stream data for parts which did not stream directly to the iRODS object via UploadPart
  • This improves on the original implementation by reducing the number of intermediate part files (worst case: every part still needs a part file)
  • ~30% performance improvement for uploads versus v0.2.0
  • Caveat: Parts should never be re-sent with a size different from what is in the part size map
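The v0.3.0 decision can be sketched with a per-upload dictionary from part number to size. The sketch assumes part numbers are contiguous and start at 1. A part can stream straight to the iRODS object only when the sizes of all lower-numbered parts are already known. `PartSizeMap` is hypothetical, and its size check enforces the caveat above.

```python
from typing import Optional


class PartSizeMap:
    """Hypothetical per-upload map of part number -> part size in bytes."""

    def __init__(self) -> None:
        self.sizes: dict[int, int] = {}

    def record(self, part_number: int, size: int) -> None:
        previous = self.sizes.get(part_number)
        if previous is not None and previous != size:
            # A resent part must keep its original size, or later offsets break.
            raise ValueError(f"part {part_number} resent with a different size")
        self.sizes[part_number] = size

    def offset_if_known(self, part_number: int) -> Optional[int]:
        """Return the part's offset if parts 1..part_number-1 all have known sizes."""
        offset = 0
        for n in range(1, part_number):
            if n not in self.sizes:
                return None  # Offset unknown: fall back to an intermediate part file.
            offset += self.sizes[n]
        return offset
```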

Status / Implementation - Multipart Performance Improvement

  • Average 27% improvement over original implementation
  • Default configurations used for S3 clients and S3 API server
  • Results may vary depending on the order in which parts are sent
    • Worst-case performance: All parts required an intermediate part file
    • Best-case performance: All parts sent in order

Future Work - D. Store-and-Register

  • Consider: Store-and-Forward transmits every part twice
  • Improvement: Transmit each part only once
    • Write part files to storage visible to the iRODS server
    • Concatenate into a single file
    • Register the combined file in the iRODS catalog
  • Challenges:
    • iRODS policy would only execute for registration
    • CompleteMultipartUpload still has to wait until the part files are combined before registering
  • Multipart upload "mode" could become a configuration option: store-and-forward or store-and-register
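The concatenation step of store-and-register can be sketched with the standard library. The sketch assumes the part files already sit on storage the iRODS server can see and are passed in part-number order. The iRODS registration call is deliberately left out rather than guessed. `concatenate_parts` is hypothetical. No part bytes travel to iRODS a second time, but every byte is still read and written once locally. This is why CompleteMultipartUpload must wait for the combine step.

```python
import shutil
from typing import Iterable


def concatenate_parts(part_paths: Iterable[str], combined_path: str) -> int:
    """Append each part file, in order, to one combined file; return its size."""
    with open(combined_path, "wb") as out:
        for path in part_paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, out)
        # Registering combined_path in the iRODS catalog would follow (not shown).
        return out.tell()
```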

Future Work

  • Additional improvements for multipart
    • Use SQLite for tracking upload information
    • Implement the other approaches
    • Optimize multipart downloads
  • Additional endpoints
    • Tagging
    • ACLs
  • Dynamic bucket mappings
  • Dynamic user mappings
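Python's built-in `sqlite3` module suggests one possible shape for the "Use SQLite for tracking upload information" item. The schema (tables `uploads` and `parts`) and the helper names are purely hypothetical. The schema records the same part-size information that the v0.3.0 implementation tracks for active upload IDs, so that information could outlive a server restart.

```python
import sqlite3


def open_tracker(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical upload-tracking database."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS uploads (
            upload_id  TEXT PRIMARY KEY,
            bucket     TEXT NOT NULL,
            object_key TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS parts (
            upload_id     TEXT NOT NULL REFERENCES uploads(upload_id),
            part_number   INTEGER NOT NULL,
            size          INTEGER NOT NULL,
            has_part_file INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (upload_id, part_number)
        );
        """
    )
    return conn


def create_upload(conn: sqlite3.Connection, upload_id: str, bucket: str, key: str) -> None:
    with conn:  # Commits on success, rolls back on error.
        conn.execute("INSERT INTO uploads VALUES (?, ?, ?)", (upload_id, bucket, key))


def record_part(conn: sqlite3.Connection, upload_id: str, part_number: int,
                size: int, has_part_file: bool) -> None:
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO parts VALUES (?, ?, ?, ?)",
            (upload_id, part_number, size, int(has_part_file)),
        )


def known_sizes(conn: sqlite3.Connection, upload_id: str) -> dict[int, int]:
    """Return {part_number: size} for every recorded part of an upload."""
    rows = conn.execute(
        "SELECT part_number, size FROM parts WHERE upload_id = ? ORDER BY part_number",
        (upload_id,),
    )
    return dict(rows.fetchall())
```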

Thank you!