iRODS S3 API:

Presenting iRODS as S3

Terrell Russell, Ph.D.

@terrellrussell

Executive Director, iRODS Consortium

November 12-17, 2023

Supercomputing 2023

Denver, CO

Overview

  • History
    • Desire / Justification
    • Research
  • Status / Implementation
    • Architecture
    • Configuration
  • Next Steps

History - Desire / Justification

  • Everyone just wants to talk to an S3 endpoint
    • Easy
    • Comprehensible
    • Lots of existing clients
    • Decoupled authentication
  • Aligns with our Protocol Plumbing efforts
    • Accessible / Approachable
    • Maintainable

History - Desire / Justification

  • Present iRODS via the S3 protocol
  • Reuse - don't reinvent
  • Load-balancer friendly
  • Maintainable

History - Desire / Justification

1. Update and maintain https://github.com/bioteam/minio-irods-gateway

This MinIO-based front end already exists and has been demonstrated, but has not been updated since its debut. I do not know the extent to which it has been used in production; perhaps John can speak to that. It uses the GoRODS client library, which would need continued updates to wrap the latest iRODS C API as new server releases come out. Presumably, BioTeam / John would continue to own/maintain GoRODS and the minio-irods-gateway.

2. minio-irods-gateway converts to use https://github.com/cyverse/go-irodsclient

Illyoung has produced, and is actively developing, a pure Go iRODS client library. This new library could be 'swapped' in for the GoRODS calls in the minio-irods-gateway. Presumably, Arizona / Illyoung would then own/maintain a fork of the minio-irods-gateway.

3. Add irods/gateway-irods.go to upstream https://github.com/minio/minio/tree/master/cmd/gateway

Someone / all of us would port / implement the same work as Option 2 (convert to pure Go), but work with the MinIO community to make iRODS an officially supported gateway in the MinIO server itself.

4. New C++ implementation - https://github.com/irods/irods_client_s3_cpp

The iRODS Consortium would implement the S3 specification directly in a new C++ client. This could be more performant (in the long run), but requires the most work and leaves open questions to answer.

"I am leaning towards Option 3 as the best option both from a cost/benefit perspective, as well as exposure to a larger community and confidence that 'it just works'."

History - Research

  • Investigated the 4 options, in 4 phases
    • Options 1-3 ... MinIO, Go, TicketBooth
    • Option 4 ... C++ proof of concept
    • Multipart
    • Handshake / Authentication

History - Research - Phase 1

  • Option 1 - MinIO with GoRODS (wrapping C)
    • Limited, not maintainable (Jul 2021)
  • Option 2 - MinIO with pure go-irodsclient
    • Needed to add (anonymous) ticket functionality
      • Implemented TicketBooth/BoxOffice (Oct 2021)
        • But would need admin credentials
        • Might as well just use C++ REST API (Nov 2021)
    • Lacks multi-user functionality (Feb 2022)
      • Auth code is in MinIO core - gateway code fires 'too late'
  • Option 3 - Get work into upstream MinIO
    • MinIO announced deprecation of gateway (May 2022)
      • Too hard / not worth supporting 'legacy' POSIX

History - Research - Phase 2

  • Option 4 - New C++ Implementation
    • Removes dependency on other codebase(s)
    • 1 collection -> 1 bucket
    • Framework selection (Aug 2022)
      • Pistache
      • Oat++
      • Drogon
      • Boost.Beast (Nov 2022)
    • Initial endpoints working (Jan 2023)
      • User mapping
      • Bucket mapping

History - Research - Phase 2 - Alternate Illyoung Universes

  • Add S3 protocol support to SFTPGo (Aug 2022)
  • Searching for existing S3 server in Go (Sept 2022)
  • Add iRODS backend to Zenko (Oct 2022)
  • Add JuiceFS frontend to iRODS (Nov 2022)
  • Add JuiceFS frontend to SFTPGo (Nov 2022)
  • GarageHQ frontend to iRODS (Jan 2023)
  • In-memory IBM s3mem-go as inspiration (Mar 2023)

History - Research - Phase 3 (Feb-Mar 2023)

  • Multipart Options
    • a. Multiobject - write all parts individually to iRODS, then CompleteMultipartUpload triggers copy/concatenate/etc.
      • pro - relatively simple
      • con - lots of extra policy, could trigger replication to multiple continents (just a config option)
        • requires API plugin for concatenate()
    • b. Store-and-forward - write it all down in the bridge, then send it to iRODS
      • pro - simple, no extra policy
      • con - slow/delayed, need POTENTIALLY HUGE disk
    • c. Efficient store-and-forward - write down / hold non-contiguous parts in bridge - send contiguous parts to iRODS when ready
      • pro - elegant, single write
      • con - more complexity, need biggish disk
      • maybe off the table, because a client can re-send a part with the same number, and it must overwrite the earlier copy of that part
        • OR... new thread per part, write at its offset, overwrite on re-send; if all parts are the same size, magic/perfect, just works, don't look at me…
    • d. Store-and-register - write it all down where iRODS can see it, then just register it in iRODS
      • pro - simple, fastest
      • con - only registration policy fires (?), adds dependency on the bridge and iRODS both seeing the same storage
        • cannot continue on failure (incomplete writes)
          • iRODS doesn't know what happened
          • Client has no way to recover
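Option (c) is the most algorithmic of the four. The Python sketch below shows only its bookkeeping, assuming S3 part numbers start at 1 and may arrive in any order; `ContiguousPartForwarder` is a hypothetical name, and an in-memory `io.BytesIO` stands in for the iRODS data object being written. It also shows why re-sent parts are the sticking point: a part still held in the bridge can be replaced, but one already forwarded cannot.

```python
import io


class ContiguousPartForwarder:
    """Hypothetical sketch of multipart option (c), efficient store-and-forward.

    Out-of-order parts are held in the bridge; each run of contiguous parts
    is forwarded to the sink as soon as the gap before it is filled.
    """

    def __init__(self, sink):
        self.sink = sink      # stands in for the iRODS data object being written
        self.next_part = 1    # S3 part numbers start at 1
        self.held = {}        # part_number -> bytes still waiting in the bridge

    def upload_part(self, part_number, data):
        if part_number < self.next_part:
            # Already forwarded: a forward-only stream cannot take it back.
            raise ValueError(f"part {part_number} was already forwarded")
        # A re-sent part that is still held simply replaces the earlier copy.
        self.held[part_number] = data
        # Forward every part that is now contiguous with what was already sent.
        while self.next_part in self.held:
            self.sink.write(self.held.pop(self.next_part))
            self.next_part += 1

    def complete(self, last_part):
        # Every part up to last_part must have been forwarded, with none left over.
        if self.held or self.next_part != last_part + 1:
            raise ValueError("missing parts; cannot complete the upload")


sink = io.BytesIO()
fwd = ContiguousPartForwarder(sink)
fwd.upload_part(2, b"BB")   # held: part 1 has not arrived
fwd.upload_part(3, b"CC")   # held
fwd.upload_part(2, b"bb")   # re-sent while still held: replaces b"BB"
fwd.upload_part(1, b"AA")   # fills the gap; parts 1-3 are forwarded in order
fwd.complete(3)
print(sink.getvalue())      # b'AAbbCC'
```

The bridge only ever holds parts that are ahead of a gap, which is where the "biggish disk" in the cons comes from: in the worst case (part 1 arrives last) it holds everything, just like option (b).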

History - Research - Phase 4 (Jun-Nov 2023)

  • Saved Multipart for later
  • Tests passing with:
    • AWS CLI Client
    • Boto3 Python Library
    • MinIO Python Client
    • MinIO CLI Client

Status / Implementation - Architecture

  • Implemented Endpoints
    • CopyObject
    • DeleteObject
    • GetBucketLocation
    • GetObject
    • GetObjectLockConfiguration
    • HeadObject
    • ListBuckets
    • ListObjectsV2
    • PutObject
  • Next
    • HeadBucket
    • CompleteMultipartUpload
    • CreateMultipartUpload
    • DeleteObjects
    • UploadPart

  • Investigating
    • ListObjects
    • GetObjectAcl
    • GetObjectTagging
    • PutObjectAcl
    • PutObjectTagging
    • UploadPartCopy
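Each endpoint above corresponds to an HTTP method plus a path/query pattern in the public S3 REST API. The Python sketch below (a hypothetical `classify_s3_request` helper, not the project's C++ code) illustrates that dispatch for path-style URLs (`/<bucket>/<key>`) only; virtual-hosted-style addressing and request signing are out of scope, and sub-resources not listed on the slide (for example `?uploadId` on GET/DELETE) are not distinguished.

```python
from urllib.parse import parse_qs, urlsplit


def classify_s3_request(method, url, headers=None):
    """Hypothetical sketch: name the S3 operation for a path-style request."""
    headers = {k.lower(): v for k, v in (headers or {}).items()}
    parts = urlsplit(url)
    # keep_blank_values so bare sub-resources such as "?location" survive.
    query = parse_qs(parts.query, keep_blank_values=True)
    bucket, _, key = parts.path.lstrip("/").partition("/")

    if not bucket:  # service-level request: GET /
        return "ListBuckets" if method == "GET" else None

    if not key:  # bucket-level requests: /<bucket>
        if method == "HEAD":
            return "HeadBucket"
        if method == "POST" and "delete" in query:
            return "DeleteObjects"
        if method == "GET":
            if "location" in query:
                return "GetBucketLocation"
            if "object-lock" in query:
                return "GetObjectLockConfiguration"
            if query.get("list-type") == ["2"]:
                return "ListObjectsV2"
            return "ListObjects"
        return None

    # Object-level requests: /<bucket>/<key>
    copy = "x-amz-copy-source" in headers
    if method == "POST" and "uploads" in query:
        return "CreateMultipartUpload"
    if method == "POST" and "uploadId" in query:
        return "CompleteMultipartUpload"
    if method == "PUT" and "partNumber" in query and "uploadId" in query:
        return "UploadPartCopy" if copy else "UploadPart"
    for sub, name in (("acl", "Acl"), ("tagging", "Tagging")):
        if sub in query and method in ("GET", "PUT"):
            return ("Get" if method == "GET" else "Put") + "Object" + name
    if method == "PUT":
        return "CopyObject" if copy else "PutObject"
    return {"GET": "GetObject", "HEAD": "HeadObject", "DELETE": "DeleteObject"}.get(method)


print(classify_s3_request("GET", "/b?list-type=2&prefix=logs/"))  # ListObjectsV2
```

CopyObject and PutObject share the same method and path; only the `x-amz-copy-source` header tells them apart, which is why headers are part of the dispatch.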

Status / Implementation - Configuration

{
    // Defines S3 options that affect how the
    // client-facing component of the server behaves.
    "s3_server": {
        // ...
    },

    // Defines iRODS connection information.
    "irods_client": {
        // ...
    }
}

Status / Implementation - Configuration - s3_server

"s3_server": {
    "port": 8080,
    "plugins": {
        // Each key corresponds to a local shared object file
        "static_bucket_resolver": {
            "name": "static_bucket_resolver",
            "mappings": {
                "<bucket_name>": "/path/to/collection"
            }
        },
        "static_authentication_resolver": {
            "name": "static_authentication_resolver",
            "users": {
                // Maps <s3_username> to a specific iRODS user.
                "<s3_username>": {
                    // The iRODS username and secret key
                    "username": "<string>",
                    "secret_key": "<string>"
                }
            }
        }
    },
    "resource": "demoResc",
    "threads": 10,
    "put_object_buffer_size_in_bytes": 8192,
    "get_object_buffer_size_in_bytes": 8192,
    "region": "us-east-1"
},
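With `static_bucket_resolver`, a bucket name is looked up in `mappings` and the object key is appended to the mapped collection (1 collection -> 1 bucket). The Python sketch below illustrates that translation; `resolve_logical_path`, the bucket `demo-bucket`, and the collection `/tempZone/home/alice/demo` are hypothetical, and both the "/ in a key becomes a sub-collection" behavior and the rejection of keys that escape the collection are assumptions of the sketch, not documented behavior of the S3 API.

```python
import posixpath

# Hypothetical contents of the "mappings" object from static_bucket_resolver.
BUCKET_MAPPINGS = {
    "demo-bucket": "/tempZone/home/alice/demo",
}


def resolve_logical_path(bucket, key, mappings=BUCKET_MAPPINGS):
    """Sketch: translate an S3 (bucket, key) pair into an iRODS logical path.

    Returns None when the bucket has no mapping. Keys that would resolve
    outside the mapped collection (via "..", or a leading "/") are rejected;
    that check is an assumption of this sketch.
    """
    collection = mappings.get(bucket)
    if collection is None:
        return None
    # Assumed: "/" inside a key maps to a sub-collection of the bucket's collection.
    path = posixpath.normpath(posixpath.join(collection, key))
    if not path.startswith(collection.rstrip("/") + "/"):
        raise ValueError(f"key {key!r} escapes bucket {bucket!r}")
    return path


print(resolve_logical_path("demo-bucket", "runs/2023/out.csv"))
# /tempZone/home/alice/demo/runs/2023/out.csv
```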

Status / Implementation - Configuration - irods_client

"irods_client": {
    "host": "<string>",

    "port": 1247,

    "zone": "<string>",

    "proxy_admin_account": {
        "username": "<string>",
        "password": "<string>"
    }
}
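The `proxy_admin_account` suggests that the server authenticates to iRODS as an administrator and acts on behalf of whichever iRODS user the authentication resolver returns (iRODS proxy authentication). The Python sketch below shows that lookup under stated assumptions: the configuration has already been parsed into a dict (comments removed), the SigV4 signature check that would use `secret_key` happened earlier and is not shown, and `IrodsSessionParams`, `session_params_for`, `s3-alice`, and the host name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IrodsSessionParams:
    host: str
    port: int
    zone: str
    proxy_username: str   # account that actually authenticates to iRODS
    client_username: str  # iRODS user whose permissions should apply


def session_params_for(s3_username, config):
    """Hypothetical sketch: choose iRODS connection parameters for an S3 caller.

    Assumes the request signature was already verified with the matching
    "secret_key" (not shown) and that config is the parsed JSON document.
    """
    resolver = config["s3_server"]["plugins"]["static_authentication_resolver"]
    entry = resolver["users"].get(s3_username)
    if entry is None:
        raise PermissionError(f"unknown S3 user {s3_username!r}")
    irods = config["irods_client"]
    return IrodsSessionParams(
        host=irods["host"],
        port=irods["port"],
        zone=irods["zone"],
        proxy_username=irods["proxy_admin_account"]["username"],
        client_username=entry["username"],
    )


config = {
    "s3_server": {"plugins": {"static_authentication_resolver": {
        "users": {"s3-alice": {"username": "alice", "secret_key": "example"}}}}},
    "irods_client": {"host": "irods.example.org", "port": 1247, "zone": "tempZone",
                     "proxy_admin_account": {"username": "rods", "password": "example"}},
}
print(session_params_for("s3-alice", config))
```

Keeping a single administrative credential on the server, rather than every user's iRODS password, is what lets S3 authentication stay decoupled from iRODS authentication.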

Next Steps

  • Testing
    • Use as S3 resource
    • Stress / Performance
  • Multipart Uploads
  • Additional plugins
    • Other bucket mappings
    • Other user mappings

Questions?

Thank you.

SC23 - iRODS S3 API: Presenting iRODS as S3

By iRODS Consortium