iRODS Client:

AWS Lambda Function for S3 1.0

June 9-12, 2020

iRODS User Group Meeting 2020

Virtual Event

Terrell Russell, Ph.D.

@terrellrussell

Chief Technologist, iRODS Consortium

Design Goals

  • Play nicely with the universe of tools that already know how to write to S3 directly
  • Allow those updates within the S3 namespace to flow smoothly into the iRODS Catalog
  • Trigger automated data management when data crosses the policy boundary into iRODS

Considerations

  • Lambda can run Python code
  • iRODS provides a Python client library (python-irodsclient)

Success would be...

  • near-real-time, asynchronous catalog updates for creates, moves, and deletes

Files created, renamed, or deleted in S3 appear quickly in iRODS.

iRODS is assumed to have its associated S3 Storage Resource(s) configured with HOST_MODE=cacheless_attached.

You must configure your Lambda to trigger on all ObjectCreated and ObjectRemoved events for a connected S3 bucket.
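As a rough illustration of that trigger contract, the sketch below classifies one S3 event record by the prefix of its eventName. The function name classify_s3_record and the "create"/"delete" labels are hypothetical, not taken from the released Lambda; the record layout (eventName, s3.bucket.name, s3.object.key) follows the standard S3 event notification format, in which object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def classify_s3_record(record):
    """Return (action, bucket, key) for one S3 event notification record.

    The action labels "create" and "delete" are illustrative only.
    """
    event_name = record["eventName"]  # e.g. "ObjectCreated:Put"
    bucket = record["s3"]["bucket"]["name"]
    # S3 URL-encodes object keys in notifications (spaces become '+').
    key = unquote_plus(record["s3"]["object"]["key"])

    if event_name.startswith("ObjectCreated:"):
        return "create", bucket, key
    if event_name.startswith("ObjectRemoved:"):
        return "delete", bucket, key
    # Anything else means the trigger covers more than the
    # ObjectCreated/ObjectRemoved events this function expects.
    raise ValueError(f"unsupported S3 event: {event_name}")


record = {
    "eventName": "ObjectCreated:Put",
    "s3": {"bucket": {"name": "mybucket"}, "object": {"key": "dir/my+file.txt"}},
}
print(classify_s3_record(record))  # ('create', 'mybucket', 'dir/my file.txt')
```

Matching on the prefix rather than the full event name keeps every variant (Put, Post, Copy, CompleteMultipartUpload, Delete, DeleteMarkerCreated) on the same code path.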

The iRODS connection information is stored in the AWS Systems Manager > Parameter Store as a JSON object string.
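A minimal sketch of reading that parameter with boto3, assuming the caller supplies the parameter name and that the JSON keys mirror an irods_environment.json file (irods_host, irods_port, irods_user_name, irods_zone_name, and so on). The function name load_irods_environment is hypothetical. WithDecryption=True also lets the same call read a SecureString parameter; whether the released function stores its settings that way is an assumption.

```python
import json


def load_irods_environment(parameter_name, ssm_client=None):
    """Fetch the iRODS connection settings stored as a JSON object string."""
    if ssm_client is None:
        # boto3 ships with the AWS Lambda Python runtime.
        import boto3

        ssm_client = boto3.client("ssm")
    # WithDecryption is ignored for plain String parameters.
    response = ssm_client.get_parameter(Name=parameter_name, WithDecryption=True)
    return json.loads(response["Parameter"]["Value"])
```

Accepting an optional client keeps the function testable outside AWS: pass any object with a compatible get_parameter method.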

SSL to iRODS is supported by placing a certificate in a relative path within the Lambda package.
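One way to wire this up, sketched under assumptions: the certificate sits at a hypothetical certs/irods_ca.pem inside the package, its absolute location is resolved through the LAMBDA_TASK_ROOT environment variable that Lambda sets to the package root, and the negotiation settings are the ones from python-irodsclient's SSL example. Those encryption values must match the iRODS server's own settings.

```python
import os
import ssl


def bundled_cert_path(relative_path):
    """Resolve a certificate path relative to the deployed Lambda package."""
    # LAMBDA_TASK_ROOT points at the unpacked package inside Lambda;
    # fall back to the working directory when running locally.
    root = os.environ.get("LAMBDA_TASK_ROOT", os.getcwd())
    return os.path.join(root, relative_path)


def irods_ssl_settings(ssl_context):
    """Keyword arguments python-irodsclient accepts for an SSL session."""
    return {
        "client_server_negotiation": "request_server_negotiation",
        "client_server_policy": "CS_NEG_REQUIRE",
        "encryption_algorithm": "AES-256-CBC",
        "encryption_key_size": 32,
        "encryption_num_hash_rounds": 16,
        "encryption_salt_size": 8,
        "ssl_context": ssl_context,
    }


# Inside the handler (requires the certificate file to exist):
# context = ssl.create_default_context(
#     purpose=ssl.Purpose.SERVER_AUTH,
#     cafile=bundled_cert_path("certs/irods_ca.pem"),  # hypothetical path
# )
# session = iRODSSession(host=..., port=..., user=..., password=...,
#                        zone=..., **irods_ssl_settings(context))
```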

[Diagram: S3 -> Lambda]

This Lambda function can be configured to receive events from multiple S3 buckets at the same time.

[Diagram: multiple S3 buckets -> one Lambda]

If irods_default_resource is NOT defined in the iRODS environment stored in the Parameter Store, the Lambda function derives the name of the target iRODS Resource from the bucket name.

By default, the Lambda function appends _s3 to the incoming bucket name.
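The naming rule can be sketched as follows. The function name target_resource is hypothetical, and the environment is assumed to be the dictionary parsed from the Parameter Store JSON. Because the name is derived per event, one Lambda serving several buckets routes each bucket to its own Resource.

```python
def target_resource(irods_env, bucket_name, suffix="_s3"):
    """Pick the iRODS Resource that should receive objects from a bucket."""
    # An explicit default resource in the Parameter Store JSON wins.
    # (An empty string is treated the same as "not defined".)
    explicit = irods_env.get("irods_default_resource")
    if explicit:
        return explicit
    # Otherwise derive it from the bucket that produced the event.
    return bucket_name + suffix


print(target_resource({}, "mybucket"))  # mybucket_s3
print(target_resource({"irods_default_resource": "s3resc"}, "mybucket"))  # s3resc
```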

The following AWS configurations are supported at this time:

  • S3 -> SNS -> Lambda
  • S3 -> SQS -> Lambda
  • S3 -> Lambda
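Supporting all three configurations mostly means unwrapping the envelope around the S3 notification. A minimal sketch, assuming the standard AWS event shapes: direct S3 records carry eventSource "aws:s3", SNS records carry EventSource "aws:sns" with the S3 notification as a JSON string in Sns.Message, and SQS records carry eventSource "aws:sqs" with it in body. The generator name s3_records is hypothetical.

```python
import json


def s3_records(event):
    """Yield S3 notification records from a direct, SNS, or SQS invocation."""
    for record in event.get("Records", []):
        if record.get("eventSource") == "aws:s3":
            # S3 -> Lambda: the record is already an S3 notification.
            yield record
        elif record.get("EventSource") == "aws:sns":
            # S3 -> SNS -> Lambda: the notification is a JSON string.
            inner = json.loads(record["Sns"]["Message"])
            yield from inner.get("Records", [])
        elif record.get("eventSource") == "aws:sqs":
            # S3 -> SQS -> Lambda: the notification is the message body.
            inner = json.loads(record["body"])
            yield from inner.get("Records", [])
```

Using inner.get("Records", []) quietly skips the s3:TestEvent message that S3 sends when a notification destination is first configured, since that message has no Records list.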

Limitations

  • S3 is decoupled from the Lambda, and S3 has no rename operation: a rename arrives as a create message and a delete message. To iRODS, this becomes a new data object, so any metadata AVUs associated with the now-deleted data object are lost. This could be remedied with full checksum comparison. Other ideas welcome.

  • SQS configuration is limited to batch_size = 1. Processing more than one message per invocation would reduce the cost of running this Lambda on AWS, but it is not yet clear how to signal partial success of a batch.
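A hypothetical sketch of the checksum-comparison remedy for lost AVUs, not part of the 1.0 function: given the removed and created objects seen within some window, pair a removal with a creation only when their full checksums match uniquely, so the existing data object (and its AVUs) could be moved instead of re-registered. The window, the source of the checksums, and the name match_renames are all assumptions; S3 does not guarantee that the delete and create notifications arrive together or in order.

```python
from collections import Counter


def match_renames(removed, created):
    """Pair removed and created objects whose checksums match uniquely.

    removed, created: dicts mapping object path -> full checksum string.
    Returns (old_path, new_path) pairs; everything else is left to the
    ordinary create/delete handling.
    """
    removed_counts = Counter(removed.values())
    created_counts = Counter(created.values())
    old_by_checksum = {checksum: path for path, checksum in removed.items()}

    renames = []
    for new_path, checksum in created.items():
        # Identical content at several paths cannot be paired safely,
        # so require the checksum to be unique on both sides.
        if removed_counts[checksum] == 1 and created_counts[checksum] == 1:
            renames.append((old_by_checksum[checksum], new_path))
    return renames


print(match_renames({"old/a.csv": "abc123"}, {"new/a.csv": "abc123"}))
# [('old/a.csv', 'new/a.csv')]
```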

Questions?

https://github.com/irods/irods_client_aws_lambda_s3

Thank You!

Pre-release testing environment provided by Bristol Myers Squibb.