iRODS Client:
AWS Lambda Function for S3 1.0
June 9-12, 2020
iRODS User Group Meeting 2020
Virtual Event
Terrell Russell, Ph.D.
@terrellrussell
Chief Technologist, iRODS Consortium
Design Goals
- Play nicely with the universe of tools that already know how to write to S3 directly
- Allow those updates within the S3 namespace to smoothly flow into the iRODS Catalog
- Trigger automated data management when data crosses the policy boundary
Considerations
- Lambda can run Python code
- iRODS provides a Python client library
Success would be...
- Near-real-time, asynchronous catalog updates for creates, moves, and deletes
Files created, renamed, or deleted in S3 appear quickly in iRODS.
iRODS is assumed to have its associated S3 Storage Resource(s) configured with HOST_MODE=cacheless_attached.
You must configure your Lambda to trigger on all ObjectCreated and ObjectRemoved events for a connected S3 bucket.
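As a minimal sketch of how a handler might sort these events, the function below maps each S3 event record to a catalog action. The action names `register` and `unregister` and the function `classify_record` are illustrative assumptions, not the names used in the actual Lambda; the record layout (`eventName`, `s3.bucket.name`, `s3.object.key`) follows the standard S3 event notification format.

```python
from urllib.parse import unquote_plus


def classify_record(record):
    """Return (action, bucket, key) for one S3 event record, or None to skip it.

    The action names are hypothetical labels chosen for this sketch.
    """
    event_name = record.get("eventName", "")
    if event_name.startswith("ObjectCreated:"):
        action = "register"
    elif event_name.startswith("ObjectRemoved:"):
        action = "unregister"
    else:
        return None  # not an event type this sketch handles
    bucket = record["s3"]["bucket"]["name"]
    # S3 URL-encodes object keys in notifications (a space arrives as '+').
    key = unquote_plus(record["s3"]["object"]["key"])
    return (action, bucket, key)


def lambda_handler(event, context):
    """Collect the catalog actions implied by a direct S3 event."""
    actions = []
    for record in event.get("Records", []):
        classified = classify_record(record)
        if classified is not None:
            actions.append(classified)
    return actions
```

A real handler would pass each action on to the iRODS Python client instead of returning the list.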
The iRODS connection information is stored in the AWS Systems Manager > Parameter Store as a JSON object string.
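The sketch below shows one way such a JSON object string could be fetched and validated. It assumes the stored object uses the standard iRODS environment key names (`irods_host`, `irods_port`, `irods_user_name`, `irods_zone_name`) plus an `irods_password` entry; those keys, the helper names, and the parameter name passed in are assumptions for illustration. The `ssm_client` argument is expected to be a `boto3.client("ssm")` object, whose `get_parameter` call returns the stored string under `Parameter.Value`.

```python
import json

# Assumed key names, modeled on a standard iRODS environment file.
REQUIRED_KEYS = (
    "irods_host",
    "irods_port",
    "irods_user_name",
    "irods_zone_name",
    "irods_password",
)


def parse_irods_environment(parameter_value):
    """Turn the JSON object string from the Parameter Store into a dict."""
    environment = json.loads(parameter_value)
    if not isinstance(environment, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in environment]
    if missing:
        raise ValueError("missing connection settings: " + ", ".join(missing))
    environment["irods_port"] = int(environment["irods_port"])
    return environment


def fetch_irods_environment(ssm_client, parameter_name):
    """Read and parse the connection information.

    ssm_client: an object with boto3's SSM get_parameter interface.
    parameter_name: whatever name the deployment chose (hypothetical here).
    """
    response = ssm_client.get_parameter(Name=parameter_name, WithDecryption=True)
    return parse_irods_environment(response["Parameter"]["Value"])
```

Passing the client in as an argument keeps the parsing logic testable without AWS credentials.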
SSL to iRODS is supported by placing a certificate in a relative path within the Lambda package.
[Diagram: S3 -> Lambda]
This Lambda function can be configured to receive events from multiple sources at the same time.
[Diagram: three S3 buckets -> one Lambda]
If irods_default_resource is NOT defined in the environment stored in the Parameter Store, the Lambda function derives the name of a target iRODS resource.
By default, it appends _s3 to the incoming bucket name.
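The fallback rule can be sketched in a few lines. The function name `target_resource` and its `suffix` parameter are hypothetical; only the `irods_default_resource` key and the `_s3` default come from the slide.

```python
def target_resource(environment, bucket_name, suffix="_s3"):
    """Pick the iRODS resource for an incoming bucket.

    environment: dict parsed from the Parameter Store JSON object string.
    Uses irods_default_resource when it is defined; otherwise derives the
    resource name by appending the suffix to the bucket name.
    """
    if "irods_default_resource" in environment:
        return environment["irods_default_resource"]
    return bucket_name + suffix
```

Under this rule, a bucket named `genomics` with no default resource configured would map to a resource named `genomics_s3`, which lets one Lambda serve several buckets without extra configuration.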
The following AWS configurations are supported at this time:
- S3 -> SNS -> Lambda
- S3 -> SQS -> Lambda
- S3 -> Lambda
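Each of these paths delivers the S3 notification in a different envelope, so a handler needs to unwrap it first. The sketch below is one possible approach, not the project's actual code: it relies on the standard AWS event shapes, where SNS carries the S3 event as a JSON string in `Sns.Message` and SQS carries it in `body`. The function name `extract_s3_records` is an assumption.

```python
import json


def extract_s3_records(event):
    """Return the S3 event records found in a direct, SNS, or SQS event."""
    s3_records = []
    for record in event.get("Records", []):
        # SNS records spell the field "EventSource"; S3 and SQS use "eventSource".
        source = record.get("eventSource") or record.get("EventSource")
        if source == "aws:s3":
            s3_records.append(record)
        elif source == "aws:sns":
            inner = json.loads(record["Sns"]["Message"])
            s3_records.extend(inner.get("Records", []))
        elif source == "aws:sqs":
            inner = json.loads(record["body"])
            s3_records.extend(inner.get("Records", []))
    return s3_records
```

Using `inner.get("Records", [])` means wrapped messages without a `Records` list, such as the test notification S3 sends when a bucket notification is first configured, are skipped.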
Limitations
- S3 is decoupled from the Lambda, so a rename arrives as separate create and delete messages. To iRODS, the renamed object is a new data object, which means any metadata AVUs associated with the now-deleted data object are lost. This could be remedied with a full checksum comparison. Other ideas welcome.
- The SQS configuration is limited to batch_size = 1. Operating on more than one message at a time would reduce the cost of running this Lambda at AWS, but it is unclear how to signal partial success at this time.
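The checksum comparison floated in the first limitation might look something like the sketch below. It is purely illustrative and not part of the released Lambda: the `catalog` mapping of logical paths to checksums, the list of recently deleted paths, and the function name are all hypothetical stand-ins for whatever the iRODS catalog would actually provide.

```python
def find_rename_source(catalog, new_checksum, deleted_paths):
    """Guess which deleted object a newly created object was renamed from.

    catalog: dict mapping logical path -> checksum (hypothetical stand-in
        for a catalog query).
    new_checksum: checksum of the newly created object.
    deleted_paths: paths that recently received a delete message.
    Returns the single matching path, or None when there is no match or
    the match is ambiguous (several deleted objects share the checksum).
    """
    matches = [
        path for path in deleted_paths
        if catalog.get(path) == new_checksum
    ]
    if len(matches) == 1:
        return matches[0]
    return None
```

If a unique source is found, its AVUs could be copied to the new data object before the old one is removed from the catalog. Returning None on ambiguity avoids attaching metadata to the wrong object when identical files exist.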
Questions?
https://github.com/irods/irods_client_aws_lambda_s3
Thank You!
Pre-release testing environment provided by Bristol Myers Squibb.