Cacheless S3 Resource

February 20, 2019

Renaissance Computing Institute

UNC-Chapel Hill

Justin James

iRODS Consortium


Introduction (Legacy Operation)

The legacy S3 plugin must be used in conjunction with a compound resource and a unixfilesystem cache resource.

The following is a sample resource hierarchy using the legacy S3 plugin.

s3compound:compound
├── s3archive:s3
└── s3cache:unixfilesystem

This arrangement required the iRODS administrator to create a rule to clean up the cache.
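The hierarchy above could be assembled with commands along the following lines. This is a minimal dry-run sketch that only prints the commands: the host name, cache path, and bucket path are hypothetical placeholders, and the context string is trimmed to a few of the parameters that appear elsewhere in this deck.

```shell
#!/bin/sh
# Dry run: print the iadmin commands for the legacy compound hierarchy
# instead of executing them.
# HOST, the cache path, and the bucket path are hypothetical placeholders.
HOST="irods.example.org"
S3_CONTEXT="S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/s3.keypair;S3_REGIONNAME=us-east-1;S3_PROTO=HTTP;ARCHIVE_NAMING_POLICY=consistent"

legacy_hierarchy_commands() {
    # Parent compound resource.
    echo "iadmin mkresc s3compound compound"
    # Cache child on local disk, archive child in S3.
    echo "iadmin mkresc s3cache unixfilesystem ${HOST}:/var/lib/irods/s3cache"
    echo "iadmin mkresc s3archive s3 ${HOST}:/irods-bucket/irods/Vault \"${S3_CONTEXT}\""
    # Attach each child to the compound parent in its role.
    echo "iadmin addchildtoresc s3compound s3cache cache"
    echo "iadmin addchildtoresc s3compound s3archive archive"
}

legacy_hierarchy_commands
```

Printing first makes it easy to review the resource names and roles before running anything against a live zone.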

 

The S3 plugin itself implemented only a few operations:

  • irods::RESOURCE_OP_UNLINK
  • irods::RESOURCE_OP_STAT
  • irods::RESOURCE_OP_RENAME
  • irods::RESOURCE_OP_STAGETOCACHE
  • irods::RESOURCE_OP_SYNCTOARCH

All other operations were handled by the cache resource.

Introduction (New Modes of Operation)

The newest version of the S3 resource plugin supports three operating modes: "archive_attached", "cacheless_attached", and "cacheless_detached".

The mode is set using the HOST_MODE parameter in the S3 resource's context string.

If HOST_MODE is not set, the existing default behavior is used, now known as "archive_attached".

Note that "archive_detached" is not a valid mode.

            Archive                      Cacheless
Attached    archive_attached (default)   cacheless_attached (demonstrated today)
Detached    N/A                          cacheless_detached (coming soon)
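The mode rules above can be sketched as a small shell helper. This is only an illustration of the documented rule (default to archive_attached, reject anything outside the three valid modes). It is not the plugin's own parser, and the function name resolve_host_mode is hypothetical.

```shell
#!/bin/sh
# Resolve the effective mode from a resource context string.
# A missing HOST_MODE falls back to archive_attached.
resolve_host_mode() {
    context="$1"
    # Split on ';' and pick the value of the first HOST_MODE entry, if any.
    mode=$(printf '%s\n' "$context" | tr ';' '\n' | sed -n 's/^HOST_MODE=//p' | head -n 1)
    mode="${mode:-archive_attached}"
    case "$mode" in
        archive_attached|cacheless_attached|cacheless_detached)
            printf '%s\n' "$mode" ;;
        *)
            echo "invalid HOST_MODE: $mode" >&2
            return 1 ;;
    esac
}

resolve_host_mode "S3_PROTO=HTTP;HOST_MODE=cacheless_attached"
resolve_host_mode "S3_PROTO=HTTP"
```

A context string containing HOST_MODE=archive_detached makes the helper return a non-zero status, matching the note that this combination is not valid.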

Introduction (Archive vs Cacheless)

  • Archive - The S3 resource acts in the archive role behind a compound resource.
    • Requires a sibling cache resource to provide POSIX semantics.
    • Must be attached to a specific iRODS server.

  • Cacheless - The S3 resource can be a standalone resource.
    • May be detached from any specific iRODS server (see next slide).
    • The S3 plugin provides POSIX semantics without using a cache resource and without requiring any cache management.

 

Introduction (Attached vs Detached)

  • Attached - Only the server defined as the host in the resource configuration will serve the request.

  • Detached - Any iRODS server may serve a request for an object. This is appropriate if all servers have connectivity to the S3-based backend.

 

Creating a Cacheless S3 Resource

iadmin mkresc s3resc s3 `hostname`:/irods-bucket/irods/Vault "S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/s3.keypair;S3_REGIONNAME=us-east-1;S3_RETRY_COUNT=1;S3_WAIT_TIME_SEC=3;S3_PROTO=HTTP;ARCHIVE_NAMING_POLICY=consistent;HOST_MODE=cacheless_attached"
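Because the context string is long, it can help to keep one setting per line and join the lines when the command is built. The sketch below does that and prints the same command as above without running it. The helper name build_context is hypothetical.

```shell
#!/bin/sh
# Join key=value settings (one per line on stdin) into the
# semicolon-separated context string used by iadmin mkresc.
build_context() {
    paste -s -d ';' -
}

CONTEXT=$(build_context <<'EOF'
S3_DEFAULT_HOSTNAME=s3.amazonaws.com
S3_AUTH_FILE=/var/lib/irods/s3.keypair
S3_REGIONNAME=us-east-1
S3_RETRY_COUNT=1
S3_WAIT_TIME_SEC=3
S3_PROTO=HTTP
ARCHIVE_NAMING_POLICY=consistent
HOST_MODE=cacheless_attached
EOF
)

# Dry run: print the command rather than executing it.
printf 'iadmin mkresc s3resc s3 `hostname`:/irods-bucket/irods/Vault "%s"\n' "$CONTEXT"
```

Keeping HOST_MODE on its own line makes it easy to see which mode a resource is being created in.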

Implementation Details


We started with S3FS, an open source project that provides a FUSE mount point for S3. Much of the integration consisted of translating FUSE operations into iRODS resource operations.

However, the translation was not seamless.

  • Linux handles multiple opens of a single file, which is transparent to FUSE.
  • Linux handles file offsets, which is also transparent to FUSE.
  • The S3 plugin needed to be adjusted to handle multiple simultaneous writes.
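As an illustration of the behavior the kernel normally provides for a local file, the sketch below has two independent writers place data at different byte offsets of the same file. It runs entirely against a local temporary file and does not touch iRODS or S3. It only shows the POSIX semantics listed above, not how the plugin implements them.

```shell
#!/bin/sh
# Two concurrent writers, each with its own open of the same file and
# its own offset. conv=notrunc stops dd from truncating the file, so
# neither writer destroys the other's bytes.
target=$(mktemp)

printf 'AAAA' | dd of="$target" bs=1 seek=0 conv=notrunc 2>/dev/null &
printf 'BBBB' | dd of="$target" bs=1 seek=4 conv=notrunc 2>/dev/null &
wait

result=$(cat "$target")
rm -f "$target"
echo "$result"
```

On a local filesystem the two writes land side by side regardless of which writer finishes first.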

Demonstration of the S3 Plugin

iadmin mkresc news3resc s3 `hostname`:/justinkylejames-irods1/irods/Vault "S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/news3resc.keypair;S3_REGIONNAME=us-east-1;S3_RETRY_COUNT=1;S3_WAIT_TIME_SEC=3;S3_PROTO=HTTP;ARCHIVE_NAMING_POLICY=consistent;HOST_MODE=cacheless_attached"

Demonstration of the S3 Plugin

$ echo 'this is a test file' > test.txt
$ iput -R news3resc test.txt
$ aws s3 ls s3://justinkylejames-irods1/irods/Vault/home/rods/
2019-02-18 14:55:44         20 test.txt
$ iget test.txt -
this is a test file

Demonstration of the S3 Plugin

$ imv test.txt newname.txt
$ ils -L
/tempZone/home/rods:
  rods              0 news3resc           20 2019-02-18.14:55 & newname.txt
        generic    /justinkylejames-irods1/irods/Vault/home/rods/newname.txt
$ aws s3 ls s3://justinkylejames-irods1/irods/Vault/home/rods/
2019-02-18 15:23:24         20 newname.txt
$ irm -f newname.txt
$ ils
/tempZone/home/rods:
$ aws s3 ls s3://justinkylejames-irods1/irods/Vault/home/rods/

Demonstration of the S3 Plugin

$ iput -R news3resc 64Mfile
$ iget -f 64Mfile 64Mfile2

$ diff 64Mfile 64Mfile2

$ cksum 64Mfile 64Mfile2
1941261876 67108864 64Mfile
1941261876 67108864 64Mfile2
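The round-trip check above can be wrapped in a small helper for scripted use. This is a sketch; the function name same_cksum is hypothetical, and it compares only the CRC and byte count reported by cksum.

```shell
#!/bin/sh
# Succeed only when both files have the same cksum CRC and byte count.
# Reading via stdin keeps the file name out of the cksum output.
same_cksum() {
    a=$(cksum < "$1") || return 2
    b=$(cksum < "$2") || return 2
    [ "$a" = "$b" ]
}
```

After the iput and iget shown above, `same_cksum 64Mfile 64Mfile2 && echo "round trip OK"` would report whether the retrieved copy matches the original.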

Next Steps

TRiRODS February 2019 - Cacheless S3 Resource

By iRODS Consortium
