Cacheless S3 Resource
June 28, 2019
iRODS User Group Meeting 2019
Utrecht, Netherlands
Justin James
iRODS Consortium
Introduction (Legacy Operation)
The legacy S3 plugin must be used in conjunction with a compound resource and a unixfilesystem cache resource.
The following is a sample resource hierarchy using the legacy S3 plugin.
s3compound:compound
├── s3archive:s3
└── s3cache:unixfilesystem
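As an illustrative sketch (the resource names, vault paths, and context string below are placeholders, not taken from the original slides), such a hierarchy could be built with:

# Coordinating compound resource.
iadmin mkresc s3compound compound

# Cache child: a unixfilesystem resource on local disk.
iadmin mkresc s3cache unixfilesystem `hostname`:/var/lib/irods/s3cacheVault

# Archive child: the legacy S3 plugin pointing at the bucket.
iadmin mkresc s3archive s3 `hostname`:/irods-bucket/irods/Vault "S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/s3.keypair;S3_REGIONNAME=us-east-1;S3_PROTO=HTTPS;ARCHIVE_NAMING_POLICY=consistent"

# Attach the children with their cache/archive roles.
iadmin addchildtoresc s3compound s3cache cache
iadmin addchildtoresc s3compound s3archive archive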
This required the iRODS administrator to create a cache cleanup rule.
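The cleanup itself can be implemented as a periodic iRODS rule or scheduled outside of iRODS; the sketch below takes the second route with a cron job calling itrim (the resource name, zone path, and age threshold are placeholders, not taken from the original slides):

# /etc/cron.d entry, run hourly as the irods service account:
# trim replicas held on the cache resource, keep at least one replica
# (the S3 archive copy), and only touch replicas older than 60 minutes.
0 * * * * irods itrim -rM -N 1 -S s3cache --age 60 /tempZone/home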
The S3 plugin itself only implemented a few operations:
All of the other operations were handled by the cache resource.
Introduction (New Modes of Operation)
The new plugin now supports three operating modes.
The mode is set using the HOST_MODE parameter in the resource context string.
If HOST_MODE is not set, the default is archive_attached, which behaves like the legacy S3 plugin.
| | Archive | Cacheless |
|---|---|---|
| Attached | archive_attached (default) | cacheless_attached |
| Detached | N/A | cacheless_detached |
Note that "archive_detached" is not a valid entry.
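An existing S3 resource can be switched between modes by rewriting its context string with iadmin modresc; note that modresc replaces the entire context, so the other parameters must be repeated (the resource name and parameter values below are placeholders):

iadmin modresc s3resc context "S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/s3.keypair;S3_REGIONNAME=us-east-1;S3_PROTO=HTTP;ARCHIVE_NAMING_POLICY=consistent;HOST_MODE=cacheless_attached"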
Introduction (Archive vs Cacheless)
Introduction (Attached vs Detached)
Creating a Cacheless S3 Resource
Creating a cacheless S3 resource is very similar to creating a legacy/archive S3 resource.
As stated previously, the only differences are that a cacheless S3 resource may stand alone and that HOST_MODE must be set to either "cacheless_attached" or "cacheless_detached".
The following is an example of creating a cacheless/attached resource backed by Amazon S3:
iadmin mkresc s3resc s3 `hostname`:/irods-bucket/irods/Vault "S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/s3.keypair;S3_REGIONNAME=us-east-1;S3_RETRY_COUNT=1;S3_WAIT_TIME_SEC=3;S3_PROTO=HTTP;ARCHIVE_NAMING_POLICY=consistent;HOST_MODE=cacheless_attached"
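The S3_AUTH_FILE referenced in the context string holds the bucket credentials. A minimal setup sketch, assuming the usual two-line keypair layout (access key ID on the first line, secret access key on the second; confirm the exact format against the plugin documentation for your release):

# Hypothetical credentials; keep the file readable only by the iRODS service account.
printf 'AKIAIOSFODNN7EXAMPLE\nwJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY\n' > /var/lib/irods/s3.keypair
chmod 600 /var/lib/irods/s3.keypair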
Implementation Details
To build the cacheless S3 resource, we implemented many of the operations that the archive S3 resource had left unimplemented. These include:
Implementation Details
As a starting point, we used S3FS, an open source FUSE client that mounts an S3 bucket as a filesystem.
Implementation Details
The next step was to translate the FUSE operations to iRODS resource plugin operations.
The iRODS resource plugin operations follow POSIX semantics rather than FUSE semantics. To support this, the plugin must store additional state information about every open file:
Implementation Details (Dealing with Global Variables)
S3FS uses a lot of global variables for S3 configuration, internal data structures, etc.
Implementation Details (Download Performance)
Problem:
Goals:
Implementation Details (Download Performance)
Solution:
Implementation Details (Download Performance)
Results:
Implementation Details (Upload Performance)
Problem:
Solution:
Implementation Details (Upload Performance)
Results:
Demonstration of the S3 Plugin
We will create a cacheless S3 resource. First, some assumptions:
Let's create the cacheless S3 resource:
iadmin mkresc news3resc s3 `hostname`:/justinkylejames-irods1/irods/Vault "S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/news3resc.keypair;S3_REGIONNAME=us-east-1;S3_RETRY_COUNT=1;S3_WAIT_TIME_SEC=3;S3_PROTO=HTTP;ARCHIVE_NAMING_POLICY=consistent;HOST_MODE=cacheless_attached"
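Before putting any data, we can confirm the resource and its context string were registered (a quick check, not part of the original demo):

$ ilsresc -l news3resc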
Demonstration of the S3 Plugin
Create a simple test file and put it to the S3 resource:
$ echo 'this is a test file' > test.txt
$ iput -R news3resc test.txt
Using the S3 commands, check that the file exists:
$ aws s3 ls s3://justinkylejames-irods1/irods/Vault/home/rods/
2019-02-18 14:55:44         20 test.txt
Get the file using iRODS:
$ iget test.txt -
this is a test file
Demonstration of the S3 Plugin
Rename the data object and check that it has been renamed in S3:
$ imv test.txt newname.txt
$ ils -L
/tempZone/home/rods:
  rods              0 news3resc          20 2019-02-18.14:55 & newname.txt
        generic    /justinkylejames-irods1/irods/Vault/home/rods/newname.txt
$ aws s3 ls s3://justinkylejames-irods1/irods/Vault/home/rods/
2019-02-18 15:23:24 20 newname.txt
Remove the file:
$ irm -f newname.txt
$ ils
/tempZone/home/rods:
$ aws s3 ls s3://justinkylejames-irods1/irods/Vault/home/rods/
Demonstration of the S3 Plugin
Now for a more comprehensive test, I have already created a 64M file, which is large enough to use the iRODS parallel file transfers. Put it to the S3 resource:
$ iput -R news3resc 64Mfile
Get the file:
$ iget 64Mfile 64Mfile2 -f
Compare the file we retrieved (64Mfile2) to the original file (64Mfile):
$ diff 64Mfile 64Mfile2
$ cksum 64Mfile 64Mfile2
1941261876 67108864 64Mfile
1941261876 67108864 64Mfile2
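As with the small file, the object can also be verified on the S3 side (the same pattern as earlier, not part of the original demo):

$ aws s3 ls s3://justinkylejames-irods1/irods/Vault/home/rods/ | grep 64Mfile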
Next Steps
The cacheless S3 plugin has passed all CI tests. There are still some improvements to be made.
Questions?