Justin James

Applications Engineer

iRODS Consortium

S3 Resource Plugin:

Glacier Support

July 5-8, 2022

iRODS User Group Meeting 2022

Leuven, Belgium

Overview of S3 Glacier

Glacier storage classes are the archive tiers for S3.
 

  • Defined by Amazon but other vendors may implement it.
    • We have tested it on both Amazon and FujiFilm.
       
  • Transparent on uploads except for a flag for storage class.
     
  • Asynchronous on downloads except for "instant retrieval" storage classes. 

Glacier Support on Object Read

  • Prior to reading an object, a HeadObject operation must be called to ascertain if the object is currently in archive.
     
  • The x-amz-storage-class header is inspected:
    • If it exists and is either GLACIER or DEEP_ARCHIVE, inspect the x-amz-restore header:
      • If x-amz-restore has ongoing-request=true, then a restore has already been scheduled.  Return a REPLICA_IS_BEING_STAGED error with a message indicating the object is in the process of being restored.
      • If x-amz-restore has ongoing-request=false, then the object has already been restored.  Proceed as normal.
      • If x-amz-restore does not exist, the object is in archive.  Call RestoreObject and return REPLICA_IS_BEING_STAGED with a message indicating the object has been queued for restoration.
    • If the x-amz-storage-class header does not exist or is neither GLACIER nor DEEP_ARCHIVE, the object can be immediately retrieved.  Proceed as normal.
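
The decision tree above can be sketched as a small pure function.  This is only an illustration, not the plugin's C++ code: the names ReadAction and decide_read_action are hypothetical, and the sketch assumes the HeadObject response headers are available as a plain dictionary.  It accepts ongoing-request values with or without surrounding quotes.

```python
import re
from enum import Enum


class ReadAction(Enum):
    """Hypothetical outcomes of the pre-read check."""
    PROCEED = "proceed"                          # object can be read now
    RESTORE_IN_PROGRESS = "restore_in_progress"  # restore already scheduled
    QUEUE_RESTORE = "queue_restore"              # RestoreObject must be called


# Storage classes that require a restore before reading.
ARCHIVE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def decide_read_action(head_headers):
    """Map HeadObject response headers to the action described on the slide."""
    # HTTP header names are case-insensitive, so normalize them first.
    headers = {name.lower(): value for name, value in head_headers.items()}

    storage_class = headers.get("x-amz-storage-class")
    if storage_class is None or storage_class.upper() not in ARCHIVE_CLASSES:
        return ReadAction.PROCEED

    restore = headers.get("x-amz-restore")
    if restore is None:
        # Archived and no restore requested yet.
        return ReadAction.QUEUE_RESTORE

    match = re.search(r'ongoing-request="?(true|false)"?', restore)
    if match is None:
        raise ValueError(f"unrecognized x-amz-restore value: {restore!r}")
    if match.group(1) == "true":
        return ReadAction.RESTORE_IN_PROGRESS
    # ongoing-request=false means a restored copy is available.
    return ReadAction.PROCEED
```

In this sketch, both RESTORE_IN_PROGRESS and QUEUE_RESTORE correspond to returning REPLICA_IS_BEING_STAGED to the client; only the message differs.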

Restoring an Object from Archive

To support the RestoreObject operation, two new resource context configuration settings have been introduced.
 

  • S3_RESTORATION_TIER - This is the value sent in the <tier> tag when RestoreObject is called.  Values are not case sensitive.  Valid values are Standard, Bulk, and Expedited.  The restoration tier in combination with the storage class defines the length of time needed to complete the restoration.  The following are the restoration times for AWS.

         Tier                 Glacier        Deep Archive
         Expedited            1-5 minutes    Not Allowed
         Standard (default)   3-5 hours      Within 12 hours
         Bulk                 5-12 hours     Within 48 hours

         Note:  RestoreObject is neither necessary nor allowed for objects stored in Glacier_IR.

  • S3_RESTORATION_DAYS - The number of days the object will be restored.  The default in the S3 plugin is 7.  (According to Amazon, this is overridden if you have the bucket set up with lifecycle configuration.)
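
To make the two settings concrete, here is a minimal sketch of how a RestoreObject request body could be assembled from them.  It follows the request XML documented for the Amazon S3 API (Days, and Tier inside GlacierJobParameters); the function name build_restore_request is hypothetical, and the body the plugin actually sends through libs3 may differ in detail.

```python
import xml.etree.ElementTree as ET

S3_XML_NAMESPACE = "http://s3.amazonaws.com/doc/2006-03-01/"

# Settings are not case sensitive; map them to the spelling used in the request.
TIERS = {"standard": "Standard", "bulk": "Bulk", "expedited": "Expedited"}


def build_restore_request(days=7, tier="Standard", storage_class="GLACIER"):
    """Build a RestoreObject XML body (illustrative helper, not plugin code)."""
    canonical_tier = TIERS.get(tier.strip().lower())
    if canonical_tier is None:
        raise ValueError(f"invalid restoration tier: {tier!r}")
    if int(days) < 1:
        raise ValueError("restoration days must be a positive integer")
    # Per the table above, Expedited is not allowed for Deep Archive.
    if storage_class.upper() == "DEEP_ARCHIVE" and canonical_tier == "Expedited":
        raise ValueError("Expedited restoration is not allowed for DEEP_ARCHIVE")

    root = ET.Element("RestoreRequest", xmlns=S3_XML_NAMESPACE)
    ET.SubElement(root, "Days").text = str(int(days))
    job = ET.SubElement(root, "GlacierJobParameters")
    ET.SubElement(job, "Tier").text = canonical_tier
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_restore_request(days=1, tier="expedited"))
```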

Glacier Support on Write or Copy

A new resource context setting has been added to define the storage class used when uploading data objects:
 

  • S3_STORAGE_CLASS - Valid values are (not case-sensitive):
    • STANDARD - default
    • GLACIER
    • DEEP_ARCHIVE
    • GLACIER_IR - Glacier Instant Retrieval
       

If defined, this setting is sent in the x-amz-storage-class header for PutObject and CopyObject.

  • This header may also have the following values which are either not relevant for Glacier support or have not been implemented:
    • STANDARD_IA  - Standard Infrequent Access
    • ONEZONE_IA - One Zone Infrequent Access
    • INTELLIGENT_TIERING
    • OUTPOST
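
As an illustration of the behavior described above, the following sketch turns an S3_STORAGE_CLASS setting into the header sent on PutObject and CopyObject.  The helper name storage_class_header is hypothetical; the sketch assumes that an undefined setting means no header is sent and that only the four supported values are accepted.

```python
# Values supported for S3_STORAGE_CLASS, as listed on the slide.
SUPPORTED_STORAGE_CLASSES = ("STANDARD", "GLACIER", "DEEP_ARCHIVE", "GLACIER_IR")


def storage_class_header(setting=None):
    """Return the extra request headers for a given S3_STORAGE_CLASS value."""
    if setting is None:
        # Setting not defined: send no x-amz-storage-class header at all.
        return {}
    normalized = setting.strip().upper()  # values are not case-sensitive
    if normalized not in SUPPORTED_STORAGE_CLASSES:
        raise ValueError(f"unsupported storage class: {setting!r}")
    return {"x-amz-storage-class": normalized}
```

For example, storage_class_header("Glacier") yields a header with the value GLACIER, while a value such as STANDARD_IA is rejected in this sketch because it has not been implemented.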

Changes to libs3

The libs3 library did not have support for Glacier and Deep Archive.  The following changes were made to our fork of this library:
 

  • Implemented the RestoreObject API
     
  • Added the ability to set x-amz-storage-class header on PutObject and CopyObject
     
  • Added the ability to read the x-amz-storage-class and x-amz-restore headers from the HeadObject response.
     
  • Will open a pull request to libs3 for these changes.

Summary of New Resource Context Setting Options

Key                   Valid Values                                   Description
S3_STORAGE_CLASS      STANDARD, GLACIER, DEEP_ARCHIVE, GLACIER_IR    The storage class assigned to the data object.  Sent on PutObject and CopyObject.
S3_RESTORATION_TIER   STANDARD, BULK, EXPEDITED                      When restoring an object, the restoration tier used.
S3_RESTORATION_DAYS   Any positive integer                           The number of days the object will be restored.
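
The three settings travel in the resource context string as semicolon-separated KEY=VALUE pairs.  The sketch below shows one way such a string could be parsed and checked against the table above.  The function names are hypothetical and this is not the plugin's parser; the defaults (STANDARD storage class, Standard tier, 7 days) are the ones stated in this talk.

```python
STORAGE_CLASSES = {"STANDARD", "GLACIER", "DEEP_ARCHIVE", "GLACIER_IR"}
RESTORATION_TIERS = {"STANDARD", "BULK", "EXPEDITED"}


def parse_context(context):
    """Split a 'K1=V1;K2=V2' context string into a dictionary."""
    settings = {}
    for item in context.split(";"):
        if not item.strip():
            continue  # tolerate empty segments such as a trailing ';'
        key, separator, value = item.partition("=")
        if not separator:
            raise ValueError(f"malformed context entry: {item!r}")
        settings[key.strip()] = value.strip()
    return settings


def glacier_settings(context):
    """Extract and validate the Glacier-related settings, applying defaults."""
    settings = parse_context(context)
    storage_class = settings.get("S3_STORAGE_CLASS", "STANDARD").upper()
    tier = settings.get("S3_RESTORATION_TIER", "STANDARD").upper()
    days = int(settings.get("S3_RESTORATION_DAYS", "7"))

    if storage_class not in STORAGE_CLASSES:
        raise ValueError(f"invalid S3_STORAGE_CLASS: {storage_class}")
    if tier not in RESTORATION_TIERS:
        raise ValueError(f"invalid S3_RESTORATION_TIER: {tier}")
    if days < 1:
        raise ValueError("S3_RESTORATION_DAYS must be a positive integer")
    return {"storage_class": storage_class, "tier": tier, "days": days}
```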

Example Glacier Setup and File Retrieval

Create a resource that places files in Glacier, performs expedited restorations, and restores for 1 day.

$ iadmin mkresc s3resc s3 `hostname`:/justinkylejames-irods1/amazons3resc "S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/amazon.keypair;S3_REGIONNAME=us-east-1;S3_PROTO=HTTP;HOST_MODE=cacheless_attached;S3_STORAGE_CLASS=Glacier;S3_RESTORATION_TIER=Expedited;S3_RESTORATION_DAYS=1"
Creating resource:
Name:           "s3resc"
Type:           "s3"
Host:           "ce61bbc3beec"
Path:           "/justinkylejames-irods1/amazons3resc"
Context:        "S3_DEFAULT_HOSTNAME=s3.amazonaws.com;S3_AUTH_FILE=/var/lib/irods/amazon.keypair;S3_REGIONNAME=us-east-1;S3_PROTO=HTTP;HOST_MODE=cacheless_attached;S3_STORAGE_CLASS=Glacier;S3_RESTORATION_TIER=Expedited;S3_RESTORATION_DAYS=1"

Create and put a file

$ echo test123 > test.txt
$ iput -R s3resc test.txt

Try to get the object.

$ iget test.txt -
remote addresses: 172.17.0.2 ERROR: getUtil: get error for - status = -721000 REPLICA_IS_BEING_STAGED
Level 0: [-]    /github/irods_resource_plugin_s3/s3/s3_transport/src/s3_transport.cpp:208:irods::error irods::experimental::io::s3_transport::restore_s3_object(const std::string &, libs3_types::bucket_context &, const unsigned int, const std::string &, const std::string &) :  status [REPLICA_IS_BEING_STAGED]  errno [] -- message [Object is in GLACIER and has been queued for restoration.  Try again later.]

Example Glacier Setup and File Retrieval

Now try to get it again.  Note the slightly different error message.

$ iget test.txt -
remote addresses: 172.17.0.2 ERROR: getUtil: get error for - status = -721000 REPLICA_IS_BEING_STAGED
Level 0: [-]    /github/irods_resource_plugin_s3/s3/s3_transport/src/s3_transport.cpp:133:irods::error irods::experimental::io::s3_transport::handle_glacier_status(const std::string &, libs3_types::bucket_context &, const unsigned int, const std::string &, irods::experimental::io::s3_transport::object_s3_status, const std::string &) :  status [REPLICA_IS_BEING_STAGED]  errno [] -- message [Object is in GLACIER and is currently being restored.  Try again later.]

Wait a few minutes and retrieve the file again:

$ iget test.txt -
test123
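
Because a first read only queues the restoration, a client script may want to retry until the object is available.  The following is a hedged sketch of such a loop around the iget command used in this example.  The function name iget_with_retry and its parameters are hypothetical, and it assumes the REPLICA_IS_BEING_STAGED text appears in the command's output whenever the object is still being restored.

```python
import subprocess
import time

STAGING_MARKER = "REPLICA_IS_BEING_STAGED"


def iget_with_retry(logical_path, attempts=10, delay_seconds=60.0,
                    run=subprocess.run, sleep=time.sleep):
    """Retry 'iget <path> -' while the replica is being staged.

    The run and sleep parameters exist only so the loop can be exercised
    without a live iRODS zone.
    """
    for attempt in range(attempts):
        result = run(["iget", logical_path, "-"], capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout  # object contents written to stdout by 'iget ... -'
        output = (result.stdout or "") + (result.stderr or "")
        if STAGING_MARKER not in output:
            # Some other failure: do not keep retrying.
            raise RuntimeError(f"iget failed: {output.strip()}")
        if attempt < attempts - 1:
            sleep(delay_seconds)
    raise TimeoutError(f"{logical_path} was not restored after {attempts} attempts")
```

The delay should be chosen with the restoration tier in mind: a minute or so may be reasonable for Expedited, while Standard and Bulk restorations take hours.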

Release Status

Glacier support has been added to the S3 plugin and has been partially released.
 

  • Restoration from Glacier was added in 4.2.11.0 and included in 4.3.0.0.
     
  • To be released soon in 4.3.0.1:
    • Support for setting the storage class on PutObject and CopyObject
    • Support for Deep Archive for puts/gets/copies

What's Next:

Release support for Deep Archive and setting the storage class on PutObject and CopyObject.
 

Support for the intelligent tiering storage class should be trivial but has not been implemented or tested.

We could write a rule to read metadata on an atomic put and select the storage class dynamically for object level control.

UGM 2022 - S3 Resource Plugin: Glacier Support

By iRODS Consortium
