Verifying S3 Uploads via Direct Checksum Read from S3 Provider
July X, 2026
iRODS UGM 2026
Barcelona, Spain
Justin James
Application Engineer, iRODS Consortium
Problem Statement
Note: In this talk I will use "checksum" as a generic term for both checksums and hashes. I will use "hash" only when I am talking specifically about hashes.
A user writes a large data object to S3 and requests a checksum:
iRODS writes the entire data object to S3 and then reads the entire data object back to calculate and/or verify the checksum.
This is very time-consuming and reduces throughput considerably.
Possible Solution
Calculate the checksum on the fly as data is being written.
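As a rough illustration of calculating a checksum on the fly, the Python sketch below updates a hash as each chunk passes through on its way to the destination, so no read-back pass is needed. It is not the iRODS implementation: the function name `write_with_checksum` is hypothetical, SHA-256 stands in for whatever algorithm is requested, and the sketch assumes the bytes arrive as a single sequential stream.

```python
import hashlib
import io


def write_with_checksum(source, sink, chunk_size=1024 * 1024):
    """Copy source to sink, hashing every chunk as it passes through.

    Hypothetical helper for illustration only. Assumes one sequential
    stream; SHA-256 is a stand-in for the requested algorithm.
    Returns (bytes_written, hex_digest).
    """
    hasher = hashlib.sha256()
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)   # checksum is updated before the write
        sink.write(chunk)      # the same bytes go to the destination
        total += len(chunk)
    return total, hasher.hexdigest()


# Usage: in-memory streams stand in for the client data and the S3 write.
payload = b"example data object " * 1000
destination = io.BytesIO()
written, digest = write_with_checksum(io.BytesIO(payload), destination, chunk_size=4096)
```

The digest is available as soon as the last chunk is written, which is the throughput gain this slide is after.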
Problem:
Partial Solution
Have the S3 appliance report the checksum back.
iRODS core modifications:
iRODS S3 resource plugin modifications:
This is sufficient for AWS.
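Reading the checksum back from the provider could look roughly like the Python sketch below. It assumes a boto3-style client whose `head_object` call accepts `ChecksumMode="ENABLED"` and reports the value under the `ChecksumCRC64NVME` response key. The function names are hypothetical and this is not the plugin's actual code.

```python
def read_provider_checksum(s3_client, bucket, key):
    """Ask the provider for the stored CRC64/NVME of an object.

    Assumes a boto3-style client. Returns the base64 string the
    provider reports, or None when no checksum is stored.
    """
    response = s3_client.head_object(
        Bucket=bucket, Key=key, ChecksumMode="ENABLED"
    )
    return response.get("ChecksumCRC64NVME")


def verify_upload(s3_client, bucket, key, expected_b64):
    """Compare the provider-reported checksum with the expected one.

    Returns True or False when the provider reports a checksum, and
    None when it does not, so the caller can fall back to reading the
    object back and computing the checksum itself.
    """
    reported = read_provider_checksum(s3_client, bucket, key)
    if reported is None:
        return None
    return reported == expected_b64
```

Returning `None` separately from `False` keeps "no checksum stored" distinct from "checksum mismatch", which matters for providers that do not calculate CRC64/NVME automatically.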
Not All S3 Providers Calculate CRC64/NVME Automatically
Solution
S3 plugin calculates the checksum on the fly and sends it with the upload.
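To make the on-the-fly calculation concrete, here is a small pure-Python sketch of an incremental CRC64/NVME. The parameters are the published ones for this CRC (reflected polynomial 0x9A6C9329AC4BC9B5, initial value and final XOR of all ones). The class name `Crc64Nvme` is hypothetical, the plugin itself is not written in Python, and the big-endian base64 encoding used for the header value is an assumption to confirm against the provider's documentation.

```python
import base64

_POLY_REFLECTED = 0x9A6C9329AC4BC9B5  # bit-reversed form of 0xAD93D23594C93659
_MASK = 0xFFFFFFFFFFFFFFFF


def _build_table():
    """Precompute the 256-entry lookup table for the reflected CRC."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ _POLY_REFLECTED if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


class Crc64Nvme:
    """Incremental CRC64/NVME: feed chunks with update() as they are sent."""

    def __init__(self):
        self._state = _MASK  # initial value is all ones

    def update(self, data):
        crc = self._state
        for b in data:
            crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
        self._state = crc

    def value(self):
        return self._state ^ _MASK  # final XOR with all ones

    def base64(self):
        # Assumption: the value is sent as base64 of 8 big-endian bytes.
        raw = self.value().to_bytes(8, "big")
        return base64.b64encode(raw).decode("ascii")


# Usage: chunks are hashed in the order they are uploaded.
crc = Crc64Nvme()
for chunk in (b"1234", b"56789"):
    crc.update(chunk)
trailer_value = crc.base64()
```

Because the state is carried between `update()` calls, the checksum is complete when the last chunk goes out and can be sent with the upload rather than computed in a second pass.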
Multipart issues:
With this all in place, we can get the CRC64/NVME directly from MinIO.
Other appliances may support the checksum but may not store it unless it is sent in the upload. This is true for MinIO.
Note: The S3 appliance will validate the checksum sent and reject the put if it is not valid. By itself this is not sufficient to verify the file upload, as it does not detect corruption between the client and iRODS.
Putting it All Together
A couple of configuration options have been added to the S3 resource context string:
ENABLE_DIRECT_CHECKSUM_READ=1
ENABLE_TRAILING_CHECKSUM_ON_UPLOAD=1
For this to work for multipart uploads, the desired checksum must be CRC64/NVME. This can be set via the client or server defaults.
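As a sketch of how the two options might sit alongside the rest of an S3 resource context string, the Python snippet below joins key=value pairs with semicolons. The helper name `build_context_string`, the endpoint, and the key file path are hypothetical placeholders. Only the two ENABLE_* options come from this talk, and the other keys should be checked against the S3 plugin documentation for your release.

```python
def build_context_string(options):
    """Join options into a semicolon-separated key=value context string."""
    return ";".join(f"{key}={value}" for key, value in options.items())


context = build_context_string({
    "S3_DEFAULT_HOSTNAME": "s3.example.org:9000",  # hypothetical endpoint
    "S3_AUTH_FILE": "/etc/irods/s3.keypair",       # hypothetical path
    "S3_REGIONNAME": "us-east-1",
    "HOST_MODE": "cacheless_attached",
    "ENABLE_DIRECT_CHECKSUM_READ": 1,              # read checksum from the provider
    "ENABLE_TRAILING_CHECKSUM_ON_UPLOAD": 1,       # send checksum with the upload
})
print(context)
```

The resulting string would be supplied as the context argument when the resource is created or modified, for example with `iadmin mkresc`.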
Questions