Implementing IIIF By The Numbers
Kevin S. Clarke, Digital Library Software Developer <ksclarke@library.ucla.edu>
Anthony Vuong, Development Support Engineer <avuong@library.ucla.edu>
A Tale of Two Architectures
Sinai Palimpsests Project
- Uses a "Level 0" IIIF-compatible tile server
Californica/Ursus (Hyrax/Blacklight)
- Cantaloupe, a "Level 2" IIIF-compatible image server
Servers
Server-oriented architecture treats individual applications (or websites) as the primary consideration.
There may be multiple IIIF servers, each one selected and configured to meet the needs of its application.
Services
Service-oriented architecture provides functionality to a variety of applications.
"IIIF as a service" means instead of maintaining multiple architectures for serving IIIF images, we build one that meets the needs of multiple applications.
How Do We Make Decisions?
A Metrics-Based Approach
- Share and reuse work with and from our colleagues
- Talked with The Getty (also doing IIIF measurements)
- Build / use tools that can help test multiple factors
- docker-cantaloupe works locally or in the cloud
- Used `time`, Locust, CloudWatch, and AWS CLI tools
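The `time`-style measurements above can be scripted so they are repeatable. A minimal sketch of such a harness, assuming only the Python standard library (the command being timed is a stand-in, not the actual conversion command):

```python
import subprocess
import sys
import time

def time_command(cmd, runs=3):
    """Run a command several times and return the mean wall-clock seconds,
    much like prefixing it with the Unix `time` utility."""
    elapsed = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raise if the command fails
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)

# Example: time a trivial Python invocation (a placeholder command)
mean_seconds = time_command([sys.executable, "-c", "pass"])
```

Averaging over several runs smooths out cold-start and caching effects, which matters when comparing local hardware against cloud services.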
Image Conversion
Image Delivery
Vertical vs. Horizontal Scaling
Local VM
8 Cores E5-2630 v3 @ 2.40GHz
8 GB memory, DDR4 2133 MHz
Compiled version of Kakadu
AWS Lambda
2-core Lambda function
1024 MB memory
Compiled version of Kakadu
Image Conversion
Our Local Process
- Run script for TIFF to JP2 conversion
- Read TIFFs off our NetApp file system
- Have Kakadu convert them into JP2s
- Upload the JP2s to Cantaloupe's S3 source bucket
- Time how long it takes to process 1000 images
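The local steps above could be sketched roughly as follows. The paths, bucket name, and `kdu_compress` options are illustrative assumptions, not the production script:

```python
import pathlib
import subprocess
import time

# Hypothetical locations; the real script reads TIFFs off NetApp and
# writes JP2s to Cantaloupe's S3 source bucket.
SOURCE_DIR = pathlib.Path("/mnt/netapp/tiffs")
DEST_BUCKET = "cantaloupe-source"  # assumed bucket name

def jp2_key(tiff_path):
    """Derive the S3 key for a converted image: same stem, .jp2 suffix."""
    return pathlib.Path(tiff_path).with_suffix(".jp2").name

def convert_and_upload(tiff_path):
    """Convert one TIFF to JP2 with Kakadu's kdu_compress, then upload it."""
    import boto3  # deferred so the helper above has no AWS dependency

    jp2_path = pathlib.Path(tiff_path).with_suffix(".jp2")
    subprocess.run(["kdu_compress", "-i", str(tiff_path), "-o", str(jp2_path)],
                   check=True)
    boto3.client("s3").upload_file(str(jp2_path), DEST_BUCKET, jp2_key(tiff_path))

# Usage (not run here): time a 1000-image batch, as in the test
#   start = time.perf_counter()
#   for tiff in sorted(SOURCE_DIR.glob("*.tif"))[:1000]:
#       convert_and_upload(tiff)
#   print(f"1000 images in {time.perf_counter() - start:.1f}s")
```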
Our Lambda Process
- Upload TIFFs into an S3 bucket from NetApp file system
- Lambda function is triggered by the bucket event
- Kakadu in Lambda function converts TIFF into JP2
- Lambda function stores JP2 in Cantaloupe's S3 bucket
- Get the time it took to process 1000 images from CloudWatch
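The Lambda side of the pipeline could look like the sketch below. The destination bucket name is an assumption, and the handler presumes a Kakadu binary bundled with the function:

```python
import os
import subprocess

DEST_BUCKET = os.environ.get("DEST_BUCKET", "cantaloupe-source")  # assumed name

def parse_s3_event(event):
    """Pull (bucket, key) out of the S3 ObjectCreated event that fires
    when a TIFF lands in the upload bucket."""
    record = event["Records"][0]
    return record["s3"]["bucket"]["name"], record["s3"]["object"]["key"]

def handler(event, context):
    """Download the TIFF, convert it with the bundled Kakadu binary, and
    store the JP2 where Cantaloupe looks for its sources."""
    import boto3  # provided by the Lambda runtime

    bucket, key = parse_s3_event(event)
    tiff = "/tmp/" + os.path.basename(key)       # Lambda's 512 MB scratch space
    jp2 = os.path.splitext(tiff)[0] + ".jp2"

    s3 = boto3.client("s3")
    s3.download_file(bucket, key, tiff)
    subprocess.run(["kdu_compress", "-i", tiff, "-o", jp2], check=True)
    s3.upload_file(jp2, DEST_BUCKET, os.path.basename(jp2))
```

Because `/tmp` is the only writable file system in Lambda, both the TIFF and the JP2 have to fit in that 512 MB space, which is exactly the constraint raised on the next slide.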
Some Questions
- What happens when we give our local VM 16 cores? 32?
- How far can we vertically scale using local resources?
- How large of an image can we convert on AWS Lambda?
- Its file system is (currently) limited to 512 MB
- Its memory is (currently) limited to 3008 MB
Local VMware
AWS Fargate
(simple)
Image Delivery
AWS Fargate
(scaled)
VMware
- 2 VMs
- 1 Docker container running in each VM
- Specs of container
- 8GB Memory
- 6 Cores
- Total = 16GB Memory / 12 CPU cores
- Cantaloupe 4.1.1
AWS Fargate
- 3 Fargate Containers
- Specs of container
- 8GB Memory
- 4 Cores (Fargate max)
- Total = 24GB Memory / 12 CPU cores
- Cantaloupe 4.1.1
AWS Fargate (Scaled)
- 10 Fargate Containers
- 8 GB / 4 CPUs per container
- Aggregate specs
- 80GB Memory
- 40 CPUs
- Cantaloupe 4.1.1
Delivery Test Case #1
- Single Image Fixed Test
- Large Image (110-130 MB)
- Medium Image (50-60 MB)
- PCT:50 / Full Image Request
- IIIF URI used
- /full/pct:50/0/default.jpg?cache=false
- /full/full/0/default.jpg?cache=false
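The two request paths above follow the IIIF Image API URI pattern `{region}/{size}/{rotation}/{quality}.{format}`; a small helper makes the pattern explicit (the base URL and identifier below are hypothetical, while `?cache=false` is the Cantaloupe cache-bypass flag shown on the slide):

```python
def iiif_url(base, identifier, region="full", size="full",
             rotation="0", quality="default", fmt="jpg", cache=True):
    """Build a IIIF Image API request:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}"""
    url = f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"
    if not cache:
        url += "?cache=false"  # Cantaloupe's cache bypass, as used in the tests
    return url

# The two fixed-test requests (hypothetical base URL and identifier):
half = iiif_url("https://iiif.example.edu/iiif/2", "ark123", size="pct:50", cache=False)
full = iiif_url("https://iiif.example.edu/iiif/2", "ark123", cache=False)
```

Disabling the cache ensures every request exercises the full decode path, so the numbers measure the server rather than its cache.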
Large Full Image Results
Medium Full Image Results
Large Image Results (50%)
Medium Image Results (50%)
Large Full Multi-Region
Large 50% Multi-Region
Medium Full Multi-Region
Medium 50% Multi-Region
Delivery Test Case #2
- Simulated workload with concurrent users
- Various regions/tiles of an image
- Large Image (110-130 MB)
- Medium Image (50-60 MB)
- 20, 50, 100, 200 concurrent users
- 5-15 seconds wait per user request
- 1000+ URLs, picked at random
- Locust load test ran for 5 minutes
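The pool of random region URLs for this workload could be generated along these lines. The image dimensions, tile size, base URL, and identifier are illustrative assumptions:

```python
import random

def tile_urls(base, identifier, width, height, tile=1024, n=1000, seed=0):
    """Build a pool of n random IIIF region (tile) requests against one
    image, for feeding to a Locust load test."""
    rng = random.Random(seed)  # seeded so the URL pool is reproducible
    urls = []
    for _ in range(n):
        x = rng.randrange(0, max(1, width - tile))
        y = rng.randrange(0, max(1, height - tile))
        urls.append(f"{base}/{identifier}/{x},{y},{tile},{tile}/full/0/default.jpg")
    return urls

pool = tile_urls("https://iiif.example.edu/iiif/2", "ark123", 10000, 8000)
# A Locust user class would then set wait_time = between(5, 15) and have
# each simulated user GET random.choice(pool) on every task, matching the
# 5-15 second wait per request described above.
```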
Random - Large TIFF Results
Random Large TIFF Unskewed Results
Random - Large Lossless Results
Random Large Lossless Unskewed Results
Random - Large Lossy Results
Random Large Lossy Unskewed Results
Random - Medium TIFF Results
Random Medium TIFF Unskewed Results
Random - Medium Lossless Results
Random Medium Lossless Unskewed Results
Random - Medium Lossy Results
Discoveries
- Generating JPEG derivatives from full TIFF requests is faster than using Kakadu's Native Processor for Lossless and Lossy images
- S3 GET speeds appear to be limited to gigabit at most: averages of 30-50 MB/s vs. 400 MB/s on local NetApp storage
- Makes a huge difference on larger images, much less on smaller ones
- Wait times could cause image downloads to stack up and bog down container resources
- Current assumption is that network bandwidth, rather than compute, is the bottleneck
- Scaling out the Fargate containers allows more requests to run in parallel and balances the load across multiple resources
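The 30-50 MB/s observation could be reproduced with a sketch like this; the measurement function is an assumption about method, not the tooling actually used:

```python
import time

def mb_per_s(n_bytes, seconds):
    """Throughput in MB/s, with MB = 1024 * 1024 bytes."""
    return n_bytes / (1024 * 1024) / seconds

def measure_s3_get(bucket, key):
    """Time one full S3 GET and return its throughput in MB/s."""
    import boto3  # deferred so mb_per_s stays dependency-free

    s3 = boto3.client("s3")
    start = time.perf_counter()
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return mb_per_s(len(body), time.perf_counter() - start)

# A 60 MB image fetched in 1.5 s works out to 40 MB/s, in the observed range:
assert mb_per_s(60 * 1024 * 1024, 1.5) == 40.0
```

At 40 MB/s a 120 MB "large" source takes about 3 seconds just to read from S3, which is why the source-fetch cost dominates for large images but barely registers for small ones.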
What do we do now?
- Decide on Lossy, Lossless, or TIFF as sources
- Decide on using on-premise hardware or AWS
- For full image pixel requests, latency becomes an issue; delivery times can vary between on-premise and AWS
- Do we want to serve full image pixels?
- Launch and experiment with a production service
- Gather real work-load use cases
- Experiment with reducing latency from other countries using AWS CloudFront
- Make containers more "production" ready
- More detailed monitoring needed!
- Automate all the things!
UCLA Library
Kevin S. Clarke, Digital Library Software Developer <ksclarke@library.ucla.edu>
Anthony Vuong, Development Support Engineer <avuong@library.ucla.edu>