HTTP Caching - What it is, and how it works.
A Tech Talk, 1/7/21
©2021 Copyright - Confidential and Proprietary
The fastest requests are the ones that are never made - someone, probably.
- Ryan Kanner
What is HTTP Caching?
📝 The Spec
- Standardized by the IETF in 1999 as part of HTTP/1.1 (RFC 2616) 🎉
- A set of rules for HTTP clients and caches to follow to enable consistent caching behavior across the web.
- Primarily relies on headers on requests and responses to determine how to interact with the cached data.
🍕 The Ingredients
- Caching headers on requests & responses
- Spec-compliant HTTP client
- Somewhere to store the cached responses
- Some HTTP requests 🙃
Headers
- Expires
- Last-Modified
- ETag
- If-None-Match
- If-Modified-Since
- Cache-Control
  - Visibility Directives: public, private
  - TTL Directives: max-age, s-maxage
  - Revalidation & Storage Directives: no-cache, no-store, must-revalidate
  - Client Override Directives: only-if-cached, max-stale, min-fresh
  - Performance Directives: stale-while-revalidate, stale-if-error
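All of these directives arrive as a single comma-separated Cache-Control value. Here is a minimal Python sketch of parsing that value into a dictionary. The helper name is hypothetical and not part of any particular library. Decisions such as storability, freshness, and revalidation are easiest to express against a parsed form like this.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control header value into a {directive: argument} dict.

    Simplified sketch: directive names are lowercased (they are
    case-insensitive), bare directives such as "public" map to True, and
    numeric arguments such as "max-age=3600" become ints. Quoted arguments
    that themselves contain commas are not handled.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty entries such as "public,,max-age=60"
        name, has_arg, arg = part.partition("=")
        name = name.strip().lower()
        if not has_arg:
            directives[name] = True
            continue
        arg = arg.strip().strip('"')
        directives[name] = int(arg) if arg.isdigit() else arg
    return directives
```

For example, `parse_cache_control("public, max-age=3600, s-maxage=600")` returns `{"public": True, "max-age": 3600, "s-maxage": 600}`.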
Cache Logic Decision Tree
Types of Caches
Shared Public Cache
# Request
GET /article/123 HTTP/2
# Response
HTTP/2 200
Cache-Control: public
Private Cache
# Request
GET /article/123 HTTP/2
# Response
HTTP/2 200
Cache-Control: private
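The following rough Python sketch makes the public/private distinction concrete by showing the storage check a cache might run. The function name and the `shared` flag are hypothetical. A spec-compliant cache checks more conditions than shown here, including the status code, the request method, and the Authorization header.

```python
def may_store(response_cc: dict, shared: bool) -> bool:
    """Decide whether a cache may store a response, from its parsed Cache-Control.

    shared=True models a CDN or proxy; shared=False models a browser's
    private cache. Status code, method and Authorization checks are omitted.
    """
    if "no-store" in response_cc:
        return False  # no cache may store it
    if shared and "private" in response_cc:
        return False  # only the user's own (private) cache may store it
    return True
```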
Public & Private Caches
Caching Strategies
Expiration
# Request
GET /article/123 HTTP/2
# Response
HTTP/2 200
Expires: Wed, 05 Jan 2022 07:28:00 GMT
- The Expires header in the response defines when the resource becomes stale.
- The cache serves the stored response until the current time passes the Expires value. After that, it re-fetches the resource from the origin server.
- If the response also has Cache-Control: max-age, max-age takes precedence and Expires is ignored.
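The Expires check can be sketched in Python with the standard library's HTTP-date parser. This is a simplified illustration with two assumptions. First, the example uses a hypothetical Date value, since the slide's response omits one. Second, age is approximated as "now minus Date" rather than computed with the full age calculation a real cache performs.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def expires_lifetime(date_header: str, expires_header: str) -> float:
    """Freshness lifetime in seconds: Expires minus the response's Date header."""
    try:
        expires = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        return 0.0  # an invalid Expires (e.g. "0") means "already expired"
    return (expires - parsedate_to_datetime(date_header)).total_seconds()


def is_fresh(date_header: str, expires_header: str, now: Optional[datetime] = None) -> bool:
    """True while the response's age is below its Expires-based lifetime.

    Simplification: age is approximated as now - Date, ignoring the Age
    header and network delay that a spec-compliant cache would include.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    age = (now - parsedate_to_datetime(date_header)).total_seconds()
    return age < expires_lifetime(date_header, expires_header)
```

With a hypothetical `Date: Wed, 05 Jan 2022 06:28:00 GMT`, the slide's Expires value gives a one-hour (3600 s) lifetime.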
Expiration (Cache-Control)
# Request
GET /article/123 HTTP/2
# Response
HTTP/2 200
Cache-Control: max-age=3600, s-maxage=600
- max-age: The TTL of the resource, in seconds.
- s-maxage: Like max-age, but it only applies to shared caches (CDNs, proxies), where it overrides max-age. In this example a CDN keeps the response for 10 minutes while the browser keeps it for an hour.
- Once the resource has expired, the cache re-fetches it from the origin server and stores the new version before returning the data to the client.
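This sketch shows how a cache might pick its TTL from these two directives. It assumes the Cache-Control header has already been parsed into a dictionary such as `{"max-age": 3600, "s-maxage": 600}`. The function names and the `shared` flag are illustrative.

```python
from typing import Optional


def freshness_lifetime(response_cc: dict, shared: bool) -> Optional[int]:
    """Pick the freshness lifetime (seconds) from parsed Cache-Control directives.

    Shared caches (CDNs, proxies) use s-maxage when present; every cache
    falls back to max-age. None means neither directive was sent, so the
    cache would fall back to Expires or a heuristic.
    """
    if shared and "s-maxage" in response_cc:
        return int(response_cc["s-maxage"])
    if "max-age" in response_cc:
        return int(response_cc["max-age"])
    return None


def is_fresh(age: int, response_cc: dict, shared: bool) -> bool:
    """A stored response is fresh while its age is below its lifetime."""
    lifetime = freshness_lifetime(response_cc, shared)
    return lifetime is not None and age < lifetime
```

A 15-minute-old copy of the slide's response is therefore still fresh in the browser but stale at the CDN.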
Revalidation (ETag)
# Request 1
GET /article/123 HTTP/2
# Response 1
HTTP/2 200
ETag: "987"
# Request 2
GET /article/123 HTTP/2
If-None-Match: "987"
# Response if the content changed
HTTP/2 200
ETag: "654"
# Response if unchanged
HTTP/2 304
ETag: "987"
- ETag: An opaque validator that identifies a specific version of the resource.
- If-None-Match: Sends the cached ETag back so the origin server can compare it with the ETag of the current version of the resource.
- If the ETags differ, the origin returns the new version of the resource with a 200 response and its new ETag.
- If they match, it returns a 304 Not Modified without a body, and the cache keeps using its stored copy.
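On the origin side, the ETag exchange boils down to a small comparison. The following is a hypothetical Python handler that assumes no web framework and returns a `(status, headers, body)` tuple. Real servers also handle If-Match, range requests, and other cases.

```python
from typing import Optional, Tuple


def _opaque_tag(tag: str) -> str:
    """Strip a weak-validator prefix so W/"987" compares equal to "987"."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def handle_if_none_match(current_etag: str, if_none_match: Optional[str],
                         body: bytes) -> Tuple[int, dict, bytes]:
    """Return (status, headers, body) for a GET on the origin server.

    If-None-Match uses weak comparison and may list several ETags or "*".
    Splitting on commas is a simplification that assumes no ETag contains one.
    """
    headers = {"ETag": current_etag}  # a 304 still carries the ETag
    if if_none_match is not None:
        candidates = {_opaque_tag(t) for t in if_none_match.split(",")}
        if "*" in candidates or _opaque_tag(current_etag) in candidates:
            return 304, headers, b""  # unchanged: no body, cache reuses its copy
    return 200, headers, body
```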
Revalidation (Last-Modified)
# Request 1
GET /article/123 HTTP/2
# Response 1
HTTP/2 200
Last-Modified: Fri, 07 Jan 2022 12:00:00 GMT
# Request 2
GET /article/123 HTTP/2
If-Modified-Since: Fri, 07 Jan 2022 12:00:00 GMT
# Response if the content has changed
HTTP/2 200
Last-Modified: Sat, 08 Jan 2022 12:00:00 GMT
# Response if unchanged
HTTP/2 304
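- Last-Modified: The time the resource was last modified.
- If-Modified-Since: The Last-Modified value of the cached copy. It is sent back so the origin server can compare it with the modification time of the current version of the resource.
- If the resource has changed since then, the origin returns the new version with a 200. Otherwise it returns a 304 without a body.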
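The Last-Modified variant follows the same shape. This is another hypothetical, framework-free Python sketch. It assumes the server knows each resource's modification time as a UTC `datetime`.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional, Tuple


def handle_if_modified_since(last_modified: datetime, if_modified_since: Optional[str],
                             body: bytes) -> Tuple[int, dict, bytes]:
    """Return (status, headers, body) for a GET on the origin server.

    last_modified must be a UTC-aware datetime. HTTP dates have one-second
    precision, so sub-second parts are dropped before comparing.
    """
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # an unparseable date: ignore the header
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers, b""  # not modified since the cached copy
    return 200, headers, body
```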
Heuristic
<img src="puppies.jpg" />
# Response
HTTP/2 200
content-type: image/jpeg
Last-Modified: Fri, 27 Jul 2018 19:06:29 GMT
Date: Fri, 07 Jan 2022 17:22:11 GMT
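- Browsers (and other caches) will cache some responses even if they don't have explicit caching headers, using a heuristic freshness lifetime.
- A common heuristic, suggested by the HTTP spec, is 10% of the time since the resource was last modified:
freshnessLifetime = (headers.Date - headers.Last-Modified) / 10
expirationTime = headers.Date + freshnessLifetime
- For the puppies.jpg response above, that works out to roughly 126 days.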
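The heuristic can be checked against the puppies.jpg headers above. This minimal Python sketch assumes the 10% factor, although caches are free to choose a different one.

```python
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime

HEURISTIC_FRACTION = 0.1  # the commonly used "10% of the time since last modification"


def heuristic_freshness(date_header: str, last_modified_header: str) -> timedelta:
    """Heuristic freshness lifetime for a response with no explicit expiry."""
    date = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    # Clamp at zero in case Last-Modified is (incorrectly) later than Date.
    return max(timedelta(0), (date - last_modified) * HEURISTIC_FRACTION)


def heuristic_expiration(date_header: str, last_modified_header: str) -> datetime:
    """expirationTime = Date + heuristic freshness lifetime."""
    return parsedate_to_datetime(date_header) + heuristic_freshness(date_header, last_modified_header)
```

For those headers the lifetime comes out just under 126 days, so the image would be treated as fresh until 13 May 2022.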
Fingerprinting
<script src="app.l3kn1df4gln.js"></script>
# Response
HTTP/2 200
content-type: application/javascript
Cache-Control: public, max-age=31557600
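- Embeds a unique fingerprint (typically a hash of the file's contents) in the asset's file name and serves it with a high max-age.
- When a new version of the asset is built, it gets a different fingerprint and therefore a different URL, effectively busting the cache.
- The Cache-Control header sets max-age to one year (31557600 seconds, i.e. 365.25 days). One year is a convention inherited from HTTP/1.1's advice not to set expiration dates more than a year ahead. It is not a hard maximum.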
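A build step that produces fingerprinted names can be sketched in a few lines of Python. Several details here are assumptions for illustration: the SHA-256 hash, the 10-character prefix, and the `name.hash.ext` layout. Each bundler has its own naming scheme.

```python
import hashlib
from pathlib import PurePosixPath

# One year of seconds (365.25 days), matching the slide's header value.
IMMUTABLE_ASSET_CACHE_CONTROL = "public, max-age=31557600"


def fingerprinted_name(filename: str, content: bytes, length: int = 10) -> str:
    """Insert a content hash before the extension: app.js -> app.<hash>.js.

    The same bytes always produce the same name, and any change to the
    bytes produces a new name, so the new build gets a brand-new cache key.
    """
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))
```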
Manipulating Caching Behavior
Bypass the cache
# Request
GET /users/me HTTP/2
Cache-Control: no-cache
# Response
HTTP/2 200
Cache-Control: no-store, must-revalidate
- no-cache (request): A cache must revalidate a stored response with the origin server before using it.
- no-store (response): No cache may store any part of the response.
- must-revalidate (response): Once the response is stale, a cache must not serve it without first revalidating it with the origin server.
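These directives answer two separate questions a cache asks. The first is whether it may store the response at all. The second is whether it may reuse a stored copy without asking the origin. Below is a simplified, hypothetical Python sketch that takes already-parsed Cache-Control dictionaries for the request and the response.

```python
def storage_allowed(request_cc: dict, response_cc: dict) -> bool:
    """no-store on either the request or the response forbids storing it."""
    return "no-store" not in request_cc and "no-store" not in response_cc


def may_reuse_without_origin(request_cc: dict, response_cc: dict,
                             is_stale: bool, stale_allowed: bool = False) -> bool:
    """Whether a cache may answer from storage without contacting the origin.

    stale_allowed stands in for anything that would otherwise permit a
    stale response (for example a client's max-stale).
    """
    if "no-cache" in request_cc or "no-cache" in response_cc:
        return False  # must revalidate with the origin first
    if not is_stale:
        return True
    # Stale: only usable if permitted, and never if the origin sent must-revalidate.
    return stale_allowed and "must-revalidate" not in response_cc
```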
Client Overrides
# Request
GET /articles/123 HTTP/2
Cache-Control: max-stale=600
# Response
HTTP/2 200
Cache-Control: max-age=3600
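# Effective TTL = 4200 (3600s fresh + up to 600s stale)
- max-stale: The client will accept a response that has been stale for up to this many seconds.
- min-fresh: The client wants a response that will stay fresh for at least this many more seconds.
- only-if-cached: The client only wants a stored response. If the cache has no acceptable one, it responds with 504 Gateway Timeout instead of contacting the origin.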
# Request
GET /articles/123 HTTP/2
Cache-Control: min-fresh=3600
# Response
HTTP/2 200
Cache-Control: max-age=600
# Never served from cache: the response is fresh for at most 600s, less than the 3600s requested
# Request
GET /articles/123 HTTP/2
Cache-Control: only-if-cached
# Response: served from the cache if possible, otherwise 504 Gateway Timeout
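Combining the three request directives, a cache's decision could be sketched as below. This is a simplified illustration with hypothetical names, and it returns "serve", "fetch", or "504". It does not model a bare `max-stale` (no value) or a must-revalidate directive on the response.

```python
from typing import Optional


def client_override_decision(age: int, lifetime: int, *, max_stale: Optional[int] = None,
                             min_fresh: Optional[int] = None, only_if_cached: bool = False,
                             have_stored: bool = True) -> str:
    """Return "serve" (use stored copy), "fetch" (go to origin) or "504".

    age and lifetime are in seconds.
    """
    acceptable = False
    if have_stored:
        remaining = lifetime - age  # freshness left; <= 0 means stale
        acceptable = remaining > 0
        if max_stale is not None and remaining <= 0:
            acceptable = -remaining <= max_stale  # stale, but within tolerance
        if min_fresh is not None and remaining < min_fresh:
            acceptable = False  # not fresh enough for this client
    if acceptable:
        return "serve"
    return "504" if only_if_cached else "fetch"  # only-if-cached never contacts the origin
```

With the slide's values, a copy aged 4000s (`max-age=3600`, `max-stale=600`) is still served, while `min-fresh=3600` against `max-age=600` always goes to the origin.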
Performance & Fault Tolerance
stale-if-error
# Request
GET /articles/123 HTTP/2
Cache-Control: stale-if-error=3600
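# Origin response
HTTP/2 500
# The cache can serve its stale copy instead, as long as it has been stale for at most 3600s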
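stale-if-error is defined in RFC 5861. It lets a cache hide origin failures behind a recently stale copy. The following hypothetical Python sketch shows the decision, assuming the cache already knows how stale its stored copy is.

```python
from typing import Optional

# Status codes RFC 5861 treats as errors for stale-if-error.
ERROR_STATUSES = {500, 502, 503, 504}


def stale_if_error_decision(origin_status: int, staleness: int,
                            stale_if_error: Optional[int]) -> str:
    """Choose between the origin's response and the stale stored copy.

    staleness is how many seconds past its freshness lifetime the stored
    copy is; stale_if_error is the directive's value (None if absent).
    """
    if origin_status in ERROR_STATUSES and stale_if_error is not None and staleness <= stale_if_error:
        return "stale"  # hide the error behind the last good copy
    return "origin"
```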
stale-while-revalidate
# Request
GET /articles/123 HTTP/2
# Response
HTTP/2 200
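Cache-Control: max-age=600, stale-while-revalidate=60
# For up to 60s after going stale, the cache serves the stored copy immediately and revalidates it in the background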
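stale-while-revalidate, also defined in RFC 5861, splits a response's life into three phases. Below is a minimal, hypothetical Python classifier for the `max-age=600, stale-while-revalidate=60` example. The background refresh itself is left out.

```python
def swr_decision(age: int, max_age: int, stale_while_revalidate: int = 0) -> str:
    """Classify a stored response by age (seconds) for stale-while-revalidate.

    "fresh": serve from cache, no origin request.
    "serve-stale-and-revalidate": serve from cache now, refresh in the background.
    "fetch": too stale; wait for the origin.
    """
    if age < max_age:
        return "fresh"
    if age - max_age <= stale_while_revalidate:
        return "serve-stale-and-revalidate"
    return "fetch"
```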
Cache Keys
Default Implementation
- By default, a cache uses the req.host + req.uri of the requested resource as the cache key.
- Some caches normalize the key, e.g. by percent-encoding special characters.
# Request
GET /articles/123 HTTP/2
Host: nerdwallet.com
# Response
HTTP/2 200
Cache-Control: max-age=600
Some Content
Key | Value | TTL |
---|---|---|
nerdwallet.com/articles/123 | Some Content | 600 |
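Here is a sketch of building that default key in Python, with light normalization. It assumes that "some normalization" means lowercasing the host and percent-encoding the path. Individual caches differ.

```python
from urllib.parse import quote, urlsplit


def default_cache_key(host: str, uri: str) -> str:
    """Build a host + URI cache key, with light normalization.

    Hosts are case-insensitive, so they are lowercased; the path is
    percent-encoded (existing %XX escapes are kept) so equivalent URIs
    share one key. The query string is kept as-is.
    """
    parts = urlsplit(uri)
    key = host.lower() + quote(parts.path, safe="/%")
    if parts.query:
        key += "?" + parts.query
    return key
```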
Using Vary Header
- A response uses the Vary header to indicate that its content differs depending on the listed request header(s). The cache then adds those headers' values to the cache key, so each variation is stored separately.
# Request
GET /articles/123 HTTP/2
Host: nerdwallet.com
Accept-Language: es-es
# Response
HTTP/2 200
Cache-Control: max-age=600
Vary: Accept-Language
algo de contenido
Key | Value | TTL |
---|---|---|
nerdwallet.com/articles/123;Accept-Language=es-es | algo de contenido | 600 |
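The following hypothetical Python helper extends the key with the headers named in Vary, reproducing the key format in the table above. This concatenation approach is one common implementation. The spec itself describes Vary as matching the stored request headers against the new request.

```python
from typing import Mapping, Optional


def vary_cache_key(base_key: str, vary: str, request_headers: Mapping[str, str]) -> Optional[str]:
    """Append the request-header values named in Vary to the base key.

    Returns None for "Vary: *", which means the response can't be reused for
    later requests. Header names are matched case-insensitively.
    """
    if vary.strip() == "*":
        return None
    lowered = {name.lower(): value for name, value in request_headers.items()}
    parts = [base_key]
    for name in vary.split(","):
        name = name.strip()
        if name:
            parts.append(f"{name}={lowered.get(name.lower(), '')}")
    return ";".join(parts)
```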
Normalization
Normalization should be used to prevent high-cardinality header values (e.g. the many possible Accept-Language strings) from fragmenting the cache into duplicate entries.
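Varying on the raw Accept-Language header fragments the cache, because the header has countless distinct values. Instead, a cache or origin could first collapse it to one of the languages the site actually serves. Below is a hypothetical Python sketch. The supported-language list and the default are assumptions.

```python
from typing import Optional

SUPPORTED_LANGUAGES = ("en", "es")  # hypothetical: the languages the site actually serves
DEFAULT_LANGUAGE = "en"


def normalize_accept_language(header: Optional[str]) -> str:
    """Collapse an Accept-Language header to one supported language code."""
    if not header:
        return DEFAULT_LANGUAGE
    ranked = []
    for position, entry in enumerate(header.split(",")):
        tag, _, params = entry.strip().partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            ranked.append((-q, position, tag.strip().lower()))
    # Highest q first; ties keep the header's original order.
    for _, _, tag in sorted(ranked):
        primary = tag.split("-")[0]
        if primary in SUPPORTED_LANGUAGES:
            return primary
    return DEFAULT_LANGUAGE
```

With this in place, `es-ES,es;q=0.9,en;q=0.8` and `es-es` both normalize to `es`, so they share a single cache entry.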
NerdWallet Implementations
- SRCache at the edge
  - Full-page cache backed by Redis
  - Opt in by adding srcache: enabled: on to your URL block
  - Used by
    - legacy WP front-end
    - Some WP & Marketplace API requests
    - A few pages from front-page client
- query0 dataSource cache
  - API request cache backed by Redis
  - Docs for opting in
  - Used by
    - WP API Requests
    - Marketplace API Requests
    - Data Set Service
- Browser caching
  - We use fingerprinting on JS & CSS assets that get served by our CDN
  - Image assets use browser heuristics
Takeaways
- HTTP caching lets browsers, proxies, and CDNs reuse responses instead of requesting them from the origin again.
- Caching helps provide:
  - Improved speed
  - Reduced server load
  - Reduced operating costs
  - Improved scalability
  - Fault tolerance
- Go forth and cache all-the-things
Resources
- https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching
- https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control
- https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html
- https://odino.org/http-cache-101-scaling-the-web/
- https://odino.org/rest-better-http-cache/
- https://www.keycdn.com/blog/http-cache-headers
- https://www.freecodecamp.org/news/an-in-depth-introduction-to-http-caching-cache-control-vary/
- https://www.mnot.net/blog/2017/03/16/browser-caching
- https://web.dev/http-cache/#defining-optimal-cache-control-policy
- https://www.fastly.com/blog/understanding-vary-header-browser
- https://github.com/kornelski/http-cache-semantics
Thanks for coming! Questions?
HTTP Caching
By Ryan Kanner