HTTP Caching - What it is, and how it works.

A Tech Talk, 1/7/21

©2021 Copyright - Confidential and Proprietary

The fastest requests are the ones that are never made - someone, probably.

- Ryan Kanner

What is HTTP Caching?

📝 The Spec

  • Standardized in HTTP/1.1 by the IETF (RFC 2616) in 1999 🎉
  • A set of rules for HTTP clients to follow to enable consistent caching behavior across the web.
  • Primarily leverages headers on requests and responses to determine how to interact with the cached data.

🍕The Ingredients

  • Caching headers on requests & responses
  • Spec compliant HTTP client
  • Somewhere to store the cached responses
  • Some HTTP requests 🙃

Headers

  • Expires
  • Last-Modified
  • ETag
  • If-None-Match
  • If-Modified-Since
  • Cache-Control
    • Visibility Directives: public, private
    • TTL Directives: max-age, s-maxage
    • Revalidation Directives: no-cache, no-store, must-revalidate
    • Client Override Directives: only-if-cached, max-stale, min-fresh
    • Performance Directives: stale-while-revalidate, stale-if-error

Cache Logic Decision Tree

Types of Caches

Shared Public Cache

# Request
GET /article/123 HTTP/2

# Response
HTTP/2 200
Cache-Control: public

Private Cache

# Request
GET /article/123 HTTP/2

# Response
HTTP/2 200
Cache-Control: private

Public & Private Caches

Caching Strategies

Expiration

# Request
GET /article/123 HTTP/2

# Response
HTTP/2 200
Expires: Wed, 05 Jan 2022 07:28:00 GMT
  • Expires header in response defines when the resource should become stale.
  • The client will serve the cached response until the current time is greater than the Expires header value. Then it will re-fetch the resource from the origin server.
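A minimal sketch of the Expires check in Python. The function name is_fresh is an assumption made for illustration and is not part of any HTTP library. Treating an unparseable value (such as "Expires: 0") as already expired follows the spec's guidance for invalid dates.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_header: str, now: datetime) -> bool:
    """Return True while `now` has not yet passed the Expires timestamp."""
    try:
        expires_at = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        # Invalid dates count as "in the past", so the response is stale.
        return False
    if expires_at.tzinfo is None:
        # HTTP dates are always GMT; make naive results timezone-aware.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return now < expires_at
```

A cache would call this with the current UTC time, for example is_fresh(headers["Expires"], datetime.now(timezone.utc)), and go back to the origin once it returns False.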

Expiration (Cache-Control)

# Request
GET /article/123 HTTP/2

# Response
HTTP/2 200
Cache-Control: max-age=3600, s-maxage=600
  • max-age: the TTL of the resource, in seconds.
  • s-maxage: like max-age, but applies only to shared caches, where it overrides max-age. Here a shared cache keeps the response for 600 seconds while a private cache keeps it for 3600.
  • Once the resource has expired, the cache will re-fetch it from the origin server and store the new version before returning the data to the client.
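The choice between max-age and s-maxage can be sketched as below. Both function names are assumptions for illustration, and the parser is deliberately simplified (it does not handle commas inside quoted directive values).

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or None
    return directives


def freshness_lifetime(cache_control: str, shared: bool) -> Optional[int]:
    """Return the TTL in seconds, or None if no explicit TTL is present."""
    directives = parse_cache_control(cache_control)
    # s-maxage only applies to shared caches, where it overrides max-age.
    if shared and directives.get("s-maxage") is not None:
        return int(directives["s-maxage"])
    if directives.get("max-age") is not None:
        return int(directives["max-age"])
    return None
```

With the header from this slide, a shared cache (shared=True) gets 600 and a browser cache (shared=False) gets 3600.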

Revalidation (ETag)

# Request 1
GET /article/123 HTTP/2

# Response 1
HTTP/2 200
ETag: "987"

# Request 2
GET /article/123 HTTP/2
If-None-Match: "987"

# Response if the content changed
HTTP/2 200
ETag: "654"

# Response if unchanged
HTTP/2 304
  • ETag: A unique validation token for the resource.
  • If-None-Match: The validation token of the cached copy, sent so the origin server can compare it to the current version of the resource.
  • If the resource on the origin server has a different validator token than the one passed along in the request, the origin should return the new version of the resource with a 200 response.
  • If it's the same, it should just return a 304 status without a body.
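On the origin side, the comparison could look roughly like this. The function names are hypothetical, and splitting the header on commas is a simplification that assumes no commas inside the quoted tags.

```python
from typing import Dict, Optional, Tuple


def _opaque(tag: str) -> str:
    """Strip whitespace and the weak-validator prefix (W/) from an ETag."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def respond(
    if_none_match: Optional[str], current_etag: str, body: bytes
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a conditional GET."""
    if if_none_match is not None:
        candidates = [_opaque(tag) for tag in if_none_match.split(",")]
        if "*" in candidates or _opaque(current_etag) in candidates:
            # Validator still matches: confirm the cached copy, send no body.
            return 304, {"ETag": current_etag}, b""
    # No validator sent, or the resource changed: send the full response.
    return 200, {"ETag": current_etag}, body
```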

Revalidation (Last-Modified)

# Request 1
GET /article/123 HTTP/2

# Response 1
HTTP/2 200
Last-Modified: Fri, 07 Jan 2022 12:00:00 GMT

# Request 2
GET /article/123 HTTP/2
If-Modified-Since: Fri, 07 Jan 2022 12:00:00 GMT

# Response if the content has changed
HTTP/2 200
Last-Modified: Sat, 08 Jan 2022 12:00:00 GMT

# Response if unchanged
HTTP/2 304
  • Last-Modified: The time the resource was last modified.
  • If-Modified-Since: The Last-Modified value of the cached copy. The origin server compares it to the modification time of the current version and returns a 304 if the resource has not changed since then.
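A sketch of the date comparison on the origin server, assuming the server knows the resource's modification time as a timezone-aware datetime. The name status_for is made up for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def status_for(if_modified_since: Optional[str], last_modified: datetime) -> int:
    """Return 304 if the resource is unchanged since the given date, else 200."""
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # An invalid date is ignored, so the full response is sent.
        return 200
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop sub-second precision.
    return 304 if last_modified.replace(microsecond=0) <= since else 200
```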

Heuristic

<img src="puppies.jpg" />

# Response
HTTP/2 200
content-type: image/jpeg
Last-Modified: Fri, 27 Jul 2018 19:06:29 GMT
Date: Fri, 07 Jan 2022 17:22:11 GMT
  • Browsers will automatically cache some assets even if they don't have explicit caching headers.
  • A common heuristic, suggested by the spec but not required, keeps the resource fresh for 10% of the time since it was last modified:
expirationTime = currentTime + ((headers.Date - headers.Last-Modified) / 10)
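The formula above can be worked through in Python with the header values from this slide. The function name is an assumption, and real browsers may apply their own caps or variations on this heuristic.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def heuristic_lifetime(date_header: str, last_modified_header: str) -> timedelta:
    """Return 10% of the interval between Last-Modified and Date."""
    date = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    return (date - last_modified) / 10


lifetime = heuristic_lifetime(
    "Fri, 07 Jan 2022 17:22:11 GMT",   # Date
    "Fri, 27 Jul 2018 19:06:29 GMT",   # Last-Modified
)
print(lifetime.days)  # whole days the image is treated as fresh
```

For puppies.jpg, which was last modified about three and a half years before it was served, this works out to roughly 126 days of freshness.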

Fingerprinting

<script src="app.l3kn1df4gln.js"></script>

# Response
HTTP/2 200
content-type: application/javascript
Cache-Control: public, max-age=31557600
  • Embeds a unique fingerprint (typically a hash of the file contents) in the name of the resource and serves it with a high max-age.
  • When a new version of the asset is built, a different fingerprint will be appended to the file name, effectively busting the cache.
  • Cache headers set the max-age to one year in seconds. The spec sets no hard maximum, but a year is the conventional upper bound.
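A build step could derive the fingerprint roughly as follows. This is a sketch under assumptions: the function name, the use of SHA-256, and the 10-character length are illustrative choices, not what any particular bundler does.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, length: int = 10) -> str:
    """Insert a content hash before the extension of a bare file name."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    # "app.js" + hash "ab12..." becomes "app.ab12....js"
    return f"{path.stem}.{digest}{path.suffix}"
```

Because the name depends only on the bytes of the file, an unchanged asset keeps its URL (and stays cached) across deploys, while any edit produces a new URL.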

Manipulating Caching Behavior

Bypass the cache

# Request
GET /users/me HTTP/2
Cache-Control: no-cache

# Response
HTTP/2 200
Cache-Control: no-store, must-revalidate
  • no-cache: the cache must revalidate with the origin server before using a stored response.
  • no-store: the response must not be stored in any cache.
  • must-revalidate: once the response is stale, the cache must successfully revalidate it before serving it.

Client Overrides

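# Request
GET /articles/123 HTTP/2
Cache-Control: max-stale=600

# Response
HTTP/2 200
Cache-Control: max-age=3600
# Cache may serve this for up to 3600 + 600 = 4200 seconds

# Request
GET /articles/123 HTTP/2
Cache-Control: min-fresh=3600

# Response
HTTP/2 200
Cache-Control: max-age=600
# Never served from cache: it cannot stay fresh for another 3600 seconds

# Request
GET /articles/123 HTTP/2
Cache-Control: only-if-cached
  • max-stale: the client will accept a stale response, as long as it has been stale for no more than the given number of seconds.
  • min-fresh: the client only wants a response that will still be fresh for at least the given number of seconds.
  • only-if-cached: only serve the record from the cache; if nothing usable is stored, the cache answers with a 504 instead of contacting the origin.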
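As defined in the spec, max-stale extends how long a stored response may be used and min-fresh demands a remaining freshness margin. A sketch of how a cache might evaluate them follows; the function name is hypothetical, and giving min-fresh priority when both directives are present is a simplification.

```python
from typing import Optional


def cache_can_satisfy(
    age: int,
    lifetime: int,
    max_stale: Optional[int] = None,
    min_fresh: Optional[int] = None,
) -> bool:
    """Decide whether a stored response may be served without the origin.

    age      -- seconds since the response was generated
    lifetime -- freshness lifetime from the response (e.g. max-age)
    """
    remaining = lifetime - age  # zero or negative once the response is stale
    if min_fresh is not None:
        # Must remain fresh for at least min_fresh more seconds.
        return remaining >= min_fresh
    if remaining > 0:
        return True
    if max_stale is not None:
        # Stale, but acceptable if it expired no more than max_stale ago.
        return -remaining <= max_stale
    return False
```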

Performance & Fault Tolerance

stale-if-error

# Request
GET /articles/123 HTTP/2
Cache-Control: stale-if-error=3600

# Response
HTTP/2 500

stale-while-revalidate

# Request
GET /articles/123 HTTP/2

# Response
HTTP/2 200
Cache-Control: max-age=600, stale-while-revalidate=60
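With this header, the response is fresh for 600 seconds, and for a further 60 seconds a cache may return the stale copy immediately while it refreshes it in the background. A sketch of the three states, using an illustrative function name:

```python
def classify(age: int, max_age: int, stale_while_revalidate: int = 0) -> str:
    """Classify a stored response by its age in seconds."""
    if age < max_age:
        return "fresh"  # serve from cache, no origin request
    if age < max_age + stale_while_revalidate:
        # Serve the stale copy now and revalidate asynchronously.
        return "stale-while-revalidate"
    return "stale"  # must go to the origin before responding
```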

Cache Keys

Default Implementation

  • By default, a cache should use the req.host + req.uri of the requested resource as the cache key.
  • Some normalization may happen in some caches to encode special characters.
# Request
GET /articles/123 HTTP/2
Host: nerdwallet.com

# Response
HTTP/2 200
Cache-Control: max-age=600

Some Content

Key                          Value         TTL
nerdwallet.com/articles/123  Some Content  600
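The key/value/TTL row above maps naturally onto a small in-memory store. The sketch below is illustrative only: the TtlCache class is hypothetical, and lowercasing the host is one example of the normalization a cache might apply.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Tiny in-memory cache keyed by host + URI with a per-entry TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    @staticmethod
    def key(host: str, uri: str) -> str:
        # Host names are case-insensitive, so normalize them.
        return f"{host.lower()}{uri}"

    def put(self, host: str, uri: str, value: str, ttl: float) -> None:
        self._entries[self.key(host, uri)] = (value, self._clock() + ttl)

    def get(self, host: str, uri: str) -> Optional[str]:
        entry = self._entries.get(self.key(host, uri))
        if entry is None:
            return None
        value, expires_at = entry
        # Expired entries are treated as misses.
        return value if self._clock() < expires_at else None
```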

Using Vary Header

  • Responses should use the Vary header to name the request headers that the response content depends on. Caches add each named header and its request value to the cache key, so each variant is stored separately.
# Request
GET /articles/123 HTTP/2
Host: nerdwallet.com
Accept-Language: es-es

# Response
HTTP/2 200
Cache-Control: max-age=600
Vary: Accept-Language

algo de contenido

Key                                                Value              TTL
nerdwallet.com/articles/123;Accept-Language=es-es  algo de contenido  600
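One way to build a key in the format shown above. The ";Name=value" layout mirrors this slide rather than any standard, and the function name is an assumption; real caches each have their own internal key format.

```python
from typing import Dict, Optional


def cache_key(
    host: str, uri: str, request_headers: Dict[str, str], vary: Optional[str]
) -> str:
    """Build a cache key from host + URI plus the headers named in Vary."""
    key = f"{host.lower()}{uri}"
    if not vary:
        return key
    # Header names are case-insensitive, so look them up in lowercase.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    names = sorted((n.strip() for n in vary.split(",") if n.strip()), key=str.lower)
    for name in names:
        key += f";{name}={lowered.get(name.lower(), '')}"
    return key
```

Sorting the header names keeps the key stable when the origin lists the same Vary headers in a different order.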

Normalization

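Normalize high-cardinality header values before they become part of the cache key, so that equivalent requests (for example, Accept-Language values of es-ES and es-MX when only one Spanish page exists) don't create duplicate cache entries.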
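As an illustration, Accept-Language can be collapsed to a short list of supported languages before it reaches the cache key. The supported set ("en", "es"), the default, and the function name are all assumptions made for this sketch.

```python
from typing import Sequence

SUPPORTED = ("en", "es")  # hypothetical set of languages the site serves


def normalize_accept_language(
    value: str, supported: Sequence[str] = SUPPORTED, default: str = "en"
) -> str:
    """Reduce an Accept-Language header to one supported primary language."""
    candidates = []
    for position, part in enumerate(value.split(",")):
        tag, _, params = part.strip().partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        primary = tag.strip().lower().split("-")[0]
        if primary in supported and quality > 0:
            # Highest quality wins; earlier position breaks ties.
            candidates.append((-quality, position, primary))
    return min(candidates)[2] if candidates else default
```

Every Spanish-preferring browser, whatever regional variants and q-values it sends, then shares one cache entry.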

NerdWallet Implementations

  • SRCache at the edge.
    • Full-page cache backed by Redis
    • Opt-in by adding srcache: enabled: on to your URL block
    • Used by
      • legacy WP front-end
      • Some WP & Marketplace API requests
      • A few pages from front-page client
  • query0 dataSource cache
    • API request cache backed by Redis
    • Docs for opting-in
    • Used by
      • WP API Requests
      • Marketplace API Requests
      • Data Set Service
  • Browser caching
    • We use fingerprinting on JS & CSS assets that get served by our CDN
    • Image assets use browser heuristics

Takeaways

  • HTTP Caching is an effective method for caching resources in a browser cache or proxy cache.
  • Caching helps provide:
    • Improved speed
    • Reduced server load
    • Reduced operating costs
    • Improved scalability
    • Fault tolerance
  • Go forth and cache all-the-things

Resources

Thanks for coming! Questions?
