Cache Invalidation,
Solved
Asmir Mustafic
GetYourGuide


Me
Asmir Mustafic
Caching
There are only two hard things in Computer Science: cache invalidation and naming things.

[Architecture diagram with components Cache, Contento and Headless CMS, connected by arrows labeled webhook, get, get / set and get]

legal, email templates, help center and much more

A few facts
- Java service
- ~200 req/s (spikes of 1k req/s)
- TTL set to 1 week
- Latency issues when an entry expires due to TTL
- Complex code to invalidate the cache on webhooks
- Don't want to increase TTL due to costs
- Everything is stored in memory

Cache Invalidation
Why is it hard?

key | value |
---|---|
1 | {id:1, slug: "foo", loc: 1} |
2 | {id:2, slug: "bar", loc: 1} |
@Get("/user/{id}")
fun getUserById(id) {
user = cache.get(id)
if (user == null) {
user = repository.getUserById(id);
cache.put(id, user)
}
return user;
}

key | value |
---|---|
1 | {id:1, slug: "foo", loc: 1} |
2 | {id:2, slug: "bar", loc: 1} |
fun onUserUpdate(user) {
cache.delete(user.id)
}

Variants

key | value |
---|---|
1 | {id:1, mail: "foo@a", loc: 1} |
2 | {id:2, mail: "bar@a", loc: 1} |
foo@a | {id:1, mail: "foo@a", loc: 1} |
@Get("/user/{id}")
fun getUserById(id) {
user = cache.get(id)
if (user == null) {
user = repository.getUserById(id);
cache.put(id, user)
}
return user;
}
@Get("/user-by-mail/{mail}")
fun getUserByMail(mail) {
user = cache.get(mail)
if (user == null) {
user = repository.getUserByMail(mail);
cache.put(mail, user)
}
return user;
}

fun onUserUpdate(user) {
cache.delete(user.id)
cache.delete(user.mail)
}

key | value |
---|---|
1 | {id:1, mail: "foo@a", loc: 1} |
2 | {id:2, mail: "bar@a", loc: 1} |
foo@a | {id:1, mail: "foo@a", loc: 1} |
Multiple Variants

key | value |
---|---|
1 | {id:1, mail: "foo@a", loc: 1} |
2 | {id:2, mail: "bar@a", loc: 1} |
foo@a | {id:1, mail: "foo@a", loc: 1} |
FoO@a | {id:1, mail: "foo@a", loc: 1} |
@Get("/user/{id}")
fun getUserById(id) {
user = cache.get(id)
if (user == null) {
user = repository.getUserById(id);
cache.put(id, user)
}
return user;
}
@Get("/user-by-mail/{mail}")
fun getUserByMail(mail) {
user = cache.get(mail)
if (user == null) {
user = repository.getUserByMail(mail);
cache.put(mail, user)
}
return user;
}
// repository code
fun getUserByMail(mail) {
mail = mail.lower();
//... some sql query
return user;
}

fun onUserUpdate(user) {
cache.delete(user.id)
cache.delete(user.mail)
// how do we invalidate every case variant
// of the mail?
}
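
To make the problem concrete, here is a small Python sketch of the same flow. It assumes dict-based stand-ins for the cache and the repository (all names are hypothetical) and shows that the `FoO@a` entry survives the update.

```python
# Hypothetical stand-ins for the repository table and the cache.
database = {1: {"id": 1, "mail": "foo@a", "loc": 1}}
cache = {}


def repo_get_user_by_mail(mail):
    """Repository lookup: normalizes the mail before querying."""
    mail = mail.lower()
    for user in database.values():
        if user["mail"] == mail:
            return dict(user)
    return None


def get_user_by_mail(mail):
    user = cache.get(mail)
    if user is None:
        user = repo_get_user_by_mail(mail)
        if user is not None:
            cache[mail] = user  # cached under the mail exactly as requested
    return user


def on_user_update(user):
    database[user["id"]] = user
    cache.pop(user["id"], None)
    cache.pop(user["mail"], None)  # only the normalized mail is known here


def stale_variant_demo():
    """Returns the 'loc' seen through each mail variant after an update."""
    get_user_by_mail("foo@a")
    get_user_by_mail("FoO@a")
    on_user_update({"id": 1, "mail": "foo@a", "loc": 2})
    return get_user_by_mail("foo@a")["loc"], get_user_by_mail("FoO@a")["loc"]
```

In this sketch `stale_variant_demo()` returns the new location for `foo@a` and the old one for `FoO@a`, because nothing links the second cache key back to the user.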

key | value |
---|---|
1 | {id:1, mail: "foo@a", loc: 1} |
2 | {id:2, mail: "bar@a", loc: 1} |
foo@a | {id:1, mail: "foo@a", loc: 1} |
FoO@a | {id:1, mail: "foo@a", loc: 1} |
Embedded resources

key | value |
---|---|
1 | {id:1, slug: "foo", loc: {id: 1, name : "Rome"}} |
@Get("/user/{id}")
fun getUserById(id) {
user = cache.get(id)
if (user == null) {
user = repository.getUserById(id);
cache.put(id, user)
}
return user;
}
If we rename "Rome" to "Roma"
we need to invalidate all the users from Rome

fun onLocationUpdate(location) {
// find all users with that location and invalidate
users = usersApi.findAllWithLocationId(location.id)
for (user in users) {
cache.delete(user.id)
cache.delete(user.mail)
cache.delete("L" + user.loc.id)
// more to come as the product grows
}
// find all trips with that location and invalidate
// find all the bookings with that location and invalidate
// ....more
}
key | value |
---|---|
1 | {id:1, slug: "foo", loc: {id: 1, name : "Rome"}} |

Lists

key | value |
---|---|
L1 | [{id:1, slug: "foo", loc: 1}, {id:2, slug: "bar", loc: 1} ] |
L2 | [] |
@Get("/users-by-location/{id}")
fun getUsersByLocation(id) {
users = cache.get("L" + id)
if (users == null) {
users = repository.getUsersByLocation(id)
cache.put("L" + id, users)
}
return users;
}

fun onUserUpdate(user) {
cache.delete(user.id)
cache.delete(user.mail)
// works only when the user is added to the collection
cache.delete("L" + user.loc)
// what if the user changes location? (we need the previous user state...)
}
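
The missing "previous state" can be demonstrated with a short Python sketch. It assumes dict-based stand-ins (`users`, `cache`) that are not part of the original code, and moves user 1 from location 1 to location 2.

```python
# Hypothetical stand-ins for the repository table and the cache.
users = {
    1: {"id": 1, "slug": "foo", "loc": 1},
    2: {"id": 2, "slug": "bar", "loc": 1},
}
cache = {}


def get_users_by_location(loc_id):
    key = f"L{loc_id}"
    if key not in cache:  # an empty list is a valid cached value
        cache[key] = [dict(u) for u in users.values() if u["loc"] == loc_id]
    return cache[key]


def on_user_update(user):
    users[user["id"]] = user
    cache.pop(user["id"], None)
    # Only the new location is known here, so only its list is dropped.
    cache.pop(f"L{user['loc']}", None)
```

After the move, the list for location 2 is rebuilt correctly, while the cached list for location 1 still contains user 1 until its TTL expires.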
key | value |
---|---|
L1 | [{id:1, slug: "foo", loc: 1}, {id:2, slug: "bar", loc: 1} ] |
L2 | [] |

Things are getting complicated...

Cache tags to the rescue


Cache Tagging
HTTP Cache Tags...
Common in CDNs
key | value |
---|---|
1 | ... |
2 | ... |
foo | ... |
FoO | ... |

key | value | tag |
---|---|---|
1 | ... | id-1 |
2 | ... | id-2 |
foo | ... | id-1 |
FoO | ... | id-1 |
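
The key-to-tag mapping above can be sketched as a small in-memory cache in Python. This is a minimal illustration under assumptions: the class name `TagCache` and its method names are hypothetical, and real tag-aware caches or CDNs may implement the index differently.

```python
from collections import defaultdict
from typing import Any, Hashable, Iterable


class TagCache:
    """Hypothetical in-memory cache where every entry carries a set of tags."""

    def __init__(self) -> None:
        self._values: dict = {}
        self._tags_by_key: dict = {}
        self._keys_by_tag: dict = defaultdict(set)  # reverse index: tag -> keys

    def get(self, key: Hashable) -> Any:
        return self._values.get(key)

    def put(self, key: Hashable, value: Any, tags: Iterable[str] = ()) -> None:
        self.delete(key)  # drop stale tag links if the key is overwritten
        tag_set = set(tags)
        self._values[key] = value
        self._tags_by_key[key] = tag_set
        for tag in tag_set:
            self._keys_by_tag[tag].add(key)

    def delete(self, key: Hashable) -> None:
        self._values.pop(key, None)
        for tag in self._tags_by_key.pop(key, set()):
            keys = self._keys_by_tag.get(tag)
            if keys is not None:
                keys.discard(key)

    def delete_by_tag(self, tag: str) -> int:
        """Removes every entry carrying the tag; returns how many were removed."""
        keys = list(self._keys_by_tag.pop(tag, set()))
        for key in keys:
            self.delete(key)
        return len(keys)
```

Storing the four keys from the table with their tags and calling `delete_by_tag("id-1")` removes `1`, `foo` and `FoO` in one call and leaves `2` untouched.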

In practice

@Get("/user/{id}")
fun getUserById(id) {
user = cache.get(id)
if (user == null) {
user = repository.getUserById(id)
cache.put(id, user, getUserTags(user))
}
return user;
}
fun getUserTags(user) {
return ["id-" + user.id];
}
key | value | tags |
---|---|---|
1 | ... | id-1 |
2 | ... | id-2 |

fun onUserUpdate(user) {
cache.deleteByTag("id-" + user.id)
}
key | value | tags |
---|---|---|
1 | ... | id-1 |
2 | ... | id-2 |

Variants
and
Multiple variants

@Get("/user/{id}")
fun getUserById(id) {
user = cache.get(id)
if (user == null) {
user = repository.getUserById(id)
cache.put(id, user, getUserTags(user))
}
return user;
}
@Get("/user-by-mail/{mail}")
fun getUserByMail(mail) {
user = cache.get(mail)
if (user == null) {
user = repository.getUserByMail(mail)
cache.put(mail, user, getUserTags(user))
}
return user;
}
fun getUserTags(user) {
return ["id-" + user.id];
}
key | value | tags |
---|---|---|
1 | ... | id-1 |
2 | ... | id-2 |
foo@a | ... | id-1 |
FoO@a | ... | id-1 |

fun onUserUpdate(user) {
cache.deleteByTag("id-" + user.id)
}
key | value | tags |
---|---|---|
1 | ... | id-1 |
2 | ... | id-2 |
foo@a | ... | id-1 |
FoO@a | ... | id-1 |
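
As a sketch of why the variants stop being a problem, the Python below caches the same user under its id and under two mail spellings, all tagged `id-1`. The dict-based cache and every helper name are assumptions made for illustration.

```python
# Hypothetical in-memory stand-ins for the repository and the tagged cache.
database = {1: {"id": 1, "mail": "foo@a", "loc": 1}}
values = {}    # cache key -> cached user
tags_of = {}   # cache key -> set of tags


def cache_put(key, value, tags):
    values[key] = value
    tags_of[key] = set(tags)


def cache_delete_by_tag(tag):
    # Linear scan for brevity; a real cache would keep a tag -> keys index.
    for key in [k for k, tags in tags_of.items() if tag in tags]:
        values.pop(key, None)
        tags_of.pop(key, None)


def get_user_tags(user):
    return ["id-" + str(user["id"])]


def get_user_by_id(user_id):
    user = values.get(user_id)
    if user is None:
        user = dict(database[user_id])  # assumes the user exists
        cache_put(user_id, user, get_user_tags(user))
    return user


def get_user_by_mail(mail):
    user = values.get(mail)
    if user is None:
        # The repository lowercases the mail, so "FoO@a" finds the same row.
        user = next(dict(u) for u in database.values() if u["mail"] == mail.lower())
        cache_put(mail, user, get_user_tags(user))
    return user


def on_user_update(user):
    database[user["id"]] = user
    cache_delete_by_tag("id-" + str(user["id"]))
```

The update handler no longer needs to know which keys exist: every variant was tagged when it was written, so one `cache_delete_by_tag` call clears them all.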

Embedded resources

key | value | tags |
---|---|---|
1 | {id:1, slug: "foo", loc: {id: 1, name : "Rome"}} | id-1, loc-1 |
2 | {id:2, slug: "bar", loc: {id: 1, name : "Rome"}} | id-2, loc-1 |
@Get("/user/{id}")
fun getUserById(id) {
user = cache.get(id)
if (user == null) {
user = repository.getUserById(id);
cache.put(id, user, getUserTags(user))
}
return user;
}
fun getUserTags(user) {
return ["id-" + user.id, "loc-" + user.loc.id];
}

fun onLocationUpdate(location) {
cache.deleteByTag("loc-" + location.id)
}
key | value | tags |
---|---|---|
1 | {id:1, slug: "foo", loc: {id: 1, name : "Rome"}} | id-1, loc-1 |
2 | {id:2, slug: "bar", loc: {id: 1, name : "Rome"}} | id-2, loc-1 |
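
The Rome to Roma rename can be sketched the same way. In this Python illustration the `locations` and `users` dicts and all function names are hypothetical; the point is only that each cached user carries a `loc-` tag for the location it embeds.

```python
# Hypothetical stand-ins for the repository tables and the tagged cache.
locations = {1: {"id": 1, "name": "Rome"}}
users = {1: {"id": 1, "slug": "foo", "loc": 1}, 2: {"id": 2, "slug": "bar", "loc": 1}}
cache = {}  # key -> (value, frozenset of tags)


def load_user(user_id):
    """Repository call that embeds the full location in the user."""
    row = users[user_id]
    return {"id": row["id"], "slug": row["slug"], "loc": dict(locations[row["loc"]])}


def get_user_tags(user):
    return frozenset({f"id-{user['id']}", f"loc-{user['loc']['id']}"})


def get_user_by_id(user_id):
    entry = cache.get(user_id)
    if entry is None:
        user = load_user(user_id)
        entry = (user, get_user_tags(user))
        cache[user_id] = entry
    return entry[0]


def delete_by_tag(tag):
    # Linear scan for brevity; collect keys first, then delete.
    for key in [k for k, (_, tags) in cache.items() if tag in tags]:
        del cache[key]


def on_location_update(location):
    locations[location["id"]] = location
    delete_by_tag(f"loc-{location['id']}")
```

The location handler does not have to look up users, trips or bookings: anything that embedded the location and tagged itself with `loc-1` is dropped by the single call.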

Lists

key | value | tags |
---|---|---|
L1 | [{id:1, slug: "foo"}, {id:2, slug: "bar"} ] | loc-1, id-1, id-2 |
L2 | [] | loc-2 |
@Get("/users-by-location/{id}")
fun getUsersByLocation(id) {
users = cache.get("L" + id)
if (users == null) {
users = repository.getUsersByLocation(id)
cache.put("L" + id, users, ["loc-" + id] + getUsersTags(users));
}
return users;
}
fun getUsersTags(users) {
tags = [];
for (user in users) {
tags = tags.concat(getUserTags(user))
}
return tags;
}
fun getUserTags(user) {
return ["id-" + user.id, "loc-" + user.loc.id];
}

key | value | tags |
---|---|---|
L1 | [{id:1, slug: "foo"}, {id:2, slug: "bar"} ] | loc-1, id-1, id-2 |
L2 | [] | loc-2 |
fun onUserUpdate(user) {
cache.deleteByTag("id-" + user.id)
cache.deleteByTag("loc-" + user.loc.id)
// the id- tag invalidates the old list if the user changes location
// the loc- tag invalidates the list when a user is added to or removed from that location
}
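
The list case can be sketched in Python as well. The data and helper names below are hypothetical; each cached list is tagged with its own `loc-` tag plus the tags of every user it contains, and user 1 then moves from location 1 to location 2.

```python
# Hypothetical stand-ins for the repository table and the tagged cache.
users = {
    1: {"id": 1, "slug": "foo", "loc": {"id": 1}},
    2: {"id": 2, "slug": "bar", "loc": {"id": 1}},
}
cache = {}  # key -> (value, set of tags)


def get_user_tags(user):
    return [f"id-{user['id']}", f"loc-{user['loc']['id']}"]


def get_users_tags(users_list):
    tags = []
    for user in users_list:
        tags.extend(get_user_tags(user))
    return tags


def get_users_by_location(loc_id):
    key = f"L{loc_id}"
    if key not in cache:  # an empty list is a valid cached value
        found = [dict(u) for u in users.values() if u["loc"]["id"] == loc_id]
        cache[key] = (found, {f"loc-{loc_id}", *get_users_tags(found)})
    return cache[key][0]


def delete_by_tag(tag):
    for key in [k for k, (_, tags) in cache.items() if tag in tags]:
        del cache[key]


def on_user_update(user):
    users[user["id"]] = user
    delete_by_tag(f"id-{user['id']}")          # lists that contained the user
    delete_by_tag(f"loc-{user['loc']['id']}")  # list of the user's new location
```

No previous user state is needed: the old list was tagged `id-1` when it was written, so it is found and dropped even though the handler only sees the new location.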

Results of
Cache Tagging
in GetYourGuide

Before (19 Sept 2024)
Lines of code: 4104
Coverage: ~77%
Now
Lines of code: 3796
Coverage: ~83%
7% less code
(but we have more features)
cloc src/main
https://github.com/AlDanial/cloc
Code



Code
code to cache articles by tags (-90%)

Before (19 Sept 2024)
~$50-60/day
$0.00000342/req
Now
~$40-42/day
$0.00000247/req
about -20%
Cost


Latency
Overall (-90%)



In the context of PHP

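PHP
composer require symfony/cache
<?php
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\Cache\Adapter\FilesystemAdapter;
use Symfony\Component\Cache\Adapter\TagAwareAdapter;
use Symfony\Contracts\Cache\ItemInterface;

// Assumption: a filesystem pool wrapped in a tag-aware adapter; any tag-aware pool works.
$cache = new TagAwareAdapter(new FilesystemAdapter());

// $beta tunes probabilistic early expiration (0 disables it, higher values recompute earlier).
$beta = 1.0;
$value = $cache->get('my_cache_key', function (ItemInterface $item): string {
    // ... do some HTTP request or heavy computations
    $computedValue = 'foo';
    $item->expiresAfter(3600);       // TTL in seconds
    $item->tag(['tag_0', 'tag_1']);  // tags used for invalidation
    return $computedValue;
}, $beta);

// cache invalidation: removes every item tagged with tag_1 or tag_3
$cache->invalidateTags(['tag_1', 'tag_3']);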
Probabilistic cache invalidation
[Timeline diagram along a time axis: cache set (60s), random cache refresh, cache expire]
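
One common way to implement this kind of random early refresh is probabilistic early expiration: the closer an entry is to its expiry, and the longer it took to compute, the more likely a read triggers a recompute. The Python below is a sketch of that idea, not necessarily what the slide depicted; the function name and parameters are hypothetical.

```python
import math
import random
import time


def should_refresh_early(expiry, compute_time, beta=1.0, now=None, rng=random.random):
    """Decides whether a read should recompute the value before it expires.

    expiry:       absolute expiry timestamp of the cached entry (seconds)
    compute_time: how long the last recomputation took (seconds)
    beta:         0 disables early refresh, larger values refresh earlier
    """
    now = time.time() if now is None else now
    u = 1.0 - rng()  # uniform in (0, 1], avoids log(0)
    # -log(u) is a non-negative random number, so the entry "ages" by a
    # random amount scaled by compute_time * beta.
    return now - compute_time * beta * math.log(u) >= expiry
```

Because each request draws its own random number, concurrent readers are unlikely to all decide to refresh at the same instant, which spreads the recomputation instead of producing a latency spike exactly at the TTL boundary.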
In the context of HTTP

HTTP
GET /user/123 HTTP/1.1

HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: max-age=604800
X-Cache-Tags: tag1,tag2

// ...body

PURGE / HTTP/1.1
Host: example.com
X-Xkey-Purge: tag1, tag3
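
What an HTTP cache does with these headers can be sketched in Python. This is a simplified model with a hypothetical `HttpCache` class; it ignores `Cache-Control` and only shows how the tag header on a response and the purge header on a `PURGE` request relate.

```python
def parse_tags(header_value):
    """Splits a comma-separated tag header into a set of trimmed tags."""
    return {tag.strip() for tag in header_value.split(",") if tag.strip()}


class HttpCache:
    """Hypothetical, heavily simplified model of a tag-aware HTTP cache."""

    def __init__(self):
        self._entries = {}  # url -> (tags, body)

    def store(self, url, headers, body):
        tags = parse_tags(headers.get("X-Cache-Tags", ""))
        self._entries[url] = (tags, body)

    def lookup(self, url):
        entry = self._entries.get(url)
        return entry[1] if entry else None

    def purge(self, headers):
        """Handles a PURGE request; returns the URLs that were evicted."""
        purge_tags = parse_tags(headers.get("X-Xkey-Purge", ""))
        evicted = [url for url, (tags, _) in self._entries.items() if tags & purge_tags]
        for url in evicted:
            del self._entries[url]
        return evicted
```

A response stored with `X-Cache-Tags: tag1,tag2` is evicted by a purge for `tag1, tag3`, since one shared tag is enough; responses that carry neither tag stay cached.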
Thank you!
