Cache Invalidation,
Solved

Asmir Mustafic

GetYourGuide

 

 


Me

Asmir Mustafic

Caching

There are only two hard things in Computer Science: cache invalidation and naming things.

Cache

[Diagram elements: Contento, a headless CMS holding legal texts, email templates, the help center and much more; a cache; arrows labeled "get", "get / set" and "webhook".]

A few facts

  • Java service
  • ~200 req/s (spikes of 1k req/s)
  • TTL set to 1 week
  • Latency issues when an entry expires due to TTL
  • Complex code to invalidate the cache on webhooks
  • Don't want to increase TTL due to costs
    • Everything is stored in memory

Cache Invalidation

Why is it hard?

key value
1 {id:1, slug: "foo", loc: 1}
2 {id:2, slug: "bar", loc: 1}
@Get("/user/{id}")
fun getUserById(id) {
    user = cache.get(id)
    
    if (user == null) {
    	user = repository.getUserById(id);
        cache.put(id, user)
    }
    
    return user;
}
fun onUserUpdate(user) {
   cache.delete(user.id)
}
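
The read path and the update hook above can be sketched in Python as follows. This is a minimal illustration, assuming a plain dict as the cache and a hypothetical in-memory `DATABASE` standing in for the repository; none of these names come from the real service.

```python
from typing import Optional

# Hypothetical stand-ins for the repository and the cache on the slide.
DATABASE = {1: {"id": 1, "slug": "foo", "loc": 1}}
cache: dict = {}


def get_user_by_id(user_id: int) -> Optional[dict]:
    """Cache-aside read: try the cache first, fall back to the repository."""
    user = cache.get(user_id)
    if user is None:
        user = DATABASE.get(user_id)
        if user is not None:
            cache[user_id] = dict(user)  # keep a copy in the cache
    return user


def on_user_update(user: dict) -> None:
    """Write the new state and drop the single key that can hold this user."""
    DATABASE[user["id"]] = user
    cache.pop(user["id"], None)
```

With a single key per entity, one delete is enough. The following slides show how this stops being true.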

Variants

key value
1 {id:1, mail: "foo@a", loc: 1}
2 {id:2, mail: "bar@a", loc: 1}
foo@a {id:1, mail: "foo@a", loc: 1}
@Get("/user/{id}")
fun getUserById(id) {
    user = cache.get(id)
    
    if (user == null) {
    	user = repository.getUserById(id);
        cache.put(id, user)
    }
    
    return user;
}

@Get("/user-by-mail/{mail}")
fun getUserByMail(mail) {
    user = cache.get(mail)
    
    if (user == null) {
    	user = repository.getUserByMail(mail);
        cache.put(mail, user)
    }
    
    return user;
}
fun onUserUpdate(user) {
   cache.delete(user.id)
   cache.delete(user.mail)
}

Multiple Variants

key value
1 {id:1, mail: "foo@a", loc: 1}
2 {id:2, mail: "bar@a", loc: 1}
foo@a {id:1, mail: "foo@a", loc: 1}
FoO@a {id:1, mail: "foo@a", loc: 1}
@Get("/user/{id}")
fun getUserById(id) {
    user = cache.get(id)
    
    if (user == null) {
    	user = repository.getUserById(id);
        cache.put(id, user)
    }
    
    return user;
}

@Get("/user-by-mail/{mail}")
fun getUserByMail(mail) {
    user = cache.get(mail)
    
    if (user == null) {
    	user = repository.getUserByMail(mail);
        cache.put(mail, user)
    }
    
    return user;
}

// repository code

fun getUserByMail(mail) {
    mail = mail.lower();
    //... some sql query
    return user;
}
fun onUserUpdate(user) {
   cache.delete(user.id)
   cache.delete(user.mail)
   
   
   // how do we invalidate all the case permutations
   // of the mail?
}
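
To make the problem concrete, here is a small Python sketch, assuming a dict as the cache and a hypothetical in-memory `DATABASE`. The repository lowercases the mail, the cache does not, so an entry stored under "FoO@a" survives the update.

```python
# Hypothetical stand-ins for the repository and the cache on the slide.
DATABASE = {"foo@a": {"id": 1, "mail": "foo@a", "loc": 1}}
cache = {}


def repository_get_user_by_mail(mail):
    # The repository normalizes the mail; the cache key below does not.
    return DATABASE.get(mail.lower())


def get_user_by_mail(mail):
    user = cache.get(mail)
    if user is None:
        user = repository_get_user_by_mail(mail)
        if user is not None:
            cache[mail] = dict(user)
    return user


def on_user_update(user):
    DATABASE[user["mail"]] = user
    cache.pop(user["id"], None)
    cache.pop(user["mail"], None)  # only the canonical spelling is known here


get_user_by_mail("foo@a")
get_user_by_mail("FoO@a")
on_user_update({"id": 1, "mail": "foo@a", "loc": 2})

fresh = get_user_by_mail("foo@a")  # re-read from the repository
stale = get_user_by_mail("FoO@a")  # still served from the old cache entry
```

After the update, `fresh` carries the new location while `stale` still carries the old one, because nothing knew that the "FoO@a" key existed.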

Embedded resources

key value
1 {id:1, slug: "foo", loc: {id: 1, name : "Rome"}}
@Get("/user/{id}")
fun getUserById(id) {
    user = cache.get(id)
    
    if (user == null) {
    	user = repository.getUserById(id);
        cache.put(id, user)
    }
    
    return user;
}

If we rename "Rome" to "Roma",
we need to invalidate all the users from Rome.

fun onLocationUpdate(location) {

   // find all users with that location and invalidate
   users = usersApi.findAllWithLocationId(location.id)
   for (user in users) {
     cache.delete(user.id)
     cache.delete(user.mail)
     cache.delete("L" + user.loc.id)
     // more to come as the product grows
   }

   // find all trips with that location and invalidate
   // find all the bookings with that location and invalidate
   // ...more
}

Lists

key value
L1 [{id:1, slug: "foo", loc: 1}, {id:2, slug: "bar", loc: 1}]
L2 []
@Get("/users-by-location/{id}")
fun getUsersByLocation(id) {
    users = cache.get("L" + id)

    if (users == null) {
        users = repository.getUsersByLocation(id)
        cache.put("L" + id, users)
    }

    return users;
}
fun onUserUpdate(user) {
   cache.delete(user.id)
   cache.delete(user.mail)

   // works only when the user is added to the collection
   cache.delete("L" + user.loc)

   // what if the user changes location? (we need the previous user state...)
}
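
The "previous user state" problem can be reproduced with a few lines of Python. This is a sketch under stated assumptions: `USERS` is a hypothetical in-memory repository and the cache is a plain dict keyed as on the slide.

```python
# Hypothetical stand-ins for the repository and the cache on the slide.
USERS = {
    1: {"id": 1, "slug": "foo", "loc": 1},
    2: {"id": 2, "slug": "bar", "loc": 1},
}
cache = {}


def get_users_by_location(loc_id):
    key = f"L{loc_id}"
    users = cache.get(key)
    if users is None:  # an empty list is a valid cached value
        users = [dict(u) for u in USERS.values() if u["loc"] == loc_id]
        cache[key] = users
    return users


def on_user_update(user):
    USERS[user["id"]] = user
    cache.pop(user["id"], None)
    # Only the new location is known here, so only its list is dropped.
    cache.pop(f"L{user['loc']}", None)


get_users_by_location(1)
get_users_by_location(2)
on_user_update({"id": 1, "slug": "foo", "loc": 2})  # user 1 moves to location 2

ids_in_loc_1 = [u["id"] for u in get_users_by_location(1)]
ids_in_loc_2 = [u["id"] for u in get_users_by_location(2)]
```

User 1 now shows up in both lists: the list for location 2 was rebuilt, but the list for location 1 was never invalidated because the update hook did not know the old location.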

Things are getting complicated...

Cache tags to the rescue

Cache Tagging

HTTP Cache Tags...

Common in CDNs

key value
1 ...
2 ...
foo ...
FoO ...

key tag
1 id-1
2 id-2
foo id-1
FoO id-1
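
The two tables above (key to value, key to tag) suggest a simple data structure. The following Python class is a minimal in-memory sketch of that idea, not the implementation used at GetYourGuide; the class name `TaggedCache` and its methods are illustrative assumptions.

```python
from collections import defaultdict


class TaggedCache:
    """In-memory cache where every entry can carry a set of tags."""

    def __init__(self):
        self._values = {}                     # key -> value
        self._tags_by_key = {}                # key -> set of tags
        self._keys_by_tag = defaultdict(set)  # tag -> set of keys

    def get(self, key):
        return self._values.get(key)

    def put(self, key, value, tags=()):
        self.delete(key)  # drop stale tag links if the key is overwritten
        self._values[key] = value
        self._tags_by_key[key] = set(tags)
        for tag in tags:
            self._keys_by_tag[tag].add(key)

    def delete(self, key):
        self._values.pop(key, None)
        for tag in self._tags_by_key.pop(key, set()):
            self._keys_by_tag[tag].discard(key)

    def delete_by_tag(self, tag):
        # Copy the key set first: delete() mutates it while we iterate.
        for key in list(self._keys_by_tag.get(tag, ())):
            self.delete(key)


tagged = TaggedCache()
tagged.put(1, "user 1", ["id-1"])
tagged.put(2, "user 2", ["id-2"])
tagged.put("foo", "user 1", ["id-1"])
tagged.put("FoO", "user 1", ["id-1"])
tagged.delete_by_tag("id-1")
```

After `delete_by_tag("id-1")`, only the entry under key `2` remains, regardless of how many keys pointed at user 1.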

In practice

@Get("/user/{id}")
fun getUserById(id) {
    user = cache.get(id)
    if (user == null) {
    	user = repository.getUserById(id)
        cache.put(id, user, getUserTags(user))
    }
    return user;
}

fun getUserTags(user) {
   return ["id-" + user.id]; 
}
key value tags
1 ... id-1
2 ... id-2
fun onUserUpdate(user) {
   cache.deleteByTag("id-" + user.id)
}

Variants and Multiple Variants

@Get("/user/{id}")
fun getUserById(id) {
    user = cache.get(id)
    if (user == null) {
    	user = repository.getUserById(id)
        cache.put(id, user, getUserTags(user))
    }
    return user;
}

@Get("/user-by-mail/{mail}")
fun getUserByMail(mail) {
    user = cache.get(mail)
    if (user == null) {
    	user = repository.getUserByMail(mail)
        cache.put(mail, user, getUserTags(user))
    }
    return user;
}

fun getUserTags(user) {
   return ["id-" + user.id]; 
}
key value tags
1 ... id-1
2 ... id-2
foo@a ... id-1
FoO@a ... id-1
fun onUserUpdate(user) {
   cache.deleteByTag("id-" + user.id)
}
key value tags
1 ... id-1
2 ... id-2
foo@a ... id-1
FoO@a ... id-1
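
Here is the same variant scenario in runnable form. It is a sketch only: the cache is a dict of `(value, tags)` pairs with scan-based deletion, and `DATABASE` is a hypothetical in-memory repository.

```python
# Hypothetical stand-ins; the cache maps key -> (value, tags).
DATABASE = {1: {"id": 1, "mail": "foo@a", "loc": 1}}
cache = {}


def cache_put(key, value, tags):
    cache[key] = (value, frozenset(tags))


def cache_get(key):
    entry = cache.get(key)
    return entry[0] if entry is not None else None


def cache_delete_by_tag(tag):
    for key in [k for k, (_, tags) in cache.items() if tag in tags]:
        del cache[key]


def get_user_tags(user):
    return [f"id-{user['id']}"]


def get_user_by_mail(mail):
    user = cache_get(mail)
    if user is None:
        # The repository lowercases the mail, as on the earlier slide.
        matches = (u for u in DATABASE.values() if u["mail"] == mail.lower())
        user = next(matches, None)
        if user is not None:
            cache_put(mail, dict(user), get_user_tags(user))
    return user


def on_user_update(user):
    DATABASE[user["id"]] = user
    cache_delete_by_tag(f"id-{user['id']}")


get_user_by_mail("foo@a")
get_user_by_mail("FoO@a")
on_user_update({"id": 1, "mail": "foo@a", "loc": 2})
keys_after_update = sorted(cache)
```

Every spelling of the mail was stored with the tag `id-1`, so a single tag deletion clears them all without enumerating the keys.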

Embedded resources

key value tags
1 {id:1, slug: "foo", loc: {id: 1, name : "Rome"}} id-1, loc-1
2 {id:2, slug: "bar", loc: {id: 1, name : "Rome"}} id-2, loc-1
@Get("/user/{id}")
fun getUserById(id) {
    user = cache.get(id)
    
    if (user == null) {
    	user = repository.getUserById(id);
        cache.put(id, user, getUserTags(user))
    }
    
    return user;
}


fun getUserTags(user) {
   return ["id-" + user.id, "loc-" + user.loc.id]; 
}
fun onLocationUpdate(location) {
   cache.deleteByTag("loc-" + location.id)
}

Lists

key value tags
L1 [{id:1, slug: "foo"}, {id:2, slug: "bar"}] loc-1, id-1, id-2
L2 [] loc-2
@Get("/users-by-location/{id}")
fun getUsersByLocation(id) {
    users = cache.get("L" + id)
    
    if (users == null) {
    	users = repository.getUsersByLocation(id)    
        cache.put("L" + id, users, ["loc-" + id] + getUsersTags(users));
    }
    
    return users;
}

fun getUsersTags(users) {
   tags = [];
   for (user in users) {
       tags = tags.concat(getUserTags(user))
   }
   return tags;
}

fun getUserTags(user) {
   return ["id-" + user.id, "loc-" + user.loc.id]; 
}
fun onUserUpdate(user) {
   cache.deleteByTag("id-" + user.id)
   cache.deleteByTag("loc-" + user.loc.id)

   // the "id-" tag covers the case where the user changes location
   // the "loc-" tag covers a user being added to or removed from that location
}
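
A runnable sketch of the list case, assuming a dict of `(value, tags)` pairs as the cache and a hypothetical `USERS` repository. The location name "Milan" is made up for the example.

```python
# Hypothetical stand-ins; the cache maps key -> (value, tags).
USERS = {
    1: {"id": 1, "slug": "foo", "loc": {"id": 1, "name": "Rome"}},
    2: {"id": 2, "slug": "bar", "loc": {"id": 1, "name": "Rome"}},
}
cache = {}


def cache_put(key, value, tags):
    cache[key] = (value, frozenset(tags))


def cache_get(key):
    entry = cache.get(key)
    return entry[0] if entry is not None else None


def cache_delete_by_tag(tag):
    for key in [k for k, (_, tags) in cache.items() if tag in tags]:
        del cache[key]


def get_user_tags(user):
    return [f"id-{user['id']}", f"loc-{user['loc']['id']}"]


def get_users_by_location(loc_id):
    key = f"L{loc_id}"
    users = cache_get(key)
    if users is None:
        users = [u for u in USERS.values() if u["loc"]["id"] == loc_id]
        tags = [f"loc-{loc_id}"]
        for user in users:
            tags += get_user_tags(user)
        cache_put(key, users, tags)
    return users


def on_user_update(user):
    USERS[user["id"]] = user
    cache_delete_by_tag(f"id-{user['id']}")          # lists that held the user
    cache_delete_by_tag(f"loc-{user['loc']['id']}")  # list the user now joins


get_users_by_location(1)
get_users_by_location(2)
on_user_update({"id": 1, "slug": "foo", "loc": {"id": 2, "name": "Milan"}})
keys_after_update = sorted(cache)
ids_in_loc_1 = [u["id"] for u in get_users_by_location(1)]
ids_in_loc_2 = [u["id"] for u in get_users_by_location(2)]
```

The old list is found through the `id-1` tag it was stored with, so the update hook no longer needs the previous user state.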

Results of Cache Tagging at GetYourGuide

Code

Measured with: cloc src/main
https://github.com/AlDanial/cloc

Before (19 Sept 2024)

Lines of code: 4104
Coverage: ~77%

Now

Lines of code: 3796
Coverage: ~83%

7% less code
(but we have more features)

Code to cache articles by tags: -90%

Cost

Before (19 Sept 2024)

~$50-60/day
$0.00000342/req

Now

~$40-42/day
$0.00000247/req

About -20%

Latency

Overall: -90%

In the context of PHP

PHP

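composer require symfony/cache

use Symfony\Component\Cache\Adapter\FilesystemAdapter;
use Symfony\Component\Cache\Adapter\TagAwareAdapter;
use Symfony\Contracts\Cache\ItemInterface;

// Tags need a tag-aware pool. Wrapping a FilesystemAdapter is one possible setup.
$cache = new TagAwareAdapter(new FilesystemAdapter());

// Controls probabilistic early expiration (0 disables it).
$beta = 1.0;
$value = $cache->get('my_cache_key', function (ItemInterface $item): string {

    // ... do some HTTP request or heavy computations
    $computedValue = 'foo';

    $item->expiresAfter(3600);

    $item->tag(['tag_0', 'tag_1']);

    return $computedValue;
}, $beta);


// cache invalidation: removes every item tagged with tag_1 or tag_3
$cache->invalidateTags(['tag_1', 'tag_3']);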


Probabilistic cache invalidation

[Timeline diagram: the cache is set with a 60s TTL; along the time axis, a random cache refresh can happen before the cache expires.]
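
One commonly described way to get this behavior is probabilistic early expiration: each read may decide to recompute the value shortly before the real expiry, with a probability that grows as the expiry approaches. The Python sketch below uses the widely cited formula `now - delta * beta * log(rand) >= expiry`. The function name and parameters are assumptions for illustration, and this is not taken from the Symfony source.

```python
import math
import random
import time


def should_refresh_early(expiry, compute_seconds, beta=1.0, now=None,
                         rng=random.random):
    """Return True if this read should recompute the value ahead of expiry.

    expiry          -- absolute expiry timestamp of the cached item
    compute_seconds -- how long the last recomputation took (delta)
    beta            -- > 1 refreshes earlier, < 1 later, 0 never refreshes early
    """
    if now is None:
        now = time.time()
    # rng() is in [0, 1), so 1 - rng() is in (0, 1] and log() is defined.
    # log() of that value is <= 0, which shifts "now" forward by a random amount.
    return now - compute_seconds * beta * math.log(1.0 - rng()) >= expiry
```

Because each request rolls its own random number, refreshes spread out over time instead of every request hitting the backend at the moment the TTL runs out, which is the latency issue listed at the start of the talk.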

In the context of HTTP

HTTP

GET /user/123 HTTP/1.1

HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: max-age=604800
X-Cache-Tags: tag1,tag2

// ...body

PURGE / HTTP/1.1
Host: example.com
X-Xkey-Purge: tag1, tag3
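
What a caching proxy does with these headers can be sketched in Python. This is a toy model under stated assumptions: the class `TaggedHttpCache` is hypothetical, the header names are copied from the example above, and real proxies and CDNs each have their own header names and purge rules.

```python
def parse_tag_header(value):
    """Split a comma-separated tag header into a set of tags."""
    return {tag.strip() for tag in value.split(",") if tag.strip()}


class TaggedHttpCache:
    """Toy response cache keyed by URL, purgeable by tag."""

    def __init__(self):
        self._responses = {}  # url -> (body, tags)

    def store(self, url, body, headers):
        tags = parse_tag_header(headers.get("X-Cache-Tags", ""))
        self._responses[url] = (body, tags)

    def lookup(self, url):
        entry = self._responses.get(url)
        return entry[0] if entry is not None else None

    def purge(self, headers):
        """Drop every response sharing at least one tag with the purge header."""
        wanted = parse_tag_header(headers.get("X-Xkey-Purge", ""))
        doomed = [url for url, (_, tags) in self._responses.items()
                  if tags & wanted]
        for url in doomed:
            del self._responses[url]
        return len(doomed)


proxy = TaggedHttpCache()
proxy.store("/user/123", "{...}", {"X-Cache-Tags": "tag1,tag2"})
proxy.store("/user/456", "{...}", {"X-Cache-Tags": "tag2"})
purged = proxy.purge({"X-Xkey-Purge": "tag1, tag3"})
```

Only `/user/123` carries `tag1`, so the purge removes that response and leaves `/user/456` cached; `tag3` matches nothing and is ignored.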

Thank you!
