Cache Invalidation, Solved

Asmir Mustafic

GetYourGuide


Caching

"There are only two hard things in Computer Science: cache invalidation and naming things." (Phil Karlton)

[Diagram: Contento, a headless CMS serving legal texts, email templates, the help center and much more. Labels: Cache, webhook, get, get / set.]

A few facts

  • Java service
  • ~200 req/s (spikes of 1k req/s)
  • TTL set to 1 week
  • Latency issues when an entry expires due to TTL
  • Complex code to invalidate the cache on webhooks
  • Don't want to increase TTL due to costs
    • Everything is stored in memory

Cache Invalidation

Why is it hard?

key value
1 {id:1, slug: "foo", loc: 1}
2 {id:2, slug: "bar", loc: 1}
@Get("/user/{id}")
fun getUserById(id) {
    user = cache.get(id)
    
    if (user == null) {
        user = repository.getUserById(id);
        cache.put(id, user)
    }
    
    return user;
}
fun onUserUpdate(user) {
   cache.delete(user.id)
}
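
The read path and the invalidation hook above can be sketched as runnable Python. This is only an illustration of the cache-aside pattern: the `UserCache` class, the `loads` counter and the in-memory `db` dict are assumptions standing in for the real repository and cache.

```python
from typing import Callable, Dict


class UserCache:
    """Cache-aside for users keyed by id (illustrative only)."""

    def __init__(self, load_user: Callable[[int], dict]) -> None:
        self._load_user = load_user          # stands in for repository.getUserById
        self._entries: Dict[int, dict] = {}
        self.loads = 0                       # counts repository hits, for the demo

    def get_user_by_id(self, user_id: int) -> dict:
        user = self._entries.get(user_id)
        if user is None:                     # cache miss: load and remember
            user = self._load_user(user_id)
            self.loads += 1
            self._entries[user_id] = user
        return user

    def on_user_update(self, user_id: int) -> None:
        # Invalidate by deleting; the next read repopulates the entry.
        self._entries.pop(user_id, None)


# Hypothetical in-memory "database".
db = {1: {"id": 1, "slug": "foo", "loc": 1}}
cache = UserCache(lambda user_id: dict(db[user_id]))

cache.get_user_by_id(1)          # miss, loads from db
cache.get_user_by_id(1)          # hit, no load
db[1]["slug"] = "foo-renamed"
cache.on_user_update(1)          # the update hook drops the entry
fresh = cache.get_user_by_id(1)  # miss again, sees the new slug
```

With a single key per entity, invalidation is one delete. The following slides show where that stops being enough.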

Variants

key value
1 {id:1, mail: "foo@a", loc: 1}
2 {id:2, mail: "bar@a", loc: 1}
foo@a {id:1, mail: "foo@a", loc: 1}
@Get("/user/{id}")
fun getUserById(id) {
    user = cache.get(id)
    
    if (user == null) {
        user = repository.getUserById(id);
        cache.put(id, user)
    }
    
    return user;
}

@Get("/user-by-mail/{mail}")
fun getUserByMail(mail) {
    user = cache.get(mail)
    
    if (user == null) {
        user = repository.getUserByMail(mail);
        cache.put(mail, user)
    }
    
    return user;
}
fun onUserUpdate(user) {
   cache.delete(user.id)
   cache.delete(user.mail)
}

Multiple Variants

key value
1 {id:1, mail: "foo@a", loc: 1}
2 {id:2, mail: "bar@a", loc: 1}
foo@a {id:1, mail: "foo@a", loc: 1}
FoO@a {id:1, mail: "foo@a", loc: 1}
@Get("/user/{id}")
fun getUserById(id) {
    user = cache.get(id)
    
    if (user == null) {
        user = repository.getUserById(id);
        cache.put(id, user)
    }
    
    return user;
}

@Get("/user-by-mail/{mail}")
fun getUserByMail(mail) {
    user = cache.get(mail)
    
    if (user == null) {
        user = repository.getUserByMail(mail);
        cache.put(mail, user)
    }
    
    return user;
}

// repository code

fun getUserByMail(mail) {
    mail = mail.lower();
    //... some sql query
    return user;
}
fun onUserUpdate(user) {
   cache.delete(user.id)
   cache.delete(user.mail)

   // how do we invalidate every case variant of the mail
   // that was used as a cache key?
}
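
A minimal Python sketch of the problem, assuming a plain dict as the cache and a case-insensitive repository lookup as in the slide. All names here are illustrative. After the update hook runs, the entry cached under the `FoO@a` spelling is still there and still stale.

```python
from typing import Dict, Optional

db = {1: {"id": 1, "mail": "foo@a"}}   # hypothetical data store
cache: Dict[str, Optional[dict]] = {}  # key -> cached user


def repository_get_user_by_mail(mail: str) -> Optional[dict]:
    mail = mail.lower()                # the repository normalizes the mail
    for user in db.values():
        if user["mail"] == mail:
            return dict(user)
    return None


def get_user_by_mail(mail: str) -> Optional[dict]:
    user = cache.get(mail)             # the cache key is the raw, un-normalized mail
    if user is None:
        user = repository_get_user_by_mail(mail)
        cache[mail] = user
    return user


def on_user_update(user: dict) -> None:
    cache.pop(str(user["id"]), None)
    cache.pop(user["mail"], None)      # only the canonical spelling is known here


get_user_by_mail("foo@a")
get_user_by_mail("FoO@a")              # same user, second cache entry
db[1]["name"] = "New Name"
on_user_update(db[1])
stale_keys = sorted(cache)             # entries that survived the invalidation
```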

Embedded resources

key value
1 {id:1, slug: "foo", loc: {id: 1, name : "Rome"}}
@Get("/user/{id}")
fun getUserById(id) {
    user = cache.get(id)
    
    if (user == null) {
        user = repository.getUserById(id);
        cache.put(id, user)
    }
    
    return user;
}

If we rename "Rome" to "Roma", we need to invalidate every cached user located in Rome.

fun onLocationUpdate(location) {
   // find all users with that location and invalidate them
   users = usersApi.findAllWithLocationId(location.id)
   for (user in users) {
     cache.delete(user.id)
     cache.delete(user.mail)
     cache.delete("L" + user.loc.id)
     // more to come as the product grows
   }

   // find all trips with that location and invalidate them
   // find all bookings with that location and invalidate them
   // ...and more
}

Lists

key value
L1 [{id:1, slug: "foo", loc: 1}, {id:2, slug: "bar", loc: 1}]
L2 []
@Get("/users-by-location/{id}")
fun getUsersByLocation(id) {
    users = cache.get("L" + id)
    
    if (users == null) {
        users = repository.getUsersByLocation(id)
        cache.put("L" + id, users)
    }
    
    return users;
}
fun onUserUpdate(user) {
   cache.delete(user.id)
   cache.delete(user.mail)

   // only covers the list of the user's current location
   // (for example when the user is added to it)
   cache.delete("L" + user.loc)

   // what if the user changes location? The old location's list
   // stays stale (we would need the previous user state...)
}

Things are getting complicated...

Cache tags to the rescue

Cache Tagging

HTTP Cache Tags...

Common in CDNs

key value tags
1 ... id-1
2 ... id-2
foo ... id-1
FoO ... id-1

In practice

@Get("/user/{id}")
fun getUserById(id) {
    user = cache.get(id)
    if (user == null) {
        user = repository.getUserById(id)
        cache.put(id, user, getUserTags(user))
    }
    return user;
}

fun getUserTags(user) {
   return ["id-" + user.id]; 
}
key value tags
1 ... id-1
2 ... id-2
fun onUserUpdate(user) {
   cache.deleteByTag("id-" + user.id)
}
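
The slides assume a cache that offers `put(key, value, tags)` and `deleteByTag(tag)`. A minimal in-memory sketch of such a cache in Python could look as follows. The `TaggedCache` class and its method names are assumptions made for illustration, not the API of a specific library.

```python
from typing import Any, Dict, Iterable, Optional, Set


class TaggedCache:
    """In-memory cache with a reverse index from tag to keys (illustrative)."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._keys_by_tag: Dict[str, Set[str]] = {}
        self._tags_by_key: Dict[str, Set[str]] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._values.get(key)

    def put(self, key: str, value: Any, tags: Iterable[str] = ()) -> None:
        self.delete(key)  # drop old tag links if the key is overwritten
        self._values[key] = value
        self._tags_by_key[key] = set(tags)
        for tag in self._tags_by_key[key]:
            self._keys_by_tag.setdefault(tag, set()).add(key)

    def delete(self, key: str) -> None:
        self._values.pop(key, None)
        for tag in self._tags_by_key.pop(key, set()):
            keys = self._keys_by_tag.get(tag)
            if keys is not None:
                keys.discard(key)

    def delete_by_tag(self, tag: str) -> int:
        """Delete every entry carrying the tag; return how many were removed."""
        keys = self._keys_by_tag.pop(tag, set())
        for key in keys:
            self.delete(key)
        return len(keys)


cache = TaggedCache()
user = {"id": 1, "mail": "foo@a"}
cache.put("1", user, ["id-1"])
cache.put("foo@a", user, ["id-1"])       # variant key, same tag
cache.put("FoO@a", user, ["id-1"])       # another variant, same tag
cache.put("2", {"id": 2, "mail": "bar@a"}, ["id-2"])

removed = cache.delete_by_tag("id-1")    # one call clears all variants of user 1
```

The update hook no longer needs to know which keys exist. It only needs the tag, which is derived from the entity itself.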

Variants and Multiple Variants

@Get("/user/{id}")
fun getUserById(id) {
    user = cache.get(id)
    if (user == null) {
        user = repository.getUserById(id)
        cache.put(id, user, getUserTags(user))
    }
    return user;
}

@Get("/user-by-mail/{mail}")
fun getUserByMail(mail) {
    user = cache.get(mail)
    if (user == null) {
        user = repository.getUserByMail(mail)
        cache.put(mail, user, getUserTags(user))
    }
    return user;
}

fun getUserTags(user) {
   return ["id-" + user.id]; 
}
key value tags
1 ... id-1
2 ... id-2
foo@a ... id-1
FoO@a ... id-1
fun onUserUpdate(user) {
   cache.deleteByTag("id-" + user.id)
}

Embedded resources

key value tags
1 {id:1, slug: "foo", loc: {id: 1, name : "Rome"}} id-1, loc-1
2 {id:2, slug: "bar", loc: {id: 1, name : "Rome"}} id-2, loc-1
@Get("/user/{id}")
fun getUserById(id) {
    user = cache.get(id)
    
    if (user == null) {
        user = repository.getUserById(id);
        cache.put(id, user, getUserTags(user))
    }
    
    return user;
}


fun getUserTags(user) {
   return ["id-" + user.id, "loc-" + user.loc.id]; 
}
fun onLocationUpdate(location) {
   cache.deleteByTag("loc-" + location.id)
}
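
A small Python sketch of the same idea, assuming a deliberately simplified tag index built from two dicts. The helper names are illustrative. Renaming the location invalidates every user that embeds it, and nobody has to query for those users first.

```python
from typing import Dict, Iterable, List, Set

values: Dict[str, dict] = {}
keys_by_tag: Dict[str, Set[str]] = {}


def put(key: str, value: dict, tags: Iterable[str]) -> None:
    values[key] = value
    for tag in tags:
        keys_by_tag.setdefault(tag, set()).add(key)


def delete_by_tag(tag: str) -> None:
    # Simplification: links under other tags are left behind. At worst this
    # causes an extra invalidation later, never a stale read.
    for key in keys_by_tag.pop(tag, set()):
        values.pop(key, None)


def get_user_tags(user: dict) -> List[str]:
    return [f"id-{user['id']}", f"loc-{user['loc']['id']}"]


def on_location_update(location: dict) -> None:
    delete_by_tag(f"loc-{location['id']}")


rome = {"id": 1, "name": "Rome"}
berlin = {"id": 2, "name": "Berlin"}
for user in (
    {"id": 1, "slug": "foo", "loc": rome},
    {"id": 2, "slug": "bar", "loc": rome},
    {"id": 3, "slug": "baz", "loc": berlin},
):
    put(str(user["id"]), user, get_user_tags(user))

on_location_update({"id": 1, "name": "Roma"})  # "Rome" renamed to "Roma"
remaining = sorted(values)                     # only the Berlin user is still cached
```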

Lists

key value tags
L1 [{id:1, slug: "foo"}, {id:2, slug: "bar"}] loc-1, id-1, id-2
L2 [] loc-2
@Get("/users-by-location/{id}")
fun getUsersByLocation(id) {
    users = cache.get("L" + id)
    
    if (users == null) {
        users = repository.getUsersByLocation(id)
        cache.put("L" + id, users, ["loc-" + id] + getUsersTags(users));
    }
    
    return users;
}

fun getUsersTags(users) {
   tags = [];
   for (user in users) {
       tags = tags.concat(getUserTags(user))
   }
   return tags;
}

fun getUserTags(user) {
   return ["id-" + user.id, "loc-" + user.loc.id]; 
}
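fun onUserUpdate(user) {
   cache.deleteByTag("id-" + user.id)
   cache.deleteByTag("loc-" + user.loc.id)

   // "id-<id>" invalidates every list that contains the user,
   // including the old location's list when the user changes location
   // "loc-<id>" invalidates the list of the user's current location,
   // which covers a user being added to or removed from that location
}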
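
A Python sketch of the list case, again on a simplified two-dict tag index with illustrative names. For brevity, users carry a flat `loc` id here rather than an embedded location object. It shows that a location change is handled without the previous user state: the `id-` tag clears the old list, the `loc-` tag clears the new one.

```python
from typing import Dict, Iterable, List, Set

values: Dict[str, list] = {}
keys_by_tag: Dict[str, Set[str]] = {}
users = {                                    # hypothetical data store
    1: {"id": 1, "slug": "foo", "loc": 1},
    2: {"id": 2, "slug": "bar", "loc": 1},
}


def put(key: str, value: list, tags: Iterable[str]) -> None:
    values[key] = value
    for tag in tags:
        keys_by_tag.setdefault(tag, set()).add(key)


def delete_by_tag(tag: str) -> None:
    for key in keys_by_tag.pop(tag, set()):
        values.pop(key, None)


def get_users_by_location(loc_id: int) -> List[dict]:
    key = f"L{loc_id}"
    cached = values.get(key)
    if cached is None:                       # an empty list is a valid cached value
        cached = [dict(u) for u in users.values() if u["loc"] == loc_id]
        tags = {f"loc-{loc_id}"} | {f"id-{u['id']}" for u in cached}
        put(key, cached, tags)
    return cached


def on_user_update(user: dict) -> None:
    delete_by_tag(f"id-{user['id']}")        # lists that contained the user
    delete_by_tag(f"loc-{user['loc']}")      # list of the user's current location


get_users_by_location(1)                     # caches L1 with users 1 and 2
get_users_by_location(2)                     # caches L2 as an empty list
users[1]["loc"] = 2                          # user 1 moves from location 1 to 2
on_user_update(users[1])

ids_in_loc_1 = [u["id"] for u in get_users_by_location(1)]
ids_in_loc_2 = [u["id"] for u in get_users_by_location(2)]
```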

Results of Cache Tagging at GetYourGuide

Before (19 Sept 2024)

Lines of code: 4104
Coverage: ~77%

NOW

Lines of code: 3796
Coverage: ~83%

7% less code
(but we have more features)

cloc src/main
https://github.com/AlDanial/cloc

Code

Code to cache articles by tags (-90%)

Before (19 Sept 2024)

~$50-60/day

$0.00000342/req

NOW

~$40-42/day

$0.00000247/req

about -20%

Cost

Latency

Overall (-90%)

In the context of PHP

PHP

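composer require symfony/cache

<?php
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\Cache\Adapter\FilesystemAdapter;
use Symfony\Component\Cache\Adapter\TagAwareAdapter;
use Symfony\Contracts\Cache\ItemInterface;

// any tag-aware pool works; the filesystem adapter is only an example
$cache = new TagAwareAdapter(new FilesystemAdapter());

// $beta tunes probabilistic early expiration (0 disables it, 1.0 is the default)
$beta = 1.0;
$value = $cache->get('my_cache_key', function (ItemInterface $item): string {
    // ... do some HTTP request or heavy computations
    $computedValue = 'foo';

    $item->expiresAfter(3600);
    $item->tag(['tag_0', 'tag_1']);

    return $computedValue;
}, $beta);

// cache invalidation: removes every item tagged with tag_1 or tag_3
$cache->invalidateTags(['tag_1', 'tag_3']);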


Probabilistic cache invalidation

[Timeline diagram. Labels: cache set (60s), random cache refresh, cache expire, time.]
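
One common way to implement this kind of random refresh is probabilistic early expiration: each read may decide to recompute slightly before the TTL ends, with a probability that rises as expiry approaches. The Python sketch below uses the widely cited "XFetch" formula. Whether the service in this talk uses exactly this formula is not stated, and the function and parameter names are assumptions.

```python
import math
import random
import time
from typing import Callable, Optional


def should_refresh_early(
    expires_at: float,
    compute_seconds: float,
    beta: float = 1.0,
    now: Optional[float] = None,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Return True if this read should recompute the value before it expires.

    expires_at:      absolute expiry time of the cached entry (seconds)
    compute_seconds: how long the last recomputation took
    beta:            0 disables early refresh, larger values refresh earlier
    """
    now = time.time() if now is None else now
    # rng() is in [0, 1), so 1 - rng() is in (0, 1] and the log is defined.
    # The resulting jitter is >= 0 and occasionally large.
    jitter = -compute_seconds * beta * math.log(1.0 - rng())
    return now + jitter >= expires_at


# Deterministic checks with a fixed "random" value of 0.5:
# jitter = -10 * ln(0.5), roughly 6.93 seconds.
early = should_refresh_early(100.0, 10.0, now=95.0, rng=lambda: 0.5)
not_yet = should_refresh_early(100.0, 10.0, now=90.0, rng=lambda: 0.5)
```

Because only an occasional read draws a large jitter, a single request usually refreshes the entry shortly before expiry, instead of many requests missing at the same moment once it has expired.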

In the context of HTTP

HTTP

GET /user/123 HTTP/1.1

HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: max-age=604800
X-Cache-Tags: tag1,tag2

// ...body

PURGE / HTTP/1.1
Host: example.com
X-Xkey-Purge: tag1, tag3
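
What a tag-aware proxy or CDN does with these headers can be modelled in a few lines of Python. This is a toy model, not the behaviour of a specific product: the `TagAwareHttpCache` class is hypothetical, and the header names are simply the ones from the example above (real names depend on the proxy or CDN configuration).

```python
from typing import Dict, List, Set


def parse_tag_header(value: str) -> List[str]:
    """Split a comma-separated tag header, tolerating spaces."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


class TagAwareHttpCache:
    """Toy model of a reverse proxy that indexes cached responses by tag."""

    def __init__(self) -> None:
        self.responses: Dict[str, str] = {}
        self.urls_by_tag: Dict[str, Set[str]] = {}

    def store(self, url: str, headers: Dict[str, str], body: str) -> None:
        self.responses[url] = body
        for tag in parse_tag_header(headers.get("X-Cache-Tags", "")):
            self.urls_by_tag.setdefault(tag, set()).add(url)

    def purge(self, headers: Dict[str, str]) -> List[str]:
        """Evict every response tagged with any tag in the purge header."""
        purged: Set[str] = set()
        for tag in parse_tag_header(headers.get("X-Xkey-Purge", "")):
            for url in self.urls_by_tag.pop(tag, set()):
                if self.responses.pop(url, None) is not None:
                    purged.add(url)
        return sorted(purged)


proxy = TagAwareHttpCache()
proxy.store("/user/123", {"X-Cache-Tags": "tag1,tag2"}, '{"id": 123}')
proxy.store("/user/456", {"X-Cache-Tags": "tag2"}, '{"id": 456}')

purged = proxy.purge({"X-Xkey-Purge": "tag1, tag3"})  # tag3 matches nothing
```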

Thank you!
