Race conditions in distributed systems

Atomic updates, optimistic locking, ...

Spot the problem

const basketItem = await basketItems.find({ guid });
basketItem.amount++;
await basketItems.replace({ guid }, basketItem);

const basketItem = await basketItems.find({ guid });
await basketItems.update({ guid }, { $set: { amount: basketItem.amount + 1 } });

const basketItem = await basketItems.find({ itemId });
if (basketItem) {
  await basketItems.update({ guid: basketItem.guid }, { $set: { amount: basketItem.amount + 1 } });
} else {
  await basketItems.insert({ guid: genGuid(), itemId, amount: 1 });
}

const user = await users.find({ guid });
if (user && user.createdOn < new Date('2021-01-01')) {
  await users.update({ guid }, { $set: { legacy: true } });
}

const now = new Date();
// Open payments are the ones without a completion date.
const openPayments = await payments.find({ completedOn: null });
await payments.update(
  { guid: { $in: openPayments.map(x => x.guid) } },
  { $set: { completedOn: now, wasForceClosed: true } }
);

Examples from our code base

Applying coupons and changing amount at the same time (fixed)
Adding items before opening payment request
Calculating Deutschland Card points for multiple items
PLU search in cart UI: One lookup could overtake the other (fixed)
Mobile app search: One lookup could overtake the other (fixed!?)
Store processing: Messages of different priorities are concurrent
...
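
The two "one lookup could overtake the other" cases can be illustrated with a small sketch. It shows one common guard (only the most recently started request may deliver its result), which is not necessarily the fix that was applied in the code base. `latestOnly`, `slowSearch` and `demoSearch` are hypothetical names, and the delays are made up so that the older request answers last.

```javascript
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Wraps an async lookup so that only the most recently started call
// may deliver its result; overtaken calls resolve to undefined.
function latestOnly(lookup) {
  let latestId = 0;
  return async (...args) => {
    const id = ++latestId;
    const result = await lookup(...args);
    return id === latestId ? result : undefined;
  };
}

// Hypothetical search backend: the one-letter query answers more slowly.
async function slowSearch(query) {
  await delay(query.length === 1 ? 20 : 5);
  return `results for "${query}"`;
}

async function demoSearch() {
  const search = latestOnly(slowSearch);
  const shown = [];
  const show = (result) => {
    if (result !== undefined) shown.push(result);
  };
  // The user types "a", then "ab"; the answer for "a" arrives last.
  await Promise.all([search('a').then(show), search('ab').then(show)]);
  return shown;
}

demoSearch().then((shown) => console.log(shown)); // [ 'results for "ab"' ]
```

Without the guard, the late answer for "a" would overwrite the newer result list for "ab".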

"Everything that can possibly go wrong will go wrong"

Murphy's law

\begin{aligned}
\text{Chance for one particular case} &= 1 : 1{,}000{,}000{,}000\\
\text{Count of similar cases} &= 100\\
\text{Count of events} &= 500{,}000 \times 20\\
\text{Expected bug count} &= \frac{100 \times 500{,}000 \times 20}{1{,}000{,}000{,}000} = 1\\
&\Rightarrow \text{once per month}
\end{aligned}

Chance for an inconsistency to happen per month
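
The same estimate as a few lines of code, so the inputs can be varied. The numbers are the ones from the slide; the function and variable names are illustrative only.

```javascript
// Expected number of race-condition bugs per month.
// oneIn:  a single case goes wrong once in this many events
// cases:  number of similar race windows in the system
// events: number of events per month
function expectedBugsPerMonth(oneIn, cases, events) {
  return (cases * events) / oneIn;
}

const oneIn = 1_000_000_000;
const similarCases = 100;
const eventsPerMonth = 500_000 * 20;

console.log(expectedBugsPerMonth(oneIn, similarCases, eventsPerMonth)); // 1
```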

\text{And for each 100,000 carts?}\\
+ 20\\
\text{20 times, 40 times, 80 times, ... per month}

Locks

In a local system: Lock, Semaphore, Mutex, Queue, ...
In a distributed system: Distributed locks/sessions with Redis, Service Bus, ...

Pro:

Covers a lot of cases
May be implemented centrally => don't need to think about it most of the time

Con:

Costly, potentially slow
Can become a bottleneck
Deadlocks
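
A minimal sketch of the local case, assuming a single process: a promise-based mutex that serializes a read-modify-write. `Mutex`, `createStore` and `demo` are hypothetical names, and the store is an in-memory stand-in with artificial latency. A distributed lock (Redis, Service Bus, ...) follows the same idea but needs an external coordinator, which this sketch does not cover.

```javascript
// Minimal promise-based mutex: callers queue up and run one at a time.
class Mutex {
  constructor() {
    this.tail = Promise.resolve(); // end of the waiting queue
  }

  // Runs fn exclusively and resolves with its result.
  runExclusive(fn) {
    const result = this.tail.then(() => fn());
    // Keep the queue usable even if fn throws.
    this.tail = result.catch(() => {});
    return result;
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical in-memory store with slow reads and writes.
function createStore() {
  let amount = 0;
  return {
    async read() { await sleep(1); return amount; },
    async write(value) { await sleep(1); amount = value; },
  };
}

// Read-modify-write without any protection.
async function increment(store) {
  const current = await store.read();
  await store.write(current + 1);
}

async function demo() {
  const unsafe = createStore();
  await Promise.all(Array.from({ length: 10 }, () => increment(unsafe)));

  const safe = createStore();
  const mutex = new Mutex();
  await Promise.all(
    Array.from({ length: 10 }, () => mutex.runExclusive(() => increment(safe)))
  );
  return { unsafe: await unsafe.read(), safe: await safe.read() };
}

// safe ends at 10; unsafe ends lower because concurrent updates were lost.
demo().then((result) => console.log(result));
```

The locked variant takes roughly ten times as long as the unlocked one, which is the "costly, potentially slow" point from the list above.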

Atomic updates

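// Both read-then-write increments become one atomic increment.
await basketItems.update({ guid }, { $inc: { amount: 1 } });

// Find-then-update-or-insert becomes one upsert.
// Assumes a unique index on itemId, so concurrent upserts cannot create duplicates.
await basketItems.update(
  { itemId },
  { $inc: { amount: 1 }, $setOnInsert: { guid: genGuid() } },
  { upsert: true }
);

// Find-then-bulk-update becomes one update whose filter selects the open payments.
await payments.update(
  { completedOn: null },
  { $set: { wasForceClosed: true }, $currentDate: { completedOn: true } }
);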

Pro: Fast, safe, simple

Con: Not always feasible
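
To see the difference without a database, here is a small simulation. `FakeCollection` is a hypothetical in-memory stand-in (not a MongoDB API) in which every call has latency, and `inc` plays the role of `$inc`: the change is applied inside the "database" in one step.

```javascript
const pause = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical in-memory stand-in for a collection; every call has latency.
class FakeCollection {
  constructor() {
    this.docs = new Map();
  }
  async find(guid) {
    await pause(1);
    const doc = this.docs.get(guid);
    return doc ? { ...doc } : null; // hand out a copy, not the stored object
  }
  async replace(guid, doc) {
    await pause(1);
    this.docs.set(guid, { ...doc });
  }
  // The increment happens inside the "database" in a single step.
  async inc(guid, field, by) {
    await pause(1);
    this.docs.get(guid)[field] += by;
  }
}

async function addOneReadModifyWrite(coll, guid) {
  const item = await coll.find(guid);
  item.amount++;
  await coll.replace(guid, item);
}

async function addOneAtomic(coll, guid) {
  await coll.inc(guid, 'amount', 1);
}

// Lets `clients` concurrent callers add one item each and returns the final amount.
async function run(addOne, clients) {
  const coll = new FakeCollection();
  coll.docs.set('a', { guid: 'a', amount: 0 });
  await Promise.all(Array.from({ length: clients }, () => addOne(coll, 'a')));
  return coll.docs.get('a').amount;
}

// The atomic variant ends at 5; read-modify-write ends lower (lost updates).
run(addOneReadModifyWrite, 5).then((n) => console.log('read-modify-write:', n));
run(addOneAtomic, 5).then((n) => console.log('atomic increment:', n));
```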

Optimistic locking

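database: { _id: 1, version: 1, x: 1 }

client 1 reads (oldVersion = 1): { _id: 1, version: 1, x: 1 }

client 2 reads (oldVersion = 1): { _id: 1, version: 1, x: 1 }

client 1 changes its copy (oldVersion = 1): { _id: 1, version: 1, x: 2 }

client 2 changes its copy (oldVersion = 1): { _id: 1, version: 1, x: 3 }

client 1 writes where version = 1 and succeeds; database: { _id: 1, version: 2, x: 2 }

client 2 writes where version = 1, nothing matches, so it reads again (oldVersion = 2): { _id: 1, version: 2, x: 2 }

client 2 changes its copy (oldVersion = 2): { _id: 1, version: 2, x: 4 }

client 2 writes where version = 2 and succeeds; database: { _id: 1, version: 3, x: 4 }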

Pro:

Usually next to no overhead (collisions are rare)
Versatile and safe

Con: All clients must use this scheme
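
The sequence above can be replayed in a few lines. This is a sketch against a hypothetical in-memory store (`VersionedStore`, `withOptimisticLock` and `replay` are illustrative names); client 1 adds 1 to `x` and client 2 adds 2, matching the values in the diagram.

```javascript
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical in-memory store holding one versioned document.
class VersionedStore {
  constructor(doc) {
    this.doc = { ...doc };
  }
  async read() {
    await wait(1);
    return { ...this.doc };
  }
  // Compare-and-swap: replaces the document only if the stored version
  // still equals expectedVersion. Returns true when the write was applied.
  async replaceIfVersion(expectedVersion, next) {
    await wait(1);
    if (this.doc.version !== expectedVersion) return false;
    this.doc = { ...next };
    return true;
  }
}

// Reads, applies `change`, and writes back with a bumped version.
// Retries from a fresh read whenever another client wrote first.
async function withOptimisticLock(store, change, maxAttempts = 100) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const current = await store.read();
    const next = { ...change(current), version: current.version + 1 };
    if (await store.replaceIfVersion(current.version, next)) return next;
  }
  throw new Error(`gave up after ${maxAttempts} attempts`);
}

async function replay() {
  const store = new VersionedStore({ _id: 1, version: 1, x: 1 });
  await Promise.all([
    withOptimisticLock(store, (doc) => ({ ...doc, x: doc.x + 1 })), // client 1
    withOptimisticLock(store, (doc) => ({ ...doc, x: doc.x + 2 })), // client 2
  ]);
  return store.doc;
}

replay().then((doc) => console.log(doc)); // { _id: 1, version: 3, x: 4 }
```

Whichever client wins the first write, the loser retries on the new state, so both changes end up in the final document.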

await optimisticLock(
  sessionCollection,
  Builders<SessionModel>.Filter.Eq(x => x.guid, model.sessionGuid),
  session => {
    // Change the in-memory copy; optimisticLock writes it back.
    var item = session.Items.First(x => x.BasketItemGuid == model.BasketItemGuid);
    item.amount++;
    return Task.FromResult(session);
  }
);

What could it look like?

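using System;
using System.Threading.Tasks;
using MongoDB.Driver;

// Applies fn to the matching document and writes it back only if nobody
// else changed the document in the meantime; retries on a version conflict.
async Task optimisticLock<T>(
  IMongoCollection<T> coll,
  FilterDefinition<T> filter,
  Func<T, Task<T>> fn
) where T : Versioned
{
  for (var attempt = 0; attempt < 100; attempt++) {
    // Read the current state and remember the version it was based on.
    var item = await coll.Find(filter).Limit(1).FirstAsync();
    var prevVersion = item.version;

    var updated = await fn(item);
    updated.version = prevVersion + 1;

    // The version check in the filter turns the replace into a compare-and-swap.
    var result = await coll.ReplaceOneAsync(
      Builders<T>.Filter.And(new FilterDefinition<T>[] {
        Builders<T>.Filter.Eq(x => x.Id, item.Id),
        Builders<T>.Filter.Eq(x => x.version, prevVersion)
      }),
      updated
    );
    if (result.ModifiedCount > 0) return;
    // Nothing matched: another client wrote first, so read again and retry.
  }
  throw new Exception("optimisticLock: giving up after 100 conflicting attempts");
}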

Thank you for your attention

Backup slides

Race conditions in distributed systems: Atomic updates, optimistic locking, ...

By Marco Schumacher
