Database Gotchas

Two key concepts you can't afford to ignore

 

 

 

David Seddon

david@seddonym.me

http://seddonym.me

  • Concept One: Atomicity

  • Concept Two: Concurrency

  • Caveats

This talk

Case Study: Shamtander

from django.db import models


class Account(models.Model):
    def has_at_least(amount):
        """Returns True if this account has a balance greater
        than or equal to the supplied amount.
        """
        balance = account.entries.aggregate(sum=models.Sum('amount'))['sum']
        return balance >= amount


class LedgerEntry(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    account = models.ForeignKey(Account, related_name='entries')
    amount = models.IntegerField(help_text='The amount, in pence.')


def transfer(source, destination, amount):
    "Make a transfer between two Accounts of the supplied amount."
    if source.has_at_least(amount):
        LedgerEntry.objects.create(account=source, amount=(amount * -1))
        LedgerEntry.objects.create(account=destination, amount=amount)
    else:
        raise Exception('Insufficient funds.')
        

Shamtander's accounts system - v1

Concept One

Atomicity

Time

Book balance

Hang on a minute...

def transfer(source, destination, amount):
    "Make a transfer between two Accounts of the supplied amount."
    if source.has_at_least(amount):
        LedgerEntry.objects.create(account=source, amount=(amount * -1))
        LedgerEntry.objects.create(account=destination, amount=amount)
    else:
        raise Exception("Insufficient funds.")

What if this command fails?

Atomicity

 

noun: the state or fact of being composed of indivisible units.

Database transactions

Transactions are a way of wrapping queries up into discrete blocks

Query

Query

Query

Transaction

Will the first entry be saved?

def transfer_view(request, source_id, destination_id, amount):
    "API endpoint for instructing a transfer."
    source = Account.objects.get(id=source_id)
    destination = Account.objects.get(id=destination_id)
    try:
        transfer(source, destination, amount)
    except:
        # Failure
        return HttpResponse(status=400)
    else:
        # Success
        return HttpResponse(status=201)


def transfer(source, destination, amount):
    "Make a transfer between two Accounts of the supplied amount."
    if source.has_at_least(amount):
        LedgerEntry.objects.create(account=source, amount=(amount * -1))  # Succeeds
        LedgerEntry.objects.create(account=destination, amount=amount)  # Fails
    else:
        raise Exception("Insufficient funds.")

a. Yes

b. No

c. It depends

Making your requests atomic

# settings.py

ATOMIC_REQUESTS = True

The request/response cycle will be wrapped in a database transaction.

 

If an exception is raised, the
transaction is rolled back.

Time

Book balance

Shamtander's accounts system - v2

ATOMIC_REQUESTS = True

Shamtander scales!

Transfer moved into a celery task, rather than being called directly from a view.

transfer.delay(source_id, destination_id, amount)

Shamtander's accounts system - v3

transfer.delay(source_id, destination_id, amount)

Time

Book balance

transaction.atomic
from django.db import transaction


with transaction.atomic():
   foo()  # Will be rolled back
   bar()  # Will be rolled back
   raise Exception

Wrapping code in an atomic block guarantees atomicity.

Shamtander's accounts system - v4

def transfer(source_id, destination_id, amount):
    with transaction.atomic():
        ...

Time

Book balance

Any questions so far?

Concept Two

Concurrency

Users

Account balances

Shamtander scales again!

if source.has_at_least(amount):
    LedgerEntry.objects.create(account=source, amount=(amount * -1))
    LedgerEntry.objects.create(account=destination, amount=amount)

Concurrent processes

Worker 1

Worker 2

Source balance
£100





 
£0
 
-£50
 

Check balance >= £100

Check balance >= £50

Reduce balance by £100

Reduce balance by £50

Concurrent connections

table_one





 
table_two





 
table_three





 

Databases allow many processes at once to modify their data

Transaction

Transaction

Transaction

Transaction

Concurrency: reading

Transaction 1

Transaction 2

id value
1 0

(1)

?

SET value = 1

SELECT value

a. Transaction 2 reads value 0.

b. Transaction 2 reads value 1 straight away.

c. Transaction 2 reads value 1 once Transaction 1 commits.

d. It depends.

What happens?

Database isolation levels

SERIALIZABLE

 

REPEATABLE READ (MySQL default)

 

READ COMMITTED (PostgreSQL default)

 

READ UNCOMMITED

strict, accurate

permissive, fast

Read committed - reading

Records from other sessions will become visible as they are committed

Transaction 1

id value
1 0

0

1

SET value = 1

id value
1 1

Concurrency: reading

Transaction 1

Transaction 2

id value
1 0

(1)

?

SET value = 1

SELECT value

a. Transaction 2 reads value 0.

b. Transaction 2 reads value 1 straight away.

c. Transaction 2 reads value 1 once Transaction 1 commits.

d. It depends.

What happens?

Concurrency: reading

Transaction 1

Transaction 2

id value
1 0

0


1

SET value = 1

SELECT value

a. Transaction 2 reads value 0 straight away.

 

Isolation mode: READ COMMITTED

Concurrency: writing

Transaction 1 (writer)

Transaction 2 (writer)

id value
1 0

0

?

?
?
 

SET value = 1

SET value = 2

Isolation mode: READ COMMITTED

a. T1 sets value to 1, T2 then overwrites value as 2.

b. T2 commits value 2, then T1 commits value 1.

c. T2 waits until T1 commits, then sets value to 2.

d. T2 errors when it attempts to set value to 2.

What happens?

Read committed - writing

Records that sessions are writing to are marked as dirty immediately.

Other writers will wait until the lock is released.

Transaction 1

id value read only
1 0

0

1
NO

YES

NO

SET value = 1

id value
1 1

Concurrency: writing

Transaction 1 (writer)

Transaction 2 (writer)

id value
1 0

0

?

?
?
 

SET value = 1

SET value = 2

Isolation mode: READ COMMITTED

a. T1 sets value to 1, T2 then overwrites value as 2.

b. T2 commits value 2, then T1 commits value 1.

c. T2 waits until T1 commits, then sets value to 2.

d. T2 errors when it attempts to set value to 2.

What happens?

Concurrency: writing

Transaction 1 (writer)

Transaction 2 (writer)

id value read only
1 0

0




1

1

2
NO

YES




NO

YES

NO

SET value = 1

SET value = 2

 

 

Isolation mode: READ COMMITTED

c. T2 waits until T1 commits, then sets value to 2.

Ok, so how do we
prevent concurrency?

Preventing concurrency

Transaction 1 (writer)

Transaction 2 (writer)

id value read only
1 0
0


0

1


1
2
NO
YES


YES

NO


YES
NO

SET value = 1

SET value = 2

Isolation mode: READ COMMITTED

SELECT FOR UPDATE is a way of making
a read query behave like a write query.

SELECT FOR UPDATE

SELECT FOR UPDATE

 

 

 

Select for update in Django

MyModel.objects.select_for_update().get(id=1)

MyModel.objects.select_for_update().filter(id__in=[1, 3])

Select for update must be called within a transaction.

TransactionManagementError: select_for_update cannot be used
outside of a transaction.

@shared_task
def transfer(source_id, destination_id, amount):
    "Make a transfer between two Accounts of the supplied amount."
    source = Account.objects.get(id=source_id)
    destination = Account.objects.get(id=destination_id)
    with transaction.atomic():
        if source.has_at_least(amount):
            LedgerEntry.objects.create(account=source, amount=(amount * -1))
            LedgerEntry.objects.create(account=destination, amount=amount)

How can we prevent concurrent transfers on the same source account?


@shared_task
def transfer(source_id, destination_id, amount):
    "Make a transfer between two Accounts of the supplied amount."

    with transaction.atomic():

        # Wait for a lock on the source account
        source = Account.objects.select_for_update().get(id=source_id)
    
        destination = Account.objects.get(id=destination_id)
        
        if source.has_at_least(amount):
            LedgerEntry.objects.create(account=source, amount=(amount * -1))
            LedgerEntry.objects.create(account=destination, amount=amount)

Solution: pessimistic locking

Users

Account balances

Shamtander v5

def transfer(source_id, destination_id, amount):
    with transaction.atomic():
        source = Account.objects.select_for_update().get(id=source_id)
        ...

Four caveats

Caveat 1: tasks within transactions

There are two problems with this;
what are they?


def transfer(source, destination, amount):
    "Make a transfer between two Accounts of the supplied amount."
    with transaction.atomic():
        if source.has_at_least(amount):
            source_entry = LedgerEntry.objects.create(account=source,
                                                      amount=(amount * -1))
            send_notification_email.delay(source_entry.id)

            destination_entry = LedgerEntry.objects.create(account=destination,
                                                           amount=amount)
            send_notification_email.delay(destination_entry.id)

Caveat 1: tasks within transactions

  1. Email may get sent even if the transfer fails.

def transfer(source, destination, amount):
    "Make a transfer between two Accounts of the supplied amount."
    with transaction.atomic():
        if source.has_at_least(amount):
            source_entry = LedgerEntry.objects.create(account=source,
                                                      amount=(amount * -1))
            destination_entry = LedgerEntry.objects.create(account=destination,
                                                           amount=amount)
            send_notification_email.delay(source_entry.id,
                                          destination_entry.id)

Two problems

Caveat 1: tasks within transactions

  1. Email may get sent even if the transfer fails.
  2. Email may not be able to get the entries from the database yet.

def transfer(source, destination, amount):
    "Make a transfer between two Accounts of the supplied amount."
    with transaction.atomic():
        if source.has_at_least(amount):
            source_entry = LedgerEntry.objects.create(account=source,
                                                      amount=(amount * -1))
            destination_entry = LedgerEntry.objects.create(account=destination,
                                                           amount=amount)
            send_notification_email.delay(source_entry.id,
                                          destination_entry.id)

Two problems

Solution: performing actions on commit

with transaction.atomic():
   foo()
   transaction.on_commit(bar)
   baz()

bar() will be called only once the transaction has been committed successfully.

with transaction.atomic():
   foo()
   transaction.on_commit(lambda: some_celery_task.delay('arg1'))
   baz()

Use lambdas for celery tasks whenever you're in a transaction.

Caveat 2


@shared_task
def transfer(source_id, destination_id, amount):
    "Make a transfer between two Accounts of the supplied amount."

    with transaction.atomic():

        # Wait for a lock on the source and destination accounts
        Account.objects.select_for_update().filter(id__in=[source_id, destination_id])

        source = Account.objects.get(id=source_id)    
        destination = Account.objects.get(id=destination_id)
        
        if source.has_at_least(amount):
            LedgerEntry.objects.create(account=source, amount=(amount * -1))
            LedgerEntry.objects.create(account=destination, amount=amount)

Spot the bug

Caveat 2 - lazy select_for_update

Account.objects.select_for_update().filter(id__in=[source_id, destination_id])

Querysets are lazy!

In this case, the select_for_update will never be run.

bool(Account.objects.select_for_update().filter(id__in=[source_id, destination_id]))

Solution: wrap select_for_updates that use filter in a bool if you don't evaluate them straight away.

Caveat 3 - Deadlocks

ERROR: deadlock detected
  Detail:
     Process 13560 waits for ShareLock on transaction 3147316424; blocked by process 13566.
     Process 13566 waits for ShareLock on transaction 3147316408; blocked by process 13560.

What the...?

How a deadlock happens

ids = [1, 2]
bool(
  AccountHolder.objects\
    .select_for_update()\
    .filter(id__in=ids)
)

Process 1

Process 2

id read only
1 YES
2 YES
id read only
1 YES
2 YES

Waiting for each other

ids = [2, 1]
bool(
  AccountHolder.objects\
    .select_for_update()\
    .filter(id__in=ids)
)

Preventing deadlocks

ids = [1, 2]
bool(
  AccountHolder.objects\
    .select_for_update()\
    .filter(id__in=ids).
    .order_by('id')
)

Process 1

Process 2

ids = [2, 1]
bool(
  AccountHolder.objects\
    .select_for_update()\
    .filter(id__in=ids).
    .order_by('id')
)

Solution: when using select_for_updates on multiple records, make sure you acquire the locks in a consistent order.

Caveat 4 - Testing

 

This wraps your test in a transaction.  This can be a problem if you are testing something that doesn't run in a transaction in real life.

 

Example: a select_for_update included in a celery task that is not wrapped in a transaction, will pass tests but fail when it runs in production.

django.test.TestCase

 

Does not wrap your test in a transaction.  Slower, but better for code where you need to test behaviour relating to transactions.

django.test.TransactionTestCase

Summary

Atomicity

  • ATOMIC_REQUESTS makes your requests atomic.
  • Use transaction.atomic to enforce atomicity elsewhere.
  • Use transaction.on_commit() for tasks.

 

Concurrency

  • Can your code handle concurrency?  If not, prevent it!
  • Select for update is a simple way to make your code wait until it's safe.
  • Three caveats:

David Seddon   http://seddonym.me

  1. Querysets are lazy: make sure your select_for_update is being evaluated.
  2. Avoid deadlocks by acquiring locks in a consistent order.
  3. TestCase wraps your code in a transaction.

Appendix

Savepoints

Savepoints allow you to roll back
within transactions

Query

Transaction

Query

> Savepoint

Query

Query

> Rollback

Nested atomic blocks

Transaction

Query

> Savepoint

Query

Query

> Savepoint

Query

> Savepoint

Query

Query

with transaction.atomic():
    foo()
    
    with transaction.atomic():
       
        bar()
    





    with transaction.atomic():
       
        baz()


        with transaction.atomic():

            foobar()

Nested atomic blocks

with transaction.atomic():
    
    foo()  - Will be committed
    
    try:
        with transaction.atomic():
            bar()  - Will be rolled back
            raise Exception
    except:
        pass

    baz()  - Will be committed

Exceptions raised within an atomic block will roll back that atomic block.

Database Gotchas

By David Seddon

Database Gotchas

Two key concepts you can't afford to ignore

  • 2,235