Database Gotchas
Two key concepts you can't afford to ignore
David Seddon
david@seddonym.me
http://seddonym.me
-
Concept One: Atomicity
-
Concept Two: Concurrency
-
Caveats
This talk
Case Study: Shamtander
from django.db import models
class Account(models.Model):
def has_at_least(amount):
"""Returns True if this account has a balance greater
than or equal to the supplied amount.
"""
balance = account.entries.aggregate(sum=models.Sum('amount'))['sum']
return balance >= amount
class LedgerEntry(models.Model):
created_at = models.DateTimeField(auto_now_add=True)
account = models.ForeignKey(Account, related_name='entries')
amount = models.IntegerField(help_text='The amount, in pence.')
def transfer(source, destination, amount):
"Make a transfer between two Accounts of the supplied amount."
if source.has_at_least(amount):
LedgerEntry.objects.create(account=source, amount=(amount * -1))
LedgerEntry.objects.create(account=destination, amount=amount)
else:
raise Exception('Insufficient funds.')
Shamtander's accounts system - v1
Concept One
Atomicity
Time
Book balance
Hang on a minute...
def transfer(source, destination, amount):
"Make a transfer between two Accounts of the supplied amount."
if source.has_at_least(amount):
LedgerEntry.objects.create(account=source, amount=(amount * -1))
LedgerEntry.objects.create(account=destination, amount=amount)
else:
raise Exception("Insufficient funds.")
What if this command fails?
Atomicity
noun: the state or fact of being composed of indivisible units.
Database transactions
Transactions are a way of wrapping queries up into discrete blocks
Query
Query
Query
Transaction
Will the first entry be saved?
def transfer_view(request, source_id, destination_id, amount):
"API endpoint for instructing a transfer."
source = Account.objects.get(id=source_id)
destination = Account.objects.get(id=destination_id)
try:
transfer(source, destination, amount)
except:
# Failure
return HttpResponse(status=400)
else:
# Success
return HttpResponse(status=201)
def transfer(source, destination, amount):
"Make a transfer between two Accounts of the supplied amount."
if source.has_at_least(amount):
LedgerEntry.objects.create(account=source, amount=(amount * -1)) # Succeeds
LedgerEntry.objects.create(account=destination, amount=amount) # Fails
else:
raise Exception("Insufficient funds.")
a. Yes
b. No
c. It depends
Making your requests atomic
# settings.py
ATOMIC_REQUESTS = True
The request/response cycle will be wrapped in a database transaction.
If an exception is raised, the
transaction is rolled back.
Time
Book balance
Shamtander's accounts system - v2
ATOMIC_REQUESTS = True
Shamtander scales!
Transfer moved into a celery task, rather than being called directly from a view.
transfer.delay(source_id, destination_id, amount)
Shamtander's accounts system - v3
transfer.delay(source_id, destination_id, amount)
Time
Book balance
transaction.atomic
from django.db import transaction
with transaction.atomic():
foo() # Will be rolled back
bar() # Will be rolled back
raise Exception
Wrapping code in an atomic block guarantees atomicity.
Shamtander's accounts system - v4
def transfer(source_id, destination_id, amount):
with transaction.atomic():
...
Time
Book balance
Any questions so far?
Concept Two
Concurrency
Users
Account balances
Shamtander scales again!
if source.has_at_least(amount):
LedgerEntry.objects.create(account=source, amount=(amount * -1))
LedgerEntry.objects.create(account=destination, amount=amount)
Concurrent processes
Worker 1
Worker 2
Source balance |
---|
£100 |
£0 |
-£50 |
Check balance >= £100
Check balance >= £50
Reduce balance by £100
Reduce balance by £50
Concurrent connections
table_one |
---|
|
table_two |
---|
|
table_three |
---|
|
Databases allow many processes at once to modify their data
Transaction
Transaction
Transaction
Transaction
Concurrency: reading
Transaction 1
Transaction 2
id | value |
---|---|
1 | 0 (1) ? |
SET value = 1
SELECT value
a. Transaction 2 reads value 0.
b. Transaction 2 reads value 1 straight away.
c. Transaction 2 reads value 1 once Transaction 1 commits.
d. It depends.
What happens?
Database isolation levels
SERIALIZABLE
REPEATABLE READ (MySQL default)
READ COMMITTED (PostgreSQL default)
READ UNCOMMITED
strict, accurate
permissive, fast
Read committed - reading
Records from other sessions will become visible as they are committed
Transaction 1
id | value |
---|---|
1 | 0 0 1 |
SET value = 1
id | value |
---|---|
1 | 1 |
Concurrency: reading
Transaction 1
Transaction 2
id | value |
---|---|
1 | 0 (1) ? |
SET value = 1
SELECT value
a. Transaction 2 reads value 0.
b. Transaction 2 reads value 1 straight away.
c. Transaction 2 reads value 1 once Transaction 1 commits.
d. It depends.
What happens?
Concurrency: reading
Transaction 1
Transaction 2
id | value |
---|---|
1 | 0 0 1 |
SET value = 1
SELECT value
a. Transaction 2 reads value 0 straight away.
Isolation mode: READ COMMITTED
Concurrency: writing
Transaction 1 (writer)
Transaction 2 (writer)
id | value |
---|---|
1 | 0 0 ? ? ? |
SET value = 1
SET value = 2
Isolation mode: READ COMMITTED
a. T1 sets value to 1, T2 then overwrites value as 2.
b. T2 commits value 2, then T1 commits value 1.
c. T2 waits until T1 commits, then sets value to 2.
d. T2 errors when it attempts to set value to 2.
What happens?
Read committed - writing
Records that sessions are writing to are marked as dirty immediately.
Other writers will wait until the lock is released.
Transaction 1
id | value | read only |
---|---|---|
1 | 0 0 1 |
NO YES NO |
SET value = 1
id | value |
---|---|
1 | 1 |
Concurrency: writing
Transaction 1 (writer)
Transaction 2 (writer)
id | value |
---|---|
1 | 0 0 ? ? ? |
SET value = 1
SET value = 2
Isolation mode: READ COMMITTED
a. T1 sets value to 1, T2 then overwrites value as 2.
b. T2 commits value 2, then T1 commits value 1.
c. T2 waits until T1 commits, then sets value to 2.
d. T2 errors when it attempts to set value to 2.
What happens?
Concurrency: writing
Transaction 1 (writer)
Transaction 2 (writer)
id | value | read only |
---|---|---|
1 | 0 0 1 1 2 |
NO YES NO YES NO |
SET value = 1
SET value = 2
Isolation mode: READ COMMITTED
c. T2 waits until T1 commits, then sets value to 2.
Ok, so how do we
prevent concurrency?
Preventing concurrency
Transaction 1 (writer)
Transaction 2 (writer)
id | value | read only |
---|---|---|
1 | 0 0 0 1 1 2 |
NO YES YES NO YES NO |
SET value = 1
SET value = 2
Isolation mode: READ COMMITTED
SELECT FOR UPDATE is a way of making
a read query behave like a write query.
SELECT FOR UPDATE
SELECT FOR UPDATE
Select for update in Django
MyModel.objects.select_for_update().get(id=1)
MyModel.objects.select_for_update().filter(id__in=[1, 3])
Select for update must be called within a transaction.
TransactionManagementError: select_for_update cannot be used
outside of a transaction.
@shared_task
def transfer(source_id, destination_id, amount):
"Make a transfer between two Accounts of the supplied amount."
source = Account.objects.get(id=source_id)
destination = Account.objects.get(id=destination_id)
with transaction.atomic():
if source.has_at_least(amount):
LedgerEntry.objects.create(account=source, amount=(amount * -1))
LedgerEntry.objects.create(account=destination, amount=amount)
How can we prevent concurrent transfers on the same source account?
@shared_task
def transfer(source_id, destination_id, amount):
"Make a transfer between two Accounts of the supplied amount."
with transaction.atomic():
# Wait for a lock on the source account
source = Account.objects.select_for_update().get(id=source_id)
destination = Account.objects.get(id=destination_id)
if source.has_at_least(amount):
LedgerEntry.objects.create(account=source, amount=(amount * -1))
LedgerEntry.objects.create(account=destination, amount=amount)
Solution: pessimistic locking
Users
Account balances
Shamtander v5
def transfer(source_id, destination_id, amount):
with transaction.atomic():
source = Account.objects.select_for_update().get(id=source_id)
...
Four caveats
Caveat 1: tasks within transactions
There are two problems with this;
what are they?
def transfer(source, destination, amount):
"Make a transfer between two Accounts of the supplied amount."
with transaction.atomic():
if source.has_at_least(amount):
source_entry = LedgerEntry.objects.create(account=source,
amount=(amount * -1))
send_notification_email.delay(source_entry.id)
destination_entry = LedgerEntry.objects.create(account=destination,
amount=amount)
send_notification_email.delay(destination_entry.id)
Caveat 1: tasks within transactions
- Email may get sent even if the transfer fails.
def transfer(source, destination, amount):
"Make a transfer between two Accounts of the supplied amount."
with transaction.atomic():
if source.has_at_least(amount):
source_entry = LedgerEntry.objects.create(account=source,
amount=(amount * -1))
destination_entry = LedgerEntry.objects.create(account=destination,
amount=amount)
send_notification_email.delay(source_entry.id,
destination_entry.id)
Two problems
Caveat 1: tasks within transactions
- Email may get sent even if the transfer fails.
- Email may not be able to get the entries from the database yet.
def transfer(source, destination, amount):
"Make a transfer between two Accounts of the supplied amount."
with transaction.atomic():
if source.has_at_least(amount):
source_entry = LedgerEntry.objects.create(account=source,
amount=(amount * -1))
destination_entry = LedgerEntry.objects.create(account=destination,
amount=amount)
send_notification_email.delay(source_entry.id,
destination_entry.id)
Two problems
Solution: performing actions on commit
with transaction.atomic():
foo()
transaction.on_commit(bar)
baz()
bar() will be called only once the transaction has been committed successfully.
with transaction.atomic():
foo()
transaction.on_commit(lambda: some_celery_task.delay('arg1'))
baz()
Use lambdas for celery tasks whenever you're in a transaction.
Caveat 2
@shared_task
def transfer(source_id, destination_id, amount):
"Make a transfer between two Accounts of the supplied amount."
with transaction.atomic():
# Wait for a lock on the source and destination accounts
Account.objects.select_for_update().filter(id__in=[source_id, destination_id])
source = Account.objects.get(id=source_id)
destination = Account.objects.get(id=destination_id)
if source.has_at_least(amount):
LedgerEntry.objects.create(account=source, amount=(amount * -1))
LedgerEntry.objects.create(account=destination, amount=amount)
Spot the bug
Caveat 2 - lazy select_for_update
Account.objects.select_for_update().filter(id__in=[source_id, destination_id])
Querysets are lazy!
In this case, the select_for_update will never be run.
bool(Account.objects.select_for_update().filter(id__in=[source_id, destination_id]))
Solution: wrap select_for_updates that use filter in a bool if you don't evaluate them straight away.
Caveat 3 - Deadlocks
ERROR: deadlock detected
Detail:
Process 13560 waits for ShareLock on transaction 3147316424; blocked by process 13566.
Process 13566 waits for ShareLock on transaction 3147316408; blocked by process 13560.
What the...?
How a deadlock happens
ids = [1, 2]
bool(
AccountHolder.objects\
.select_for_update()\
.filter(id__in=ids)
)
Process 1
Process 2
id | read only |
---|---|
1 | YES |
2 | YES |
id | read only |
---|---|
1 | YES |
2 | YES |
Waiting for each other
ids = [2, 1]
bool(
AccountHolder.objects\
.select_for_update()\
.filter(id__in=ids)
)
Preventing deadlocks
ids = [1, 2]
bool(
AccountHolder.objects\
.select_for_update()\
.filter(id__in=ids).
.order_by('id')
)
Process 1
Process 2
ids = [2, 1]
bool(
AccountHolder.objects\
.select_for_update()\
.filter(id__in=ids).
.order_by('id')
)
Solution: when using select_for_updates on multiple records, make sure you acquire the locks in a consistent order.
Caveat 4 - Testing
This wraps your test in a transaction. This can be a problem if you are testing something that doesn't run in a transaction in real life.
Example: a select_for_update included in a celery task that is not wrapped in a transaction, will pass tests but fail when it runs in production.
django.test.TestCase
Does not wrap your test in a transaction. Slower, but better for code where you need to test behaviour relating to transactions.
django.test.TransactionTestCase
Summary
Atomicity
- ATOMIC_REQUESTS makes your requests atomic.
- Use transaction.atomic to enforce atomicity elsewhere.
- Use transaction.on_commit() for tasks.
Concurrency
- Can your code handle concurrency? If not, prevent it!
- Select for update is a simple way to make your code wait until it's safe.
- Three caveats:
David Seddon http://seddonym.me
- Querysets are lazy: make sure your select_for_update is being evaluated.
- Avoid deadlocks by acquiring locks in a consistent order.
- TestCase wraps your code in a transaction.
Appendix
Savepoints
Savepoints allow you to roll back
within transactions
Query
Transaction
Query
> Savepoint
Query
Query
> Rollback
Nested atomic blocks
Transaction
Query
> Savepoint
Query
Query
> Savepoint
Query
> Savepoint
Query
Query
with transaction.atomic():
foo()
with transaction.atomic():
bar()
with transaction.atomic():
baz()
with transaction.atomic():
foobar()
Nested atomic blocks
with transaction.atomic():
foo() - Will be committed
try:
with transaction.atomic():
bar() - Will be rolled back
raise Exception
except:
pass
baz() - Will be committed
Exceptions raised within an atomic block will roll back that atomic block.
Database Gotchas
By David Seddon
Database Gotchas
Two key concepts you can't afford to ignore
- 2,455