Two key concepts you can't afford to ignore
David Seddon
david@seddonym.me
http://seddonym.me
This talk
from django.db import models

class Account(models.Model):

    def has_at_least(self, amount):
        """Returns True if this account has a balance greater
        than or equal to the supplied amount.
        """
        # Sum the ledger entries; an account with no entries has a balance of 0.
        balance = self.entries.aggregate(sum=models.Sum('amount'))['sum'] or 0
        return balance >= amount

class LedgerEntry(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    account = models.ForeignKey(Account, related_name='entries',
                                on_delete=models.CASCADE)
    amount = models.IntegerField(help_text='The amount, in pence.')
def transfer(source, destination, amount):
    "Make a transfer between two Accounts of the supplied amount."
    if source.has_at_least(amount):
        LedgerEntry.objects.create(account=source, amount=(amount * -1))
        LedgerEntry.objects.create(account=destination, amount=amount)
    else:
        raise Exception('Insufficient funds.')
Shamtander's accounts system - v1 (chart: book balance over time)
What if the second LedgerEntry.objects.create() call fails?
Atomicity (noun): the state or fact of being composed of indivisible units.
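Transactions are a way of wrapping queries up into discrete blocks: either all the queries in a transaction take effect, or none of them do.
(Diagram: several queries grouped inside a single transaction.)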
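To see the idea outside Django, here is a minimal sketch using Python's built-in sqlite3 module rather than the talk's code. The table name `ledger_entry`, the CHECK constraint and the `fail_second` flag are assumptions made for illustration: the second insert is forced to fail, and because both inserts belong to one transaction, the first one is discarded too.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode, so the code below
# controls transactions explicitly with BEGIN / COMMIT / ROLLBACK.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE ledger_entry ("
    " account_id INTEGER NOT NULL,"
    " amount INTEGER NOT NULL CHECK (amount <> 0))"  # hypothetical constraint
)


def transfer(source_id, destination_id, amount, fail_second=False):
    """Write both ledger entries inside a single transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO ledger_entry VALUES (?, ?)", (source_id, -amount))
        # Simulate the second query failing by violating the CHECK constraint.
        second_amount = 0 if fail_second else amount
        conn.execute("INSERT INTO ledger_entry VALUES (?, ?)",
                     (destination_id, second_amount))
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # undoes the first insert as well
        raise
    conn.execute("COMMIT")


def entry_count():
    return conn.execute("SELECT COUNT(*) FROM ledger_entry").fetchone()[0]


transfer(1, 2, 100)
print(entry_count())  # 2: both entries committed

try:
    transfer(1, 2, 50, fail_second=True)
except sqlite3.IntegrityError:
    pass
print(entry_count())  # still 2: the half-finished transfer was rolled back
```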
from django.http import HttpResponse

def transfer_view(request, source_id, destination_id, amount):
    "API endpoint for instructing a transfer."
    source = Account.objects.get(id=source_id)
    destination = Account.objects.get(id=destination_id)
    try:
        transfer(source, destination, amount)
    except Exception:
        # Failure
        return HttpResponse(status=400)
    else:
        # Success
        return HttpResponse(status=201)
def transfer(source, destination, amount):
    "Make a transfer between two Accounts of the supplied amount."
    if source.has_at_least(amount):
        LedgerEntry.objects.create(account=source, amount=(amount * -1))  # Succeeds
        LedgerEntry.objects.create(account=destination, amount=amount)  # Fails
    else:
        raise Exception("Insufficient funds.")
Will the first LedgerEntry be rolled back?
a. Yes
b. No
c. It depends
# settings.py
DATABASES = {
    'default': {
        # ... ENGINE, NAME and the other connection settings ...
        'ATOMIC_REQUESTS': True,
    }
}
Each view will be wrapped in a database transaction.
If the view raises an exception, the transaction is rolled back.
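Conceptually, ATOMIC_REQUESTS behaves like wrapping every view in a transaction that commits if the view returns and rolls back if it raises. The sketch below imitates that with sqlite3; `atomic_request` and the two views are hypothetical names, not Django's implementation. In this sketch, only an exception that propagates out of the view triggers the rollback.

```python
import functools
import sqlite3

conn = sqlite3.connect(":memory:")  # default mode: implicit BEGIN before INSERT
conn.execute("CREATE TABLE ledger_entry (account_id INTEGER, amount INTEGER)")
conn.commit()


def atomic_request(view):
    """Hypothetical stand-in for ATOMIC_REQUESTS: run the view in a transaction."""
    @functools.wraps(view)
    def wrapper(*args, **kwargs):
        # The connection's context manager commits if the block finishes
        # normally and rolls back if an exception escapes it.
        with conn:
            return view(*args, **kwargs)
    return wrapper


@atomic_request
def good_view():
    conn.execute("INSERT INTO ledger_entry VALUES (1, -100)")
    conn.execute("INSERT INTO ledger_entry VALUES (2, 100)")
    return 201


@atomic_request
def failing_view():
    conn.execute("INSERT INTO ledger_entry VALUES (1, -50)")
    raise RuntimeError("second query failed")  # escapes the view


good_view()
try:
    failing_view()
except RuntimeError:
    pass

rows = conn.execute(
    "SELECT account_id, amount FROM ledger_entry ORDER BY rowid").fetchall()
print(rows)  # [(1, -100), (2, 100)]: the failing view's insert was rolled back
```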
Shamtander's accounts system - v2, with ATOMIC_REQUESTS = True (chart: book balance over time)
Shamtander scales!
Transfer moved into a celery task, rather than being called directly from a view.
transfer.delay(source_id, destination_id, amount)
Shamtander's accounts system - v3, with transfers run as Celery tasks (chart: book balance over time)
transaction.atomic
from django.db import transaction

with transaction.atomic():
    foo()  # Will be rolled back
    bar()  # Will be rolled back
    raise Exception
Wrapping code in an atomic block guarantees atomicity.
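A minimal sketch of what an atomic block does, assuming a plain sqlite3 connection in autocommit mode. This hypothetical `atomic()` helper is far simpler than Django's (no nesting, no savepoints, no on_commit hooks); it just mirrors the foo/bar example above.

```python
import sqlite3
from contextlib import contextmanager

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("CREATE TABLE log (message TEXT)")


@contextmanager
def atomic(connection):
    """Hypothetical, non-nesting version of an atomic block."""
    connection.execute("BEGIN")
    try:
        yield
    except BaseException:
        connection.execute("ROLLBACK")  # any exception undoes the whole block
        raise
    else:
        connection.execute("COMMIT")


def foo():
    conn.execute("INSERT INTO log VALUES ('foo')")


def bar():
    conn.execute("INSERT INTO log VALUES ('bar')")


try:
    with atomic(conn):
        foo()  # Will be rolled back
        bar()  # Will be rolled back
        raise Exception
except Exception:
    pass

print(conn.execute("SELECT COUNT(*) FROM log").fetchone()[0])  # 0
```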
Shamtander's accounts system - v4
def transfer(source_id, destination_id, amount):
    with transaction.atomic():
        ...
(Chart: book balance over time)
(Chart: account balances against number of users)
Shamtander scales again!
if source.has_at_least(amount):
    LedgerEntry.objects.create(account=source, amount=(amount * -1))
    LedgerEntry.objects.create(account=destination, amount=amount)
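Concurrent processes
Two workers make transfers out of the same source account, which starts with a balance of £100:
Worker 1: check balance >= £100 (passes: balance is £100)
Worker 2: check balance >= £50 (passes: balance is still £100)
Worker 1: reduce balance by £100 (balance is now £0)
Worker 2: reduce balance by £50 (balance is now -£50)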
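The interleaving above can be reproduced with two threads sharing an in-memory ledger. This is a plain-Python simulation, not database code: a `threading.Barrier` (an assumption for illustration) forces both workers to finish their balance check before either one writes, which is exactly the unlucky timing shown.

```python
import threading

ledger = [100]                    # starting credit of £100, as ledger entries
both_checked = threading.Barrier(2)
results = {}


def worker(name, amount):
    allowed = sum(ledger) >= amount  # "Check balance >= amount"
    both_checked.wait()              # force both checks to happen before any write
    if allowed:
        ledger.append(-amount)       # "Reduce balance by amount"
    results[name] = allowed


threads = [
    threading.Thread(target=worker, args=("worker 1", 100)),
    threading.Thread(target=worker, args=("worker 2", 50)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)      # both checks passed
print(sum(ledger))  # -50
```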
Concurrent connections
Databases allow many processes at once to modify their data.
(Diagram: several transactions running at the same time against table_one, table_two and table_three.)
Concurrency: reading
Row 1 has value 0. Transaction 1 runs SET value = 1 but has not yet committed; Transaction 2 then runs SELECT value.
What happens?
a. Transaction 2 reads value 0.
b. Transaction 2 reads value 1 straight away.
c. Transaction 2 reads value 1 once Transaction 1 commits.
d. It depends.
Database isolation levels, from strict and accurate (top) to permissive and fast (bottom):
SERIALIZABLE
REPEATABLE READ (MySQL default)
READ COMMITTED (PostgreSQL default)
READ UNCOMMITTED
Read committed - reading
Changes made by other sessions become visible only once they are committed.
(Diagram: Transaction 1 sets value from 0 to 1; other sessions keep reading 0 until Transaction 1 commits, and read 1 after that.)
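A toy model of READ COMMITTED visibility (reads only, no locking), written in plain Python as an illustration rather than how any database implements it. `Store` and `Transaction` are hypothetical classes: each transaction buffers its own writes, reads see the latest committed value plus the transaction's own changes, and other transactions see a change only after it is committed.

```python
class Store:
    """Committed data shared by all transactions."""

    def __init__(self, **rows):
        self.committed = dict(rows)

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.pending = {}  # uncommitted writes, invisible to other transactions

    def set(self, key, value):
        self.pending[key] = value

    def get(self, key):
        # A transaction sees its own writes; otherwise the latest committed value.
        if key in self.pending:
            return self.pending[key]
        return self.store.committed[key]

    def commit(self):
        self.store.committed.update(self.pending)
        self.pending.clear()


store = Store(value=0)
t1, t2 = store.begin(), store.begin()

t1.set("value", 1)
print(t1.get("value"))  # 1: T1 sees its own change
print(t2.get("value"))  # 0: not committed yet
t1.commit()
print(t2.get("value"))  # 1: visible once T1 commits
```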
Concurrency: reading (answer)
Isolation mode: READ COMMITTED
a. Transaction 2 reads value 0 straight away, because Transaction 1 has not committed yet.
Concurrency: writing
Isolation mode: READ COMMITTED
Row 1 has value 0. Transaction 1 (a writer) runs SET value = 1 but has not yet committed; Transaction 2 (another writer) then runs SET value = 2.
What happens?
a. T1 sets value to 1, then T2 overwrites it with 2.
b. T2 commits value 2, then T1 commits value 1.
c. T2 waits until T1 commits, then sets value to 2.
d. T2 errors when it attempts to set value to 2.
Read committed - writing
A row that a session writes to is locked immediately.
Other writers will wait until the lock is released, which happens when the writing transaction commits or rolls back.
(Diagram: Transaction 1 sets value from 0 to 1; the row is locked against other writers until Transaction 1 commits.)
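The waiting behaviour can be imitated with a per-row lock that a writer takes at its write and releases only when it commits. This is a thread-based simulation with hypothetical names (`row_lock`, `log`), not database code; it shows that T2's write cannot happen until T1 has committed.

```python
import threading
import time

row = {"value": 0}
row_lock = threading.Lock()      # stands in for the database's row lock
t1_has_lock = threading.Event()
log = []


def transaction_1():
    with row_lock:               # lock taken at the write...
        row["value"] = 1
        log.append("T1 set value = 1")
        t1_has_lock.set()
        time.sleep(0.1)          # ...held while T1 does other work...
        log.append("T1 commit")  # ...and released when T1 commits


def transaction_2():
    t1_has_lock.wait()           # make sure T1 writes first
    with row_lock:               # blocks until T1 commits
        row["value"] = 2
        log.append("T2 set value = 2")
        log.append("T2 commit")


threads = [threading.Thread(target=transaction_1),
           threading.Thread(target=transaction_2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(log)
print(row["value"])  # 2
```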
Concurrency: writing (answer)
Isolation mode: READ COMMITTED
c. T2 waits until T1 commits, then sets value to 2.
(Diagram: value goes from 0 to 1 to 2; the row is locked while T1 writes, unlocked when T1 commits, then locked again while T2 writes.)
Preventing concurrency
Isolation mode: READ COMMITTED
(Diagram: Transaction 1 reads the row with SELECT FOR UPDATE, which locks it while the value is still 0; Transaction 2 must wait until T1 has set value = 1 and committed before it can set value = 2.)
SELECT FOR UPDATE is a way of making a read query behave like a write query: it locks the rows it reads, just as a write would.
Select for update in Django
MyModel.objects.select_for_update().get(id=1)
MyModel.objects.select_for_update().filter(id__in=[1, 3])
select_for_update() must be evaluated inside a transaction. Otherwise Django raises:
TransactionManagementError: select_for_update cannot be used outside of a transaction.
@shared_task
def transfer(source_id, destination_id, amount):
    "Make a transfer between two Accounts of the supplied amount."
    source = Account.objects.get(id=source_id)
    destination = Account.objects.get(id=destination_id)
    with transaction.atomic():
        if source.has_at_least(amount):
            LedgerEntry.objects.create(account=source, amount=(amount * -1))
            LedgerEntry.objects.create(account=destination, amount=amount)
How can we prevent concurrent transfers on the same source account?
@shared_task
def transfer(source_id, destination_id, amount):
    "Make a transfer between two Accounts of the supplied amount."
    with transaction.atomic():
        # Wait for a lock on the source account
        source = Account.objects.select_for_update().get(id=source_id)
        destination = Account.objects.get(id=destination_id)
        if source.has_at_least(amount):
            LedgerEntry.objects.create(account=source, amount=(amount * -1))
            LedgerEntry.objects.create(account=destination, amount=amount)
Solution: pessimistic locking
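Applied to the earlier race between two workers, pessimistic locking means taking the lock before the balance check, so the check and the deduction happen as one unit. The plain-Python sketch below uses a per-account `threading.Lock` as an analogy for the row lock that `select_for_update()` takes; it is not Django code.

```python
import threading

ledger = [100]                   # starting credit of £100
account_lock = threading.Lock()  # plays the role of the row lock
results = {}


def transfer_out(name, amount):
    with account_lock:           # wait for the lock *before* reading the balance
        allowed = sum(ledger) >= amount
        if allowed:
            ledger.append(-amount)
    results[name] = allowed


threads = [
    threading.Thread(target=transfer_out, args=("worker 1", 100)),
    threading.Thread(target=transfer_out, args=("worker 2", 50)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# 0 or 50, depending on which worker gets the lock first; never negative.
print(sum(ledger))
```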
Shamtander v5 (chart: account balances against number of users)
def transfer(source_id, destination_id, amount):
    with transaction.atomic():
        source = Account.objects.select_for_update().get(id=source_id)
        ...
There are two problems with this. What are they?
def transfer(source, destination, amount):
    "Make a transfer between two Accounts of the supplied amount."
    with transaction.atomic():
        if source.has_at_least(amount):
            source_entry = LedgerEntry.objects.create(account=source,
                                                      amount=(amount * -1))
            send_notification_email.delay(source_entry.id)
            destination_entry = LedgerEntry.objects.create(account=destination,
                                                           amount=amount)
            send_notification_email.delay(destination_entry.id)
def transfer(source, destination, amount):
    "Make a transfer between two Accounts of the supplied amount."
    with transaction.atomic():
        if source.has_at_least(amount):
            source_entry = LedgerEntry.objects.create(account=source,
                                                      amount=(amount * -1))
            destination_entry = LedgerEntry.objects.create(account=destination,
                                                           amount=amount)
            send_notification_email.delay(source_entry.id,
                                          destination_entry.id)
Two problems
with transaction.atomic():
    foo()
    transaction.on_commit(bar)
    baz()
bar() will be called only once the transaction has been committed successfully. If the transaction is rolled back, bar() is never called.
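Roughly how on_commit callbacks behave, as a toy model in plain Python: callbacks registered during the transaction are queued, run only after a successful commit, and discarded on rollback. `ToyTransaction` is a hypothetical class, not Django's implementation.

```python
class ToyTransaction:
    def __init__(self):
        self.callbacks = []

    def on_commit(self, func):
        self.callbacks.append(func)  # deferred: nothing runs yet

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # "Commit" succeeded: now it is safe to run the callbacks.
            for func in self.callbacks:
                func()
        # On an exception the callbacks are simply dropped.
        self.callbacks.clear()
        return False  # don't swallow exceptions


calls = []

with ToyTransaction() as txn:
    txn.on_commit(lambda: calls.append("bar"))
    calls.append("baz")
print(calls)  # ['baz', 'bar']: bar ran after the block finished

try:
    with ToyTransaction() as txn:
        txn.on_commit(lambda: calls.append("never"))
        raise RuntimeError("rolled back")
except RuntimeError:
    pass
print(calls)  # still ['baz', 'bar']
```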
with transaction.atomic():
    foo()
    transaction.on_commit(lambda: some_celery_task.delay('arg1'))
    baz()
Whenever you queue a Celery task from inside a transaction, pass on_commit a lambda that calls .delay(), so the task is only queued once the transaction commits.
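The lambda matters because on_commit needs a callable: writing `on_commit(some_celery_task.delay('arg1'))` would call `delay()` immediately, before the commit. The plain-Python sketch below shows the difference; `fake_delay` and this toy `on_commit` are hypothetical stand-ins, and `functools.partial` is shown as an equivalent alternative to the lambda.

```python
import functools

queued = []           # tasks "sent to the broker"
on_commit_hooks = []  # callables that will run after the commit


def fake_delay(arg):
    """Hypothetical stand-in for some_celery_task.delay()."""
    queued.append(arg)


def on_commit(func):
    """Toy version: just remember the callable for later."""
    on_commit_hooks.append(func)


# Wrong: the call happens immediately; on_commit receives its result (None).
on_commit(fake_delay("eager"))
# Right: pass callables, so nothing is queued until they are invoked.
on_commit(lambda: fake_delay("lambda"))
on_commit(functools.partial(fake_delay, "partial"))

print(queued)  # ['eager']: queued before the transaction has committed

# Simulate the commit, skipping the None left behind by the wrong call.
for hook in on_commit_hooks:
    if callable(hook):
        hook()
print(queued)  # ['eager', 'lambda', 'partial']
```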
Caveat 2
@shared_task
def transfer(source_id, destination_id, amount):
    "Make a transfer between two Accounts of the supplied amount."
    with transaction.atomic():
        # Wait for a lock on the source and destination accounts
        Account.objects.select_for_update().filter(id__in=[source_id, destination_id])
        source = Account.objects.get(id=source_id)
        destination = Account.objects.get(id=destination_id)
        if source.has_at_least(amount):
            LedgerEntry.objects.create(account=source, amount=(amount * -1))
            LedgerEntry.objects.create(account=destination, amount=amount)
Spot the bug
Caveat 2 - lazy select_for_update
Account.objects.select_for_update().filter(id__in=[source_id, destination_id])
Querysets are lazy!
In this case, the select_for_update will never be run.
bool(Account.objects.select_for_update().filter(id__in=[source_id, destination_id]))
Solution: if you don't evaluate a select_for_update() queryset straight away (for example, one built with filter()), wrap it in bool() to force the query to run.
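Why does the unevaluated queryset do nothing? Building a lazy query only records what to do; the query runs when something forces evaluation, such as `bool()`. This toy `LazyQuery` class is hypothetical and far simpler than Django's QuerySet; it logs when it actually hits the "database".

```python
executed = []  # every query that actually runs against the "database"


class LazyQuery:
    """Hypothetical, much-simplified stand-in for a lazy queryset."""

    def __init__(self, where=None, for_update=False):
        self.where = where
        self.for_update = for_update
        self._result = None

    def select_for_update(self):
        return LazyQuery(self.where, for_update=True)  # still nothing runs

    def filter(self, where):
        return LazyQuery(where, self.for_update)       # still nothing runs

    def _fetch(self):
        # The only place a query is actually "executed".
        if self._result is None:
            sql = "SELECT * FROM account"
            if self.where:
                sql += f" WHERE {self.where}"
            if self.for_update:
                sql += " FOR UPDATE"
            executed.append(sql)
            self._result = ["row"]  # pretend result set
        return self._result

    def __bool__(self):
        return bool(self._fetch())  # evaluating forces the query


LazyQuery().select_for_update().filter("id IN (1, 2)")
print(executed)  # []: built but never evaluated, so no rows are locked

bool(LazyQuery().select_for_update().filter("id IN (1, 2)"))
print(executed)  # ['SELECT * FROM account WHERE id IN (1, 2) FOR UPDATE']
```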
ERROR: deadlock detected
Detail:
Process 13560 waits for ShareLock on transaction 3147316424; blocked by process 13566.
Process 13566 waits for ShareLock on transaction 3147316408; blocked by process 13560.
What the...?
Process 1:
ids = [1, 2]
bool(
    AccountHolder.objects
    .select_for_update()
    .filter(id__in=ids)
)
Process 2:
ids = [2, 1]
bool(
    AccountHolder.objects
    .select_for_update()
    .filter(id__in=ids)
)
Waiting for each other: without an explicit ordering, the two processes can lock rows 1 and 2 in different orders, so each ends up holding one row's lock while waiting for the lock held by the other.
Process 1:
ids = [1, 2]
bool(
    AccountHolder.objects
    .select_for_update()
    .filter(id__in=ids)
    .order_by('id')
)
Process 2:
ids = [2, 1]
bool(
    AccountHolder.objects
    .select_for_update()
    .filter(id__in=ids)
    .order_by('id')
)
Solution: when using select_for_update() on multiple records, make sure you acquire the locks in a consistent order, for example with order_by('id').
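The same rule applies to any set of locks, not just database rows. In this plain-Python sketch, `threading.Lock` objects stand in for row locks and `lock_rows` is a hypothetical helper. The two workers ask for rows in opposite orders, but sorting the ids before locking means both always acquire them in the same order, so neither can hold one lock while waiting for the other.

```python
import threading

row_locks = {1: threading.Lock(), 2: threading.Lock()}  # one lock per row id
finished = []


def lock_rows(name, ids):
    # Acquire locks in ascending id order, like .order_by('id'),
    # regardless of the order the ids were supplied in.
    ordered = sorted(ids)
    for row_id in ordered:
        row_locks[row_id].acquire()
    try:
        finished.append(name)  # "work" done while holding both locks
    finally:
        for row_id in reversed(ordered):
            row_locks[row_id].release()


threads = [
    threading.Thread(target=lock_rows, args=("process 1", [1, 2])),
    threading.Thread(target=lock_rows, args=("process 2", [2, 1])),
]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)

print(sorted(finished))  # ['process 1', 'process 2']: no deadlock
```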
django.test.TestCase
Wraps each test in a transaction. This can be a problem if you are testing something that doesn't run in a transaction in real life.
Example: a select_for_update() in a Celery task that is not wrapped in a transaction will pass its tests but fail when it runs in production.
django.test.TransactionTestCase
Does not wrap your test in a transaction. Slower, but better for code whose behaviour depends on transactions.
Atomicity
Concurrency
Savepoints allow you to roll back within transactions.
(Diagram: transactions containing several queries, with savepoints marked between them, and a rollback to a savepoint that undoes only the queries made after it.)
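Savepoints are plain SQL, so the idea can be tried directly with Python's sqlite3 module, which supports `SAVEPOINT`, `ROLLBACK TO` and `RELEASE`. The table and savepoint names below are arbitrary choices for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN ourselves
conn.execute("CREATE TABLE log (message TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO log VALUES ('first')")
conn.execute("SAVEPOINT sp1")                    # mark a point inside the transaction
conn.execute("INSERT INTO log VALUES ('second')")
conn.execute("ROLLBACK TO sp1")                  # undo only the work done after sp1
conn.execute("RELEASE sp1")                      # discard the savepoint itself
conn.execute("INSERT INTO log VALUES ('third')")
conn.execute("COMMIT")

rows = [r[0] for r in conn.execute("SELECT message FROM log ORDER BY rowid")]
print(rows)  # ['first', 'third']
```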
with transaction.atomic():
    foo()
    with transaction.atomic():
        bar()
        with transaction.atomic():
            baz()
            with transaction.atomic():
                foobar()
with transaction.atomic():
    foo()  # Will be committed
    try:
        with transaction.atomic():
            bar()  # Will be rolled back
            raise Exception
    except Exception:
        pass
    baz()  # Will be committed
Exceptions raised within an atomic block will roll back that atomic block.
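Nested atomic blocks can be built on savepoints: the outermost block opens a transaction and each inner block opens a savepoint. The sketch below is a simplified, hypothetical `atomic()` on top of sqlite3 that mirrors the foo/bar/baz example; it ignores many details Django handles.

```python
import itertools
import sqlite3
from contextlib import contextmanager

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (message TEXT)")
depth = 0                           # how many atomic blocks we are inside
savepoint_ids = itertools.count(1)  # unique savepoint names


@contextmanager
def atomic():
    """Hypothetical: outermost block = transaction, inner blocks = savepoints."""
    global depth
    if depth == 0:
        begin, on_success, on_error = "BEGIN", ["COMMIT"], ["ROLLBACK"]
    else:
        name = f"sp{next(savepoint_ids)}"
        begin = f"SAVEPOINT {name}"
        on_success = [f"RELEASE {name}"]
        on_error = [f"ROLLBACK TO {name}", f"RELEASE {name}"]
    conn.execute(begin)
    depth += 1
    try:
        yield
    except BaseException:
        for sql in on_error:
            conn.execute(sql)
        raise
    else:
        for sql in on_success:
            conn.execute(sql)
    finally:
        depth -= 1


def log(message):
    conn.execute("INSERT INTO log VALUES (?)", (message,))


with atomic():
    log("foo")  # Will be committed
    try:
        with atomic():
            log("bar")  # Will be rolled back
            raise Exception
    except Exception:
        pass
    log("baz")  # Will be committed

messages = [r[0] for r in conn.execute("SELECT message FROM log ORDER BY rowid")]
print(messages)  # ['foo', 'baz']
```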