Designing asynchronous tasks
Examples with Celery
PyTech Warsaw - 06.12.2018
Przemek Lewandowski
@haxoza
About me
Python community in Warsaw
PyWaw (pywaw.org)
- Est. 2011
- 80 meetups
- 1 conference
- 111 speakers
- 175 talks
PyLightWaw (pylight.org)
- Est. 2017
- 10 meetups
- 19 talks
- 1 discussion panel
We're looking for speakers!
Assumptions
You know a little bit about:
- Celery
- Brokers or Queues (AMQP, Redis)
- Async / Distributed processing
Task
A single unit of work to be executed asynchronously.
Basic example
@app.task
def send_welcome_email(user_id, context):
    user = User.objects.get(id=user_id)
    email_address = user.get_email()
    WelcomeEmail(context=context).send(to=email_address)
- Simple business logic
- Not harmful on failure
- Nothing saved to database
- Side effect: email sent
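A minimal sketch of one likely reason the task above takes user_id rather than a User instance, assuming the JSON serializer that Celery 4 uses by default. The User class and the serialize_task_args helper are hypothetical stand-ins written for this illustration, not Celery APIs.

```python
import json


class User:
    """Hypothetical stand-in for an ORM model instance."""

    def __init__(self, user_id, email):
        self.id = user_id
        self.email = email


def serialize_task_args(*args):
    """Encode positional task arguments the way a JSON serializer would."""
    return json.dumps(list(args))


# Primitive arguments survive the trip through the broker unchanged.
payload = serialize_task_args(42, {"promo_code": "WELCOME"})
decoded = json.loads(payload)

# A model instance cannot be encoded as JSON at all.
try:
    serialize_task_args(User(42, "user@example.com"))
    instance_is_serializable = True
except TypeError:
    instance_is_serializable = False
```

Passing the ID also means the worker loads the current row when the task runs, instead of acting on a copy made when the task was queued.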
Real life example
@app.task
def process_document(document_id):
    document = Document.objects.get(pk=document_id)
    process_document_service = ProcessDocumentService.create(
        document.language,
    )
    process_document_service.process(document)
Configure Sentry
@app.task
def process_document(document_id):
    document = Document.objects.get(pk=document_id)
    process_document_service = ProcessDocumentService.create(
        document.language,
    )
    process_document_service.process(document)

import sentry_sdk
from sentry_sdk.integrations.celery import CeleryIntegration

sentry_sdk.init(integrations=[CeleryIntegration()])
Add app level state management
@app.task
def process_document(document_id):
    document = Document.objects.get(pk=document_id)
    try:
        process_document_service = ProcessDocumentService.create(
            document.language,
        )
        process_document_service.process(document)
    except Exception:
        # Record the failure in application state, then let the error propagate.
        document.status = Document.Status.failed
        document.save()
        raise
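The pattern above can be tried without Celery or Django. The sketch below is a plain-Python illustration under stated assumptions: the Document dataclass, run_with_status and broken_processor are hypothetical names, and the status values are plain strings rather than the Document.Status enum from the slide.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical in-memory stand-in for the Document model."""
    pk: int
    status: str = "new"


def run_with_status(document, process):
    """Run process(document); record a failure on the document, then re-raise."""
    try:
        process(document)
    except Exception:
        document.status = "failed"
        raise  # a bare raise keeps the original traceback for the worker


def broken_processor(document):
    """Simulates a processing step that always fails."""
    raise ValueError("unsupported language")


doc = Document(pk=1)
propagated = None
try:
    run_with_status(doc, broken_processor)
except ValueError as error:
    propagated = error
```

Re-raising matters here: the application sees the failed status, while the worker (and an error tracker such as Sentry) still sees the original exception.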
Make sure document exists
@app.task
def process_document(document_id):
    document = Document.objects.get(pk=document_id)
    try:
        process_document_service = ProcessDocumentService.create(
            document.language,
        )
        process_document_service.process(document)
    except Exception:
        document.set_status_failed()
        raise

from django.db import models, transaction

class DocumentManager(models.Manager):
    @transaction.atomic
    def create_and_process(self, document):
        from .tasks import process_document
        document.save()
        # Queue the task only after the surrounding transaction commits.
        transaction.on_commit(
            lambda: process_document.delay(document.pk),
        )
        return document
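Why wait for the commit? A worker runs on its own database connection, so a row that has been saved but not yet committed is invisible to it. The sketch below reproduces that effect with two sqlite3 connections from the standard library; the table layout and connection names are assumptions made for this illustration, and a production database may differ in detail.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "documents.db")

# "web" plays the request handler, "worker" plays the Celery worker process.
web = sqlite3.connect(db_path)
worker = sqlite3.connect(db_path)

web.execute("CREATE TABLE document (id INTEGER PRIMARY KEY, language TEXT)")
web.commit()

# The request inserts a row; sqlite3 opens a transaction implicitly here.
web.execute("INSERT INTO document (id, language) VALUES (1, 'pl')")

# A task started now would not find the row: it is not committed yet.
seen_before_commit = worker.execute(
    "SELECT id FROM document WHERE id = 1"
).fetchone()

web.commit()

# After the commit the same lookup succeeds.
seen_after_commit = worker.execute(
    "SELECT id FROM document WHERE id = 1"
).fetchone()

web.close()
worker.close()
```

Queuing the task from transaction.on_commit closes that window: the task cannot start before the row it needs is visible.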
Retry on particular exceptions
@app.task(
    autoretry_for=(RequestException,),
    retry_kwargs={'max_retries': 5},
)
def process_document(document_id):
    document = Document.objects.get(pk=document_id)
    try:
        process_document_service = ProcessDocumentService.create(
            document.language,
        )
        process_document_service.process(document)
    except Exception:
        document.set_status_failed()
        raise
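The retry semantics can be illustrated in plain Python. This is a sketch of the idea only, not Celery's implementation: Celery re-queues the task through the broker with a delay instead of looping in-process. The run_with_retries helper, TransientError and flaky are hypothetical names; TransientError stands in for RequestException.

```python
class TransientError(Exception):
    """Hypothetical stand-in for a retryable error such as RequestException."""


def run_with_retries(func, retry_for, max_retries):
    """Call func(); retry on the listed exception types up to max_retries times.

    Returns a (result, attempts) tuple. Exceptions outside retry_for
    propagate immediately without any retry.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return func(), attempts
        except retry_for:
            # attempts - 1 retries have been used so far.
            if attempts > max_retries:
                raise


calls = {"count": 0}


def flaky():
    """Fails twice with a retryable error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("service unavailable")
    return "done"


result, attempts = run_with_retries(flaky, (TransientError,), max_retries=5)
```

With max_retries set to 5 the function runs at most six times: the first attempt plus five retries.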
Route tasks by expected execution time
@app.task(
    autoretry_for=(RequestException,),
    retry_kwargs={'max_retries': 5},
)
def process_document(document_id):
    document = Document.objects.get(pk=document_id)
    try:
        process_document_service = ProcessDocumentService.create(
            document.language,
        )
        process_document_service.process(document)
    except Exception:
        document.set_status_failed()
        raise
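
# Celery 4 and later name this setting task_routes
# (CELERY_TASK_ROUTES when loaded with the CELERY namespace).
CELERY_ROUTES = {
    'tasks.process_document': {'queue': 'processing_queue'},
    'tasks.send_welcome_email': {'queue': 'quick_queue'},
}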
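A routing table like the one above is a mapping from task name to queue. The sketch below shows the lookup in plain Python, assuming exact task names only (Celery's own router also accepts glob and regex patterns). The queue_for helper is a hypothetical name; "celery" is the name of Celery's default queue.

```python
DEFAULT_QUEUE = "celery"  # Celery's default queue name

ROUTES = {
    "tasks.process_document": {"queue": "processing_queue"},
    "tasks.send_welcome_email": {"queue": "quick_queue"},
}


def queue_for(task_name, routes=ROUTES):
    """Return the queue a task name maps to, falling back to the default."""
    return routes.get(task_name, {}).get("queue", DEFAULT_QUEUE)


slow_queue = queue_for("tasks.process_document")
fast_queue = queue_for("tasks.send_welcome_email")
fallback_queue = queue_for("tasks.unlisted_task")
```

Separate workers can then be started with the -Q option so that long-running document processing never delays quick tasks such as sending email.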
Add more verbosity
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@app.task(
    autoretry_for=(RequestException,),
    retry_kwargs={'max_retries': 5},
)
def process_document(document_id):
    document = Document.objects.get(pk=document_id)
    try:
        process_document_service = ProcessDocumentService.create(
            language=document.language,
            logger=logger,
        )
        process_document_service.process(document)
        logger.info("Processing succeeded: %r", document)
    except Exception:
        document.set_status_failed()
        # logger.exception also records the traceback.
        logger.exception("Processing failed: %r", document)
        raise
Task granularity
Processing options:
- data chunks one by one
- all data at once
Processing one by one
@app.task
def process_document(document_id):
    document = Document.objects.get(pk=document_id)
    process_document_service = ProcessDocumentService.create(
        document.language,
    )
    process_document_service.process(document)
Processing all at once
@app.task
def process_document():
    documents = Document.objects.filter(
        status=Document.Status.new
    )
    process_document_service = ProcessDocumentService.create()
    for document in documents:
        process_document_service.process(document)
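Between one task per document and one task for everything lies a middle ground: one task per chunk of IDs. The sketch below shows only the splitting step in plain Python; the chunked helper and the chunk size are assumptions made for this illustration, and each resulting batch would be passed to its own task call.

```python
def chunked(ids, size):
    """Split a list of IDs into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [ids[i:i + size] for i in range(0, len(ids), size)]


document_ids = [1, 2, 3, 4, 5, 6, 7]
batches = chunked(document_ids, size=3)
```

Smaller chunks mean more messages on the broker but cheaper retries, since a failure costs one chunk rather than the whole run.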
What else to take into account?
Finding out more
- Task granularity
- Task serializing format
- Task idempotency
- "Application vs task" level state management
- Task testing
- Execution monitoring, e.g. Celery Flower
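Task idempotency from the list above can be sketched in plain Python: a task that checks application state first, so a duplicate delivery does no harm. The Document dataclass, the status strings and process_once are hypothetical names for this illustration; a real task would also need to guard against two workers running the check at the same moment.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical in-memory stand-in for the Document model."""
    pk: int
    status: str = "new"


processed_log = []  # records every real processing run


def process_once(document):
    """Process a document only if it has not been processed yet."""
    if document.status != "new":
        return False  # duplicate delivery: nothing to do
    processed_log.append(document.pk)
    document.status = "processed"
    return True


doc = Document(pk=7)
first = process_once(doc)
second = process_once(doc)  # same message delivered again
```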
Resources
- Celery - docs.celeryproject.org
- django-celery-results package
- Python RQ - python-rq.org
- Huey - huey.readthedocs.io
- Scaling Python book - scaling-python.com
Thanks!
Questions?
Designing asynchronous processing
By Przemek Lewandowski