Designing asynchronous tasks
Examples with Celery
PyTech Warsaw - 06.12.2018
Przemek Lewandowski
@haxoza


About me



Python community in Warsaw


PyWaw (pywaw.org)
- Est. 2011
- 80 meetups
- 1 conference
- 111 speakers
- 175 talks
PyLightWaw (pylight.org)
- Est. 2017
- 10 meetups
- 19 talks
- 1 discussion panel

We're looking for speakers!



Assumptions
You know a little bit about:
- Celery
- Brokers or queues (AMQP, Redis)
- Async / Distributed processing

Task
A single unit of work to be executed asynchronously.


Basic example
@app.task
def send_welcome_email(user_id, context):
    user = User.objects.get(id=user_id)
    email_address = user.get_email()
    WelcomeEmail(context=context).send(to=email_address)

- Simple business logic
- Not harmful on failure
- Nothing saved to database
- Side effect: email sent

Real life example
@app.task
def process_document(document_id):
    document = Document.objects.get(pk=document_id)
    process_document_service = ProcessDocumentService.create(
        document.language,
    )
    process_document_service.process(document)

Configure Sentry
@app.task
def process_document(document_id):
    document = Document.objects.get(pk=document_id)
    process_document_service = ProcessDocumentService.create(
        document.language,
    )
    process_document_service.process(document)

# Report unhandled task exceptions to Sentry
import sentry_sdk
from sentry_sdk.integrations.celery import CeleryIntegration

sentry_sdk.init(integrations=[CeleryIntegration()])

Add app level state management
@app.task
def process_document(document_id):
    document = Document.objects.get(pk=document_id)
    try:
        process_document_service = ProcessDocumentService.create(
            document.language,
        )
        process_document_service.process(document)
    except Exception:
        document.status = Document.Status.failed
        document.save()
        raise

Make sure document exists
@app.task
def process_document(document_id):
    document = Document.objects.get(pk=document_id)
    try:
        process_document_service = ProcessDocumentService.create(
            document.language,
        )
        process_document_service.process(document)
    except Exception:
        document.set_status_failed()
        raise

class DocumentManager(models.Manager):
    @transaction.atomic
    def create_and_process(self, document):
        from .tasks import process_document
        document.save()
        # Queue the task only after the row has been committed
        transaction.on_commit(
            lambda: process_document.delay(document.pk),
        )
        return document
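
Why defer the enqueue until commit? If the task is queued inside the transaction, a worker may pick it up before the row is visible to it, and `Document.objects.get` fails. The toy model below reproduces that ordering in plain Python. It is only a sketch, not Django's implementation: `FakeTransaction`, `committed_rows`, `worker_queue` and `enqueue` are hypothetical names invented for the illustration.

```python
# Toy model of the on_commit ordering; not Django's implementation.
committed_rows = set()   # rows visible to other processes (the workers)
worker_queue = []        # stands in for the broker


class FakeTransaction:
    """Collects writes and callbacks, applies them only on a clean exit."""

    def __init__(self):
        self.pending_rows = []
        self.callbacks = []

    def __enter__(self):
        return self

    def save(self, pk):
        self.pending_rows.append(pk)

    def on_commit(self, callback):
        self.callbacks.append(callback)

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            committed_rows.update(self.pending_rows)  # commit first
            for callback in self.callbacks:           # then run callbacks
                callback()
        return False  # never swallow exceptions; a failed block queues nothing


def enqueue(pk):
    # Record whether the row was visible at the moment the task was queued.
    worker_queue.append((pk, pk in committed_rows))


with FakeTransaction() as tx:
    tx.save(1)
    enqueue(1)                        # queued too early: row not visible yet
    tx.on_commit(lambda: enqueue(1))  # queued after commit: row is visible
```

The first entry in `worker_queue` records a task that would not find its document; the second one, queued through `on_commit`, would.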

Retry on particular exceptions
@app.task(
    autoretry_for=(RequestException,),
    retry_kwargs={'max_retries': 5},
)
def process_document(document_id):
    document = Document.objects.get(pk=document_id)
    try:
        process_document_service = ProcessDocumentService.create(
            document.language,
        )
        process_document_service.process(document)
    except Exception:
        document.set_status_failed()
        raise
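
What the retry settings mean, roughly: the listed exceptions trigger another attempt, up to `max_retries` retries after the first run, and anything else fails immediately. The decorator below is a plain-Python sketch of that behaviour under those assumptions. It is not how Celery does it (Celery re-queues the task through the broker rather than looping in-process), and `autoretry`, `FlakyNetworkError` and `fetch` are hypothetical names.

```python
import functools


def autoretry(exceptions, max_retries):
    """Re-run the function when it raises one of `exceptions`.

    Toy stand-in for the retry behaviour; runs everything in-process.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):  # first run + retries
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries:
                        raise  # retries exhausted: propagate the error
        return wrapper
    return decorator


class FlakyNetworkError(Exception):
    """Hypothetical stand-in for a transient network exception."""


calls = []


@autoretry((FlakyNetworkError,), max_retries=5)
def fetch(fail_times):
    calls.append(1)
    if len(calls) <= fail_times:
        raise FlakyNetworkError("temporary failure")
    return "ok"


result = fetch(fail_times=2)  # fails twice, succeeds on the third call
```

Exceptions outside the tuple are not caught by the wrapper, so they surface on the first attempt.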

Route tasks by time of execution
@app.task(
    autoretry_for=(RequestException,),
    retry_kwargs={'max_retries': 5},
)
def process_document(document_id):
    document = Document.objects.get(pk=document_id)
    try:
        process_document_service = ProcessDocumentService.create(
            document.language,
        )
        process_document_service.process(document)
    except Exception:
        document.set_status_failed()
        raise

CELERY_ROUTES = {
    'tasks.process_document': {'queue': 'processing_queue'},
    'tasks.send_welcome_email': {'queue': 'quick_queue'},
}

Add more verbosity
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@app.task(
    autoretry_for=(RequestException,),
    retry_kwargs={'max_retries': 5},
)
def process_document(document_id):
    document = Document.objects.get(pk=document_id)
    try:
        process_document_service = ProcessDocumentService.create(
            language=document.language,
            logger=logger,
        )
        process_document_service.process(document)
        logger.info("Processing succeeded: %r", document)
    except Exception:
        document.set_status_failed()
        logger.exception("Processing failed: %r", document)
        raise

Task granularity
Processing options:
- data chunks one by one
- all data at once

Processing one by one

@app.task
def process_document(document_id):
    document = Document.objects.get(pk=document_id)
    process_document_service = ProcessDocumentService.create(
        document.language,
    )
    process_document_service.process(document)

Processing all at once

@app.task
def process_document():
    documents = Document.objects.filter(
        status=Document.Status.new
    )
    process_document_service = ProcessDocumentService.create()
    for document in documents:
        process_document_service.process(document)
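
A possible middle ground between the two options is to split the IDs into fixed-size batches and queue one task per batch. The helper below is a minimal sketch of that idea. `chunked` is a hypothetical helper, the batch size of 3 is arbitrary, and the IDs are made up.

```python
def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


document_ids = list(range(1, 8))          # hypothetical primary keys 1..7
batches = list(chunked(document_ids, 3))
# Each batch would become the argument of one task call,
# e.g. a hypothetical process_documents.delay(batch).
```

Smaller batches limit how much work is lost when one task fails; larger ones reduce per-task overhead.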

What else to take into account?

Finding out more
- Task granularity
- Task serializing format
- Task idempotency
- "Application vs task" level state management
- Task testing
- Execution monitoring, e.g. Celery Flower
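
As a taste of the idempotency point: a task may be delivered more than once, so running it again should not repeat the work. The sketch below guards on the document status. `FakeDocument` and `process_once` are hypothetical in-memory stand-ins, not code from the examples above. With a real database the check and the update would also need to be atomic, for instance with a row lock such as Django's `select_for_update`.

```python
class FakeDocument:
    """Hypothetical in-memory stand-in for the Document model."""

    def __init__(self):
        self.status = "new"
        self.times_processed = 0


def process_once(document):
    """Process the document only if it has not been handled yet."""
    if document.status != "new":
        return False  # duplicate delivery: nothing to do
    document.times_processed += 1  # the real work would happen here
    document.status = "processed"
    return True


doc = FakeDocument()
first = process_once(doc)
second = process_once(doc)  # e.g. the broker redelivered the message
```

The second call returns without touching the document, so a redelivered message is harmless.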

Resources
- Celery - docs.celeryproject.org
- django-celery-results package
- Python RQ - python-rq.org
- Huey - huey.readthedocs.io
- Scaling Python book - scaling-python.com

Thanks!
Questions?
