Designing asynchronous tasks
Examples with Celery

PyTech Warsaw - 06.12.2018

Przemek Lewandowski

@haxoza

About me

Python community in Warsaw

PyWaw (pywaw.org)     

  • Est. 2011              
  • 80 meetups
  • 1 conference
  • 111 speakers
  • 175 talks

PyLightWaw (pylight.org)

  • Est. 2017
  • 10 meetups
  • 19 talks
  • 1 discussion panel

We're looking for speakers!

Assumptions

You know a little bit about:

  • Celery
  • Brokers or Queues (AMQP, Redis)
  • Async / Distributed processing

Task

A single unit of work to be executed asynchronously.

Basic example

@app.task
def send_welcome_email(user_id, context):
    user = User.objects.get(id=user_id)
    email_address = user.get_email()
    WelcomeEmail(context=context).send(to=email_address)

  • Simple business logic
  • Not harmful on failure
  • Nothing saved to database
  • Side effect: email sent 
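
The task above takes `user_id` rather than a `User` instance. One likely reason, sketched here with the standard library only, is that task arguments must survive serialization on their way through the broker (JSON is Celery's default serializer since version 4.0). `FakeUser`, `can_be_sent_as_json` and the sample context are hypothetical names used for illustration.

```python
import json


class FakeUser:
    """Hypothetical stand-in for a Django model instance."""

    def __init__(self, user_id, email):
        self.id = user_id
        self.email = email


def can_be_sent_as_json(args):
    """Return True if the task arguments survive JSON serialization."""
    try:
        json.dumps(args)
    except TypeError:
        return False
    return True


user = FakeUser(user_id=42, email="user@example.com")

# A primary key and a plain dict serialize without trouble.
id_args = [user.id, {"language": "en"}]

# A rich object does not; the worker should re-fetch it by ID instead.
object_args = [user, {"language": "en"}]

print(can_be_sent_as_json(id_args))
print(can_be_sent_as_json(object_args))
```

Passing the ID also means the worker reads the current row at execution time instead of a snapshot taken when the task was enqueued.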

Real life example

@app.task
def process_document(document_id):
    document = Document.objects.get(pk=document_id)

    process_document_service = ProcessDocumentService.create(
        document.language,
    )
    process_document_service.process(document)

Configure Sentry

@app.task
def process_document(document_id):
    document = Document.objects.get(pk=document_id)

    process_document_service = ProcessDocumentService.create(
        document.language,
    )
    process_document_service.process(document)

import sentry_sdk
from sentry_sdk.integrations.celery import CeleryIntegration

# With no explicit dsn argument, the SDK reads it from the SENTRY_DSN environment variable.
sentry_sdk.init(integrations=[CeleryIntegration()])

Add app level state management

@app.task
def process_document(document_id):
    document = Document.objects.get(pk=document_id)

    try:
        process_document_service = ProcessDocumentService.create(
            document.language,
        )
        process_document_service.process(document)
    except Exception:
        document.status = Document.Status.failed
        document.save()
        raise
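
The pattern above (record the failure on the document, then re-raise) can be tried without Django or Celery. This is a minimal sketch: `FakeDocument`, `Status`, `process_with_status` and the `PROCESSED` state are assumptions made for illustration, not part of the original code.

```python
import enum


class Status(enum.Enum):
    NEW = "new"
    PROCESSED = "processed"  # assumed success state, not shown in the talk
    FAILED = "failed"


class FakeDocument:
    """Hypothetical in-memory stand-in for the Document model."""

    def __init__(self):
        self.status = Status.NEW

    def save(self):
        # A real model would write to the database here.
        pass


def process_with_status(document, process):
    """Run `process` and record the outcome on the document itself."""
    try:
        process(document)
    except Exception:
        document.status = Status.FAILED
        document.save()
        raise  # re-raise so the task is still reported as failed
    document.status = Status.PROCESSED
    document.save()


def broken_processor(document):
    raise ValueError("unsupported language")


doc = FakeDocument()
try:
    process_with_status(doc, broken_processor)
except ValueError:
    pass  # the error still reaches the caller (and any error tracker)

print(doc.status)
```

Re-raising keeps both views consistent: the application sees a failed document, and the task runner and Sentry still see the exception.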

Make sure document exists

@app.task
def process_document(document_id):
    document = Document.objects.get(pk=document_id)

    try:
        process_document_service = ProcessDocumentService.create(
            document.language,
        )
        process_document_service.process(document)
    except Exception:
        document.set_status_failed()
        raise

class DocumentManager(models.Manager):

    @transaction.atomic
    def create_and_process(self, document):
        from .tasks import process_document
        document.save()
        transaction.on_commit(
            lambda: process_document.delay(document.pk),
        )
        return document
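
The manager above enqueues the task only after the transaction commits, so a worker cannot look up a row that is not yet visible. A toy model of that ordering, with no database involved; `FakeTransaction` and `worker_lookup` are hypothetical and only mimic the idea behind `transaction.on_commit`.

```python
class FakeTransaction:
    """Toy transaction: callbacks run only once commit() is called."""

    def __init__(self):
        self.committed_rows = set()
        self._pending_rows = set()
        self._callbacks = []

    def save(self, pk):
        # The row exists inside the transaction but is not visible outside.
        self._pending_rows.add(pk)

    def on_commit(self, callback):
        self._callbacks.append(callback)

    def commit(self):
        self.committed_rows |= self._pending_rows
        self._pending_rows.clear()
        callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback()


results = []
tx = FakeTransaction()


def worker_lookup(pk):
    """Record what a worker on another connection would see for this pk."""
    results.append(pk in tx.committed_rows)


tx.save(1)
worker_lookup(1)                        # enqueued immediately: row not visible yet
tx.on_commit(lambda: worker_lookup(1))  # deferred until after commit
tx.commit()

print(results)
```

The first lookup corresponds to calling `process_document.delay(...)` inside the transaction, where a fast worker could hit `Document.DoesNotExist`.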

Retry on particular exceptions

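# Assumption: RequestException comes from the `requests` library.
from requests.exceptions import RequestException

@app.task(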
    autoretry_for=(RequestException,),
    retry_kwargs={'max_retries': 5},
)
def process_document(document_id):
    document = Document.objects.get(pk=document_id)

    try:
        process_document_service = ProcessDocumentService.create(
            document.language,
        )
        process_document_service.process(document)
    except Exception:
        document.set_status_failed()
        raise
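
What `autoretry_for` and `max_retries` express can be sketched as a plain decorator. This is not Celery's implementation: Celery schedules each retry as a new message with a delay, while this hypothetical `autoretry` helper simply loops in-process. `TransientError` stands in for `RequestException`.

```python
import functools


def autoretry(exceptions, max_retries):
    """Retry the wrapped function when it raises one of `exceptions`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            retries = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if retries >= max_retries:
                        raise  # retries exhausted: give up
                    retries += 1

        return wrapper

    return decorator


class TransientError(Exception):
    """Hypothetical stand-in for a network error such as RequestException."""


attempts = []


@autoretry(exceptions=(TransientError,), max_retries=5)
def flaky_call():
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("temporary failure")
    return "ok"


result = flaky_call()
print(result, len(attempts))
```

Only the listed exception types trigger a retry; anything else propagates on the first attempt, which is why the tuple should contain errors that are plausibly temporary.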

Route tasks by time of execution

@app.task(
    autoretry_for=(RequestException,),
    retry_kwargs={'max_retries': 5},
)
def process_document(document_id):
    document = Document.objects.get(pk=document_id)

    try:
        process_document_service = ProcessDocumentService.create(
            document.language,
        )
        process_document_service.process(document)
    except Exception:
        document.set_status_failed()
        raise

CELERY_ROUTES = {
    'tasks.process_document': {'queue': 'processing_queue'},
    'tasks.send_welcome_email': {'queue': 'quick_queue'},
}
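
The routing table above maps task names to queues, so long-running jobs do not block quick ones. A minimal sketch of the lookup it implies; `resolve_queue` is a hypothetical helper, not a Celery function, and `"celery"` is Celery's default queue name. Newer Celery versions name this setting `task_routes`.

```python
ROUTES = {
    "tasks.process_document": {"queue": "processing_queue"},
    "tasks.send_welcome_email": {"queue": "quick_queue"},
}

DEFAULT_QUEUE = "celery"  # Celery's default queue name


def resolve_queue(task_name, routes=ROUTES, default=DEFAULT_QUEUE):
    """Return the queue a task name is routed to, or the default queue."""
    route = routes.get(task_name, {})
    return route.get("queue", default)


print(resolve_queue("tasks.process_document"))
print(resolve_queue("tasks.send_welcome_email"))
print(resolve_queue("tasks.something_else"))
```

Routing only pays off when workers are bound to queues, for example `celery -A proj worker -Q processing_queue` for the slow queue (`proj` is a placeholder application name).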

Add more verbosity

from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@app.task(
    autoretry_for=(RequestException,),
    retry_kwargs={'max_retries': 5},
)
def process_document(document_id):
    document = Document.objects.get(pk=document_id)

    try:
        process_document_service = ProcessDocumentService.create(
            language=document.language,
            logger=logger,
        )
        process_document_service.process(document)
        logger.info("Processing succeeded: {0}".format(repr(document)))
    except Exception:
        document.set_status_failed()
        logger.exception("Processing failed: {0}".format(repr(document)))
        raise

Task granularity

Processing options:

  • data chunks one by one
  • all data at once

Processing one by one

@app.task
def process_document(document_id):
    document = Document.objects.get(pk=document_id)

    process_document_service = ProcessDocumentService.create(
        document.language,
    )
    process_document_service.process(document)

Processing all at once

@app.task
def process_document():
    documents = Document.objects.filter(
        status=Document.Status.new
    )

    process_document_service = ProcessDocumentService.create()
    for document in documents:
        process_document_service.process(document)
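
Between the two extremes there is a middle ground: split the IDs into fixed-size batches and hand each batch to one task. A minimal sketch, assuming a hypothetical `chunked` helper and made-up document IDs; the batch size is something to tune per workload.

```python
def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


document_ids = [101, 102, 103, 104, 105]

# Each batch would become the argument of one task invocation.
batches = chunked(document_ids, 2)
print(batches)
```

Smaller batches give finer-grained retries and better parallelism; larger batches reduce per-task overhead on the broker.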

What else to take into account?

Finding out more

  • Task granularity
  • Task serializing format
  • Task idempotency
  • "Application vs task" level state management
  • Task testing
  • Execution monitoring, e.g. Celery Flower
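
Of the topics above, idempotency is easy to illustrate: a task may run more than once for the same input (for example after a retry), so a second run should be harmless. A minimal sketch, assuming a hypothetical in-memory `FakeDocument` and a status check as the guard.

```python
class FakeDocument:
    """Hypothetical in-memory stand-in for the Document model."""

    def __init__(self, pk):
        self.pk = pk
        self.status = "new"
        self.times_processed = 0


def process_once(document):
    """Process a document unless an earlier run already finished it."""
    if document.status == "processed":
        return False  # nothing to do: an earlier run already succeeded
    document.times_processed += 1  # the actual work would happen here
    document.status = "processed"
    return True


doc = FakeDocument(pk=1)
first = process_once(doc)
second = process_once(doc)  # e.g. the same message executed again

print(first, second, doc.times_processed)
```

With a real database the check and the update would need to happen atomically, for instance under a row lock, which this sketch leaves out.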

Resources

  • Celery - docs.celeryproject.org
  • django-celery-results package
  • Python RQ - python-rq.org
  • Huey - huey.readthedocs.io
  • Scaling Python book - scaling-python.com

Thanks!

Questions?

