Celery - Problems and Solutions
@ Django Bulgaria Autumn Meetup
Martin Angelov
$ whoami
- Marto
- Working @ HackSoft
- Studying @ FMI
- Social Accounts:
- https://www.facebook.com/martin.angelov056
- https://twitter.com/_martin056
- https://github.com/martin056
What are we going to talk about today?
- A little bit of revision first
- Some problems we've bumped into
- Some solutions to these problems
Revision
How many of you have used Celery before?
Take a look at these works by RadoRado:
Go to the docs:
What do we use it for in our Django projects?
- Third-party integrations
- Any other "slow" flows
But WHY?
Don't block the HTTP!
What is the interface? What do we use?
- tasks - @shared_task and @app.task
- signatures - .s() and .si()
- chains - chain()
- chords - chord()
What happens when we decorate a function with the task decorator?
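A rough way to picture the answer is a toy model in plain Python. This is a sketch of the idea only, not Celery's implementation: `ToyTask`, `toy_task` and `REGISTRY` are hypothetical names made up for illustration. What it shows matches Celery's observable behaviour: the function name ends up bound to a task object that is still callable, has a `.delay()` method, and is registered under a name built from the module and function name.

```python
from functools import update_wrapper

# Toy stand-in for the app's task registry (hypothetical, not Celery's).
REGISTRY = {}


class ToyTask:
    """Callable object that wraps the original function."""

    def __init__(self, func):
        self.run = func
        self.name = f"{func.__module__}.{func.__name__}"
        update_wrapper(self, func)  # keep __name__, __doc__, etc.

    def __call__(self, *args, **kwargs):
        # Calling the task directly runs the function in the current process.
        return self.run(*args, **kwargs)

    def delay(self, *args, **kwargs):
        # A real task would serialize this message and send it to a broker.
        # The toy version only builds the message to show what would be sent.
        return {"task": self.name, "args": args, "kwargs": kwargs}


def toy_task(func):
    """Toy decorator: replace the function with a registered ToyTask."""
    task = ToyTask(func)
    REGISTRY[task.name] = task
    return task


@toy_task
def add(x, y):
    return x + y
```

After decoration, `add(1, 2)` still runs synchronously, while `add.delay(1, 2)` only describes the work to be done somewhere else.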
Let the fun begin...
Problems
Problem 1
Error handling
- Logging
- Database
What do you prefer?
BaseErrorHandlerMixin
from .models import AsyncActionReport


class BaseErrorHandlerMixin:
    def on_failure(self, exc, task_id, args, kwargs, einfo):
        AsyncActionReport.objects.filter(id=kwargs['async_action_report_id'])\
            .update(status=AsyncActionReport.FAILED,
                    error_message=str(exc),
                    error_traceback=einfo)

    def on_success(self, retval, task_id, args, kwargs):
        AsyncActionReport.objects.filter(id=kwargs['async_action_report_id'])\
            .update(status=AsyncActionReport.OK)
Why do we use a mixin?
Here is what the model looks like ->
AsyncActionReport
from django.db import models


class AsyncActionReport(models.Model):
    PENDING = 'pending'
    OK = 'ok'
    FAILED = 'failed'

    STATUS_CHOICES = (
        (PENDING, 'pending'),
        (OK, 'ok'),
        (FAILED, 'failed')
    )

    status = models.CharField(max_length=7, choices=STATUS_CHOICES, default=PENDING)
    action = models.CharField(max_length=255)
    error_message = models.TextField(null=True, blank=True)
    error_traceback = models.TextField(null=True, blank=True)

    def __str__(self):
        return self.status
How do we use it? ->
Calling a task
def call_some_task(arg1, arg2):
    action = 'Proper message here.'
    async_action_report = AsyncActionReport.objects.create(action=action)

    return some_task.delay(arg1, arg2, async_action_report_id=async_action_report.id)
Defining a task
from celery import Task, shared_task


class SomeBaseTask(BaseErrorHandlerMixin, Task):
    pass


# The custom task class is passed with `base=`, not `task=`.
@shared_task(base=SomeBaseTask)
def some_task(arg1, arg2, **kwargs):
    # do some slow and complicated logic
    return
These were only the basics
What to think about:
- Update the AsyncActionReport model
- You may want to know if the admin triggered it
- You may want to know if it was created because of some inner logic (e.g. splitting logic into tasks). If so, you can add a `system_call` boolean field to the model or something like that.
- Move the creation of AsyncActionReports into a service
- Don't use the same AsyncActionReport for several different tasks. This may lead you into some nasty bugs!
- More about that shortly.
Before we continue
I've written a post about this on our blog. You can check it out for more details:
https://www.hacksoft.io/blog/handle-third-party-errors-with-celery-and-django/
Problem 2
Code Separation
- Business logic in the tasks
- Tasks that are created to be used in combination with other tasks (chains, chords)
Solutions
- Write simple Python functions (services) where you write the business logic. Tasks should be "pure". The perfect case is when tasks just call other functions.
- More about that shortly
- Split your tasks into different files
- Think of the naming conventions. If the task will be used only in combination with others, make it "private"
- "protected" - `_var`
- "private" - `__var` <- This is the one to use for the tasks
- I'm not sure if this one has a name, but you can use `var_` if the name of the variable would shadow a built-in (e.g. `type_`, `id_`)
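One caveat worth knowing before adopting the `__var` convention for tasks: Python mangles names that start with two underscores whenever they appear inside a class body. The sketch below uses a plain function instead of a real task (an assumption made to keep it self-contained; `Caller` is a hypothetical class), but the rule applies to any module-level name, decorated or not.

```python
def __fetch_data():
    return "fetched"


def call_from_function():
    # Module-level functions can use the name exactly as written.
    return __fetch_data()


class Caller:
    def call_from_method(self):
        # Inside a class body Python rewrites `__fetch_data` to
        # `_Caller__fetch_data`, which does not exist at module level,
        # so this line raises NameError when the method is called.
        return __fetch_data()
```

So "private" tasks named this way are safe to combine inside other module-level tasks and functions, but not from methods of a class.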
Basic Example
from celery import chain, shared_task
from django.conf import settings
from django.db import transaction


@shared_task(base=InvoicesPlusBaseTask)
def __fetch_data(**kwargs):
    client = InvoicesPlusClient(api_key=settings.THIRD_PARTY_API_KEY)
    fetched_data = client.fetch_data_method()

    return fetched_data


@shared_task
@transaction.atomic
def __store_data(fetched_data):
    container = ThirdPartyDataStorage(**fetched_data)
    container.save()

    return container.id


@shared_task(base=InvoicesPlusBaseTask)
def fetch_data_and_store_it(**kwargs):
    async_action_report = AsyncActionReport.objects.create(action='Fetching data.')

    t1 = __fetch_data.s(async_action_report_id=async_action_report.id)
    t2 = __store_data.s()

    return chain(t1, t2).delay()
Problem 3
Race Conditions
- We can't run away from them - it's a bad idea to block the workers
- The more we use Celery, the more tricks we learn to stay away from such conditions
Let's take a look at the following example ->
Is there any problem here?
@shared_task
def inner_chain():
    t1 = some_task.s()
    t2 = other_task.s()

    return chain(t1, t2).delay()


@shared_task
def mother():
    t1 = inner_chain.s()
    t2 = regular_task.s()

    return chain(t1, t2).delay()
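One way to see the problem: `inner_chain` finishes as soon as it has queued its own chain, so `regular_task` can be picked up before `some_task` and `other_task` are done. The sketch below is a toy simulation with a single FIFO queue and a single worker, written in plain Python. It is not Celery, and `send_chain`, `run_worker` and `TASKS` are hypothetical helpers. It only mimics one property of a chain: the next task is queued when the previous one returns.

```python
from collections import deque

queue = deque()  # toy broker: one FIFO queue
log = []         # order in which task bodies run


def send_chain(*names):
    # Mimics chain(...).delay(): only the first task is queued now,
    # each following task is queued once the previous one has returned.
    queue.append(list(names))


def some_task():
    log.append("some_task")


def other_task():
    log.append("other_task")


def regular_task():
    log.append("regular_task")


def inner_chain():
    log.append("inner_chain")
    send_chain("some_task", "other_task")  # returns immediately


def mother():
    log.append("mother")
    send_chain("inner_chain", "regular_task")


TASKS = {f.__name__: f for f in
         (some_task, other_task, regular_task, inner_chain, mother)}


def run_worker():
    while queue:
        name, *rest = queue.popleft()
        TASKS[name]()
        if rest:  # queue the next link of the same chain
            queue.append(rest)


queue.append(["mother"])
run_worker()
print(log)
# ['mother', 'inner_chain', 'some_task', 'regular_task', 'other_task']
```

Even in this deterministic toy, `regular_task` runs before `other_task`. With several real workers the order is not guaranteed at all.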
Solutions
- Not really...
- Write everything in 1 task
- Think of another way to combine your tasks instead of the regular `chain`
@chainable
from collections.abc import Iterable
from functools import wraps

from celery.canvas import Signature


def chainable(chain_wannabe):
    @wraps(chain_wannabe)
    def wrapper(*args, **kwargs):
        signatures = chain_wannabe(*args, **kwargs)
        error_msg = 'Functions decorated with `chainable` must return Signature instances.'

        if not isinstance(signatures, Iterable):
            raise ValueError(error_msg)

        # Validate the result we already have instead of calling the function a second time.
        for task_sig in signatures:
            if not isinstance(task_sig, Signature):
                raise ValueError(error_msg)

        return signatures
    return wrapper
Usage
from path.to.decorators import chainable


@chainable
def inner_chain():
    t1 = some_task.s()
    t2 = other_task.s()

    return t1, t2


@shared_task
def mother():
    inner_tasks = inner_chain()
    t2 = regular_task.s()

    return chain(*inner_tasks, t2).delay()
Problem 4
Discipline
- Share knowledge about decorators like `@chainable` across the team - higher maintainability
- Write comments in your tasks
- Care about the shared state between the chained tasks. It's dangerous, I know. Control it!
Problem 5
Testing
- Master mocks
- Validation tests
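As an illustration of the "master mocks" point, here is a minimal sketch that assumes the business logic lives in a plain service function, so it can be tested without a worker or a broker. `fetch_and_store` and the two test functions are hypothetical names. `fetch_data_method` is borrowed from the earlier example, while `storage.save` is an invented stand-in for the storage layer.

```python
from unittest.mock import Mock


def fetch_and_store(client, storage):
    """Hypothetical service: the task body would only call this function."""
    data = client.fetch_data_method()
    return storage.save(data)


def test_fetch_and_store_saves_fetched_data():
    client = Mock()
    client.fetch_data_method.return_value = {"number": "INV-1"}
    storage = Mock()
    storage.save.return_value = 42

    result = fetch_and_store(client, storage)

    client.fetch_data_method.assert_called_once_with()
    storage.save.assert_called_once_with({"number": "INV-1"})
    assert result == 42


def test_fetch_and_store_propagates_client_errors():
    client = Mock()
    # Simulate the third party being unavailable.
    client.fetch_data_method.side_effect = ConnectionError("API is down")
    storage = Mock()

    try:
        fetch_and_store(client, storage)
    except ConnectionError:
        pass
    else:
        raise AssertionError("expected ConnectionError")

    # Nothing must be stored when fetching fails.
    storage.save.assert_not_called()
```

Both tests run with any test runner that collects `test_*` functions, and neither touches the network or the database.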
Conclusion
- Use Celery carefully
- Read the docs!
- Discipline the team
- Try to follow the good practices - they can only help you
- Don't block your workers!
- Be careful with race conditions - think of all possible scenarios
- Introduce your own structures in order to minimize the mistakes
Demo time
Thank you!
Questions?