Asynchronous Task Queues

INFO 253B: Backend Web Architecture

Kay Ashaolu

First a step back

  • At this point you have learned most of what you need to build a real backend system that can take requests, process data, and return results
  • However, once you deploy your services to the real world, a number of challenges appear that you never encounter in your safe development space

What if?

  • We now have experience building a server that listens for HTTP requests
  • We have tooling that typically responds to those requests very quickly
  • However, what would happen if we wanted to add actions that either
    • take a long time to process or
    • can fail routinely?

Example: our dictionary service

  • Very fast since it uses a local dictionary (or a direct connection to a MySQL database) to store and retrieve words and definitions
  • However, what if we wanted to store a count of how many times a given word occurs across a list of 100 websites?
  • That could take a long time, since we would need to
    • retrieve all pages for each website and
    • count all instances of the word (this could take several minutes to hours!)
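The counting step itself is simple; the cost comes from doing it over every page of every site. A minimal sketch, assuming the page text has already been downloaded (the `pages` dictionary and both function names are made up for illustration):

```python
import re
from collections import Counter

def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def count_across_pages(pages, word):
    """Return per-page counts; `pages` maps a URL to that page's text."""
    return Counter({url: count_word(text, word) for url, text in pages.items()})

# Hypothetical, already-downloaded page text (no network access here).
pages = {
    "https://example.com/a": "Queue up the queue, then queue again.",
    "https://example.com/b": "No match on this page.",
}
counts = count_across_pages(pages, "queue")
total = sum(counts.values())
```

With 100 websites and many pages each, the download step around this function is what would push the run time into minutes or hours.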

Imagine waiting 1 hr for a page to load

  • This is not terribly far-fetched: we are in the world of big data
  • Data pipeline runs can easily take several hours to days to complete
  • The HTTP connection from the client to your server will time out before the work is completed
  • What do you do?

Asynchronous Task Queues

  • Provide the ability to execute work asynchronously
  • Application submits a task to a queue for another process to pick up and do the work
  • Application can do other work while task is being completed
  • Application can be notified when task is complete
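The submit-and-pick-up pattern above can be sketched with only the standard library. This is an illustration under stated assumptions, not a production setup: a thread stands in for the separate worker process, an in-memory `queue.Queue` stands in for the queue service, and `slow_square` is a made-up task.

```python
import queue
import threading

task_queue = queue.Queue()
results = []

def slow_square(n):
    """Stand-in for a long-running task."""
    return n * n

def worker():
    """Pull tasks off the queue until a None sentinel arrives."""
    while True:
        task = task_queue.get()
        if task is None:
            task_queue.task_done()
            break
        func, args = task
        results.append(func(*args))
        task_queue.task_done()

worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()

# The "application" submits work and is immediately free to do other things.
for n in (2, 3, 4):
    task_queue.put((slow_square, (n,)))

task_queue.put(None)   # tell the worker to stop
task_queue.join()      # block until every queued item has been handled
worker_thread.join()
```

`put` returns right away, so the application never waits on `slow_square` itself; it only waits at the end because this demo wants to inspect `results`.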

Quick note: Asynchronous vs Synchronous

  • Most of the code we have written so far has been synchronous: our applications run from beginning to end, and each line of code waits for the previous line to complete before moving on
  • Asynchronous code does not wait for the previous line to complete: the task runs concurrently with the main application
  • Typically there is a "callback" or way for the main application to be notified when the task is done (if necessary)
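One way to see the difference is with `concurrent.futures` from the standard library. In this sketch, `long_task` and `on_done` are hypothetical names, and the 0.1 second sleep stands in for real work:

```python
import time
from concurrent.futures import ThreadPoolExecutor

events = []

def long_task(word):
    """Pretend to do slow work, then return a result."""
    time.sleep(0.1)
    return "counted " + word

def on_done(future):
    """Callback: runs once the task has finished."""
    events.append("callback: " + future.result())

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(long_task, "apple")   # returns immediately
    future.add_done_callback(on_done)
    events.append("main: kept working")        # runs while the task sleeps
# Leaving the `with` block waits for the submitted task to finish.
```

The main code records its entry first because `submit` does not block; the callback entry arrives only after the task completes.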

For example: word count web crawler

  • When we get a definition from our server, the server puts a message in the asynchronous task queue to go find the count for that word
  • A worker picks up the task from the task queue and does the work
  • Our server tells the user that their job was scheduled
  • When the task is done, the worker writes the result to the database
  • Our application can read that data and present it to the user
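The flow above can be sketched end to end with in-memory stand-ins. Everything here is an assumption for illustration: a dictionary plays the database, `PAGES` plays the crawled text, and a thread plays the worker.

```python
import queue
import threading
import uuid

task_queue = queue.Queue()
database = {}   # stand-in for a real database: job id -> record
PAGES = ["the cat sat", "the dog and the bird"]   # hypothetical crawled text

def schedule_word_count(word):
    """Called by the server: record the job, enqueue it, return right away."""
    job_id = str(uuid.uuid4())
    database[job_id] = {"word": word, "status": "scheduled", "count": None}
    task_queue.put(job_id)
    return job_id

def worker():
    """Pick up job ids, do the counting, write the result to the database."""
    while True:
        job_id = task_queue.get()
        if job_id is None:
            break
        record = database[job_id]
        record["count"] = sum(p.split().count(record["word"]) for p in PAGES)
        record["status"] = "done"

job_id = schedule_word_count("the")
status_when_scheduled = database[job_id]["status"]   # what the user is told

worker_thread = threading.Thread(target=worker)
worker_thread.start()
task_queue.put(None)    # stop the worker after the pending job
worker_thread.join()

final_record = database[job_id]   # what the application reads later
```

The server's response depends only on `schedule_word_count`, which is fast; the slow counting happens later in the worker.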

Asynchronous Task Queue

Other benefits of asynchronous task queues

  • Isolation from unexpected results of external API calls. When you use an external API, you don't know if it's down or slow. Having your server request data directly from external APIs makes it subject to the state of the external system

Other benefits of asynchronous task queues

  • Better handling of errors from external APIs. Because work is being done in a separate isolated process, we can 1) retry the request later, 2) silently discard the task that we were trying to do, or a combination of both, without affecting our API speed and reliability
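A rough sketch of "retry, then discard" inside a worker. `FlakyAPI` is a made-up stand-in for an unreliable external service, and `run_with_retries` is a hypothetical helper, not part of any library:

```python
import time

class FlakyAPI:
    """Fake external API that fails a fixed number of times, then succeeds."""
    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("external API unavailable")
        return "payload"

def run_with_retries(task, max_attempts=3, delay_seconds=0.0):
    """Try `task` up to `max_attempts` times; return None if it never succeeds."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except ConnectionError:
            if attempt < max_attempts:
                time.sleep(delay_seconds)   # wait before the next attempt
    return None   # give up: the task is silently discarded

recovered = run_with_retries(FlakyAPI(failures=2).fetch)
discarded = run_with_retries(FlakyAPI(failures=5).fetch)
```

Because this runs in the worker rather than in the request handler, the waiting and retrying never slow down a response to the user.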

Other benefits of asynchronous task queues

  • Better able to handle large numbers of requests. If we had a surge of requests, our HTTP server could go down. However, if we are simply scheduling messages on our asynchronous task queue, then the queue can absorb the surge and the work is processed outside of our application server. This provides a better user experience and more resistance to errors

Other benefits of asynchronous task queues

  • True decoupling of systems. If two systems communicate through an asynchronous task queue and one goes down, messages simply accumulate in the queue
  • Once the system is back up, it can resume processing those tasks
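A toy illustration of that behavior, assuming an in-memory `deque` in place of a real queue service (which would normally run separately and persist messages); the function names are made up:

```python
from collections import deque

message_queue = deque()
processed = []

def producer_send(message):
    """The producing system only ever talks to the queue."""
    message_queue.append(message)

def consumer_drain():
    """The consuming system processes whatever has accumulated, in order."""
    while message_queue:
        processed.append(message_queue.popleft().upper())

# The consumer is "down": the producer keeps working and messages pile up.
for message in ["a", "b", "c"]:
    producer_send(message)
backlog_while_down = len(message_queue)

# The consumer comes back up and resumes from the backlog.
consumer_drain()
```

The producer never learns that the consumer was down; it saw the same queue the whole time.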

What technologies are used?

  • Typically two pieces of technology are necessary to implement an asynchronous task queue:
    • A queue. A service that, at the very least, can save messages in order in different buckets (i.e. queues)
    • A task runner. A library that enables processes to pick up messages from a queue and execute a predefined piece of code using the message as input
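To make the two pieces concrete, here is a toy version of each in plain Python. It is a sketch only: the `queues` dictionary stands in for a queue service, and `task`, `enqueue`, and `run_next` are hypothetical names, not the API of any real task runner.

```python
import json
from collections import defaultdict, deque

# The "queue": named buckets that keep serialized messages in order.
queues = defaultdict(deque)

# The "task runner": a registry of predefined functions plus a way to run them.
TASKS = {}

def task(func):
    """Register a function so that messages can refer to it by name."""
    TASKS[func.__name__] = func
    return func

def enqueue(queue_name, task_name, *args):
    """Save a message describing which task to run and with what input."""
    queues[queue_name].append(json.dumps({"task": task_name, "args": args}))

def run_next(queue_name):
    """Pick up the oldest message from a queue and execute its task."""
    message = json.loads(queues[queue_name].popleft())
    return TASKS[message["task"]](*message["args"])

@task
def add(a, b):
    return a + b

enqueue("default", "add", 2, 3)
enqueue("emails", "add", 10, 20)
first_result = run_next("default")
```

Messages are stored as JSON text because a real queue service holds data, not live Python objects; the task name in the message is what ties it back to predefined code.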

Example Job

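import os

import requests
from redis import Redis
from rq.decorators import job

# Assumption: Redis runs locally and the Mailgun settings come from the environment
connection = Redis()
MAILGUN_DOMAIN_NAME = os.environ['MAILGUN_DOMAIN_NAME']
MAILGUN_API_KEY = os.environ['MAILGUN_API_KEY']

@job('default', connection=connection)
def send_email(send_to, subject, text):
    """Send a plain-text email through the Mailgun HTTP API."""
    url = 'https://api.mailgun.net/v3/{}/messages'.format(MAILGUN_DOMAIN_NAME)
    auth = ('api', MAILGUN_API_KEY)
    data = {
        'from': 'Me <mailgun@{}>'.format(MAILGUN_DOMAIN_NAME),
        'to': send_to,
        'subject': subject,
        'text': text,
    }

    # Raise on an error response so the task runner records the job as failed
    response = requests.post(url, auth=auth, data=data)
    response.raise_for_status()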

Example Job request

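from redis import Redis
from rq import Queue

# Assumption: Redis runs locally; word and definition come from the request
q = Queue('default', connection=Redis())
job = q.enqueue("taskrunner.send_email",
                "kay@ischool.berkeley.edu",
                "New Definition Saved",
                "word: {} with definition: {} was saved".format(word, definition))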

Questions?