Having it all: distributed services with Django, Boto, and SQS queues

 

Julio Vicente Trigo Guijarro

@juliotrigo

PyConEs 2015

Valencia, 22nd November

 

Computer Science Engineer

About me

University of Alicante

➜ London

➜ Software Engineering

Scrum / TDD

➜ Python

Index

What we had / What we wanted / What we did NOT want

Different approaches

Our solution

Implementation

Thank you / QA

Our infrastructure

Our infrastructure

Python 2.7

Ubuntu

RDS

EC2

ELB

S3

SQS

AWS:

Our infrastructure (II)

indexer

www

www2

www3

www4

www5

www-elb

solr-elb

solr

solr2

products

api

images

api

Marketing

What we had

Servers

Teams

Processes to run

Common resources

indexer

www

solr

Development

Product Information

Update product availability

Generate marketing feeds

Solr indexer

Import products into the DB

Update product pricing

Products DB (MySQL)

Availability DB (Redis)

Product images (S3 bucket)

Marketing feeds (S3 bucket)

What we had (II)

Growing fast

➜ Development team:

Implementing services (internal use: operational)

Running them manually (when requested)

Expected failures (data issue...): re-run

➜ Long processes

What we wanted

 Allow other teams to run the processes

Run them remotely

 Authentication

 Authorization

Secuential access to common resources

➜ Semaphores

What we wanted (II)

 Simplify process invocation

➜ Easy user interface (command line)

No technical skills required

Run processes asynchronously

 Quick solution (internal tool)

 Robust

What we did NOT want

 Give other teams SSH access to our servers

 Maintain server SSH keys / passwords in our services

Concurrent access to common resources

Our own (service invocation) implementation

➜ Development time, blocked resources, bugs...

 A single entity (server, service...) with access to all the servers

Different approaches

Give other teams access to our servers

 All the teams would need SSH access

Cannot predict how they would use it

 Multiple (concurrent?) executions

 Unpredictable results

 Coordination between teams

Different approaches (II)

 Easy user interface

Teams don't need server access

Single point of failure

Access to all the servers

Guarantee sequentiality ("common resources")

Central application that calls the remote services

Different approaches (III)

Queues

 Central service that calls the services remotely

 Central service that queues messages

 Sequentiality handled by the queues

✔ Asynchronism

✔ No need to know services location

 Robust

Our solution

Queues!

SQS

Implementation

Boto

Control Panel (Django)

Distributed Services

Upstart

Queues (SQS)

Implementation

Queues (SQS)

Boto

Control Panel (Django)

Distributed Services

Upstart

Queues

AWS Management Console

Queues (II)

Push messages to the queues from any server

Pull messages from the queues from any server

➜ Each critical section has its own queue


common_resources = ['products', 'availability', 'S3_feeds', 'S3_images']

Represent common resources, not services

Implementation

Queues (SQS)

Boto

Distributed Services

Upstart

Control Panel (Django)

Boto

  +46 Million downloads (top 10)

➜ Boto 3:

- "Stable and recommended for general use"

- Work is under way to support Python 3.3+

Amazon Web Services SDK for Python

Boto (II)

#  sqs_handler.py

import boto.sqs
from boto.sqs.message import Message


class SqsHandler(object):

    service_name = None

    def __init__(
        self, aws_region, aws_access_key_id, aws_secret_access_key,
        aws_queue_name, logger, sleep_sec
    ):
        self.conn = boto.sqs.connect_to_region(
            aws_region,
            aws_access_key_id=aws_access_key_id,
            aws_secret_access_key=aws_secret_access_key
        )
        self.queue = self.conn.get_queue(aws_queue_name)
        self.logger = logger
        self.sleep_sec = sleep_sec

    # ...

Boto (III)

    # ...

    def mq_loop(self):
        while True:
            try:
                message = self.queue.read()

                if message is not None:
                    body = message.get_body()
                    
                    if body["service"] == self.service_name: 
                        self.queue.delete_message(message)
                        self.process_message(body)

            except Exception:
                self.logger.error(traceback.format_exc())
            finally:
                time.sleep(self.sleep_sec)

    # ...

Boto (IV)

    # ...

    def process_message(self, body):
        raise NotImplementedError()

    def write_message(self, body):
        m = Message()
        m.set_body(body)
        self.queue.write(m)

Implementation

Queues (SQS)

Boto

Control Panel (Django)

Distributed Services

Upstart

Control Panel

 Django website: invoke services

➜ Django admin: authentication (HTTS) & authorization

Control Panel (II)

# ...settings.base.py

# ...

GROUP_CONTROL_PANEL = 'control_panel'
GROUP_DASHBOARD = 'dashboard'
GROUP_SERVICES = 'services'
GROUP_AVAILABILITY = 'availability'
GROUP_AVAILABILITY_UPDATE = 'availability_update'
# ...

AWS_REGION = os.environ['AWS_REGION']
AWS_ACCESS_KEY_ID = os.environ['AWS_ACCESS_KEY_ID']
AWS_SECRET_ACCESS_KEY = os.environ['AWS_SECRET_ACCESS_KEY']

# ...

 Authorization according to permissions (Django Goups)

Control Panel (III)

 Services invoked from different pages

Message displayed after invocation

Control Panel (IV)

# ...settings.production.py

from .base import *


# ...

AWS_AVAILABILITY_QUEUE_NAME = 'availability'

# ...

Control Panel (V)

# ...permissions.py


def group_check(user, groups):
    if not user.is_authenticated():
        return False

    if user.is_superuser:
        return True

    for group in groups:
        if not user.groups.filter(name=group).exists():
            raise PermissionDenied

    return True


def control_panel_group_check(user):
    groups = [settings.GROUP_CONTROL_PANEL]

    return group_check(user, groups)

# ...

Control Panel (VI)

# ...

def avl_update_group_check(user):
    groups = [
        settings.GROUP_CONTROL_PANEL,
        settings.GROUP_DASHBOARD,
        settings.GROUP_SERVICES,
        settings.GROUP_AVAILABILITY,
        settings.GROUP_AVAILABILITY_UPDATE,
    ]

    return group_check(user, groups)

# ...

Control Panel (VII)

# ...views.availability.py

@require_POST
@login_required
@user_passes_test(avl_update_group_check)
def avl_update(request):

    # ....

    test_only = request.POST.get('test_only', False)
    send_email = request.POST.get('send_email', False)
    queue_name = settings.AWS_AVAILABILITY_QUEUE_NAME

    body = {
        "service": "availability",
        "parameters": {
            "test_only": test_only,
            "send_email": send_email,
        }
    }

    mq = SqsHandler(
        settings.AWS_REGION,
        settings.AWS_ACCESS_KEY_ID,
        settings.AWS_SECRET_ACCESS_KEY,
        queue_name,
        logger,
        settings.MQ_LOOP_SLEEP_SEC
    )

    mq.write_message(json.dumps(body))

    # ....

Control Panel (VIII)

For each service:

- Queue name (may be shared across services)

- Payload

 Send messages to the correct queue

Implementation

Queues (SQS)

Boto

Control Panel (Django)

Distributed Services

Upstart

Distributed Services

 Read messages from the correct queue (sequentially)

➜ Check messages (service name)

Delete messages

Deployed on any server 

 Process messages (service logic)

Distributed Services (II)

#  mq_loop.py

# ...

class SqsAvlHandler(SqsHandler):

    service_name = "availability"

    def process_message(self, body):
        message = json.loads(body)

        service = message["service"]

        if service == service_name:
            test_only = message["parameters"]["test_only"]
            send_email = message["parameters"]["send_email"]

            avl(test_only, send_email)
        else:
            raise Exception(
                "Service {0} not supported.".format(service))

    # ...

Distributed Services (III)

# mq_loop.py

# ...

if __name__ == "__main__":

    # ...

    mq = SqsAvlHandler(
        settings.AWS_REGION,
        settings.AWS_ACCESS_KEY_ID,
        settings.AWS_SECRET_ACCESS_KEY,
        settings.AWS_AVAILABILITY_QUEUE_NAME,
        logger,
        settings.AVAILABILITY_LOOP_SLEEP_SEC
    )

    mq.mq_loop()

Distributed Services (IV)

# mq_loop.sh

#!/bin/bash

source /var/.virtualenvs/avl/bin/activate
cd /var/avl
python mq_loop.py

Implementation

Queues (SQS)

Boto

Control Panel (Django)

Distributed Services

Upstart

Upstart

 Keep services up and running

 Debian and Ubuntu moved to systemd

 "Upstart is an event-based replacement for the /sbin/init daemon which handles starting of tasks and services during boot, stopping them during shutdown and supervising them while the system is running."

Upstart (II)

# /etc/init/avl.conf

description "Availability update"

# Start up when the system hits any normal runlevel, and
# shuts down when the system goes to shutdown or reboot.
# 
# 0 : System halt.
# 1 : Single-User mode.
# 2 : Graphical multi-user plus networking (DEFAULT)
# 3 : Same as "2", but not used.
# 4 : Same as "2", but not used.
# 5 : Same as "2", but not used.
# 6 : System reboot.
start on runlevel [2345]
stop on runlevel [06]

# respawn the job up to 10 times within a 5 second period.
# If the job exceeds these values, it will be stopped and
# marked as failed.
respawn
respawn limit 10 5

setuid avl
setgid avl

exec /var/avl/mq_loop.sh

Upstart (III)

/etc/init/avl.conf

/etc/init.d/avl > /lib/init/upstart-job

But this is just a quick

and simple solution...

 Internal use

 A few messages per day

➜ Expected exceptions processing messages

Going further...

SQJobs

Microservices

"The term "Microservice Architecture" has sprung up over the last few years to describe a particular way of designing software applications as suites of independently deployable services. While there is no precise definition of this architectural style, there are certain common characteristics around organization around business capability, automated deployment, intelligence in the endpoints, and decentralized control of languages and data." - Martin Fowler

http://martinfowler.com/articles/microservices.html

Nameko

 Framework for building microservices in Python

Built-in support for:

 RPC over AMQP

 Asynchronous events (pub-sub) over AMQP

Simple HTTP GET and POST

Websocket RPC and subscriptions (experimental)

Encourages the dependency injection pattern

Nameko (II)

# helloworld.py

from nameko.rpc import rpc


class GreetingService(object):

    name = "greeting_service"

    @rpc
    def hello(self, name):
        return "Hello, {}!".format(name)
$ nameko run helloworld
starting services: greeting_service
...
$ nameko shell
>>> n.rpc.greeting_service.hello(name="Julio")
u'Hello, Julio!'

Nameko (III)

# http.py

import json
from nameko.web.handlers import http


class HttpService(object):

    name = "http_service"

    @http('GET', '/get/<int:value>')
    def get_method(self, request, value):
        return json.dumps({'value': value})

    @http('POST', '/post')
    def do_post(self, request):
        return "received: {}".format(request.get_data(as_text=True))
$ nameko run http
starting services: http_service
...

Nameko (IV)

$ curl -i localhost:8000/get/42
HTTP/1.1 200 OK
Content-Type: text/plain; charset=utf-8
Content-Length: 13
Date: Fri, 13 Feb 2015 14:51:18 GMT

{'value': 42}
$ curl -i -d "post body" localhost:8000/post
HTTP/1.1 200 OK
Content-Type: text/plain; charset=utf-8
Content-Length: 19
Date: Fri, 13 Feb 2015 14:55:01 GMT

received: post body

Thank you!

@juliotrigo

https://slides.com/juliotrigo/pycones2015-distributed-services

PyConES 2015 - Distributed Services

By juliotrigo

PyConES 2015 - Distributed Services

  • 1,384