Schematic Tornado

Prostoráčci, 2017

Road of Shame

State of Python libraries and servers

  • metaserver + python27-fastrpc handler
  • hot-needle-stitched forking endless loop servers
  • mostly Wheezy (new components on Jessie)
  • maybe one or two libraries support Python 3k
  • test infrastructure only for metaserver

Earlier this year (actually last December or thereabouts)

We should start using Python 3k...

...at any cost

py3k

But... metaserver

I should warn you!

No Python 3k support in the near future

Prepare python3 handler by yourself

Stop using proprietary legacy code in your projects

New framework requirements

  • open source
  • in active development
  • large community
  • modern
  • standard protocol and transport
  • used in Seznam

Tornado
REST JSON

Pros

  • completely open source
  • tons of guides
  • standard protocols (even for frontend)
  • asynchronous
  • really fast if non-blocking

Cons

  • we can't use native async
  • no experience with async
  • no infrastructure for async
  • no request checks out of the box

So our long journey started...

If you can't tornado, you need a shelter

With Seznam's open-source library tornado-shelter you can start your project with a standard, predictable and extensible structure, familiar mostly from Django-based projects.

So we reused it at project start

If you want to be async, but you're only on Jessie...

Debian Jessie ships only Python 3.4,
so you can't use native
async/await code.
You have to decorate all your asynchronous methods with tornado.gen.coroutine.

Yield problem

Without any experience with asynchronous programming, you will suffer from awful, annoying tracebacks or (worse) code that runs but silently does the wrong thing.
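A minimal sketch of the problem, under loud assumptions: it uses the standard library's asyncio with async/await (Python 3.5+) instead of tornado.gen.coroutine, only so that it runs without extra packages, and get_data is a hypothetical stand-in for an RPC or database call. A forgotten yield under Tornado fails in an analogous way: you end up holding a pending object instead of the result.

```python
import asyncio


async def get_data(key):
    """Hypothetical non-blocking call standing in for an RPC or DB query."""
    await asyncio.sleep(0)
    return {"id": key}


async def handler():
    # Mistake: no await here (no `yield` under tornado.gen.coroutine),
    # so this is a pending coroutine object, not the data.
    forgotten = get_data(123)
    got_data = isinstance(forgotten, dict)
    forgotten.close()  # discard it cleanly; its body never ran

    # Correct: wait for the result.
    data = await get_data(123)
    return got_data, data


result = asyncio.run(handler())
```

Nothing crashes on the mistaken line itself; the error only shows up later, when the code tries to use the object as if it were data.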

Non-blocking I/O operations

You need libraries to do RPC calls and database queries. Both problems were already solved in the internal Seznam Debian repositories. With a few issues...

python3-tornado-fastrpc

python3-tornado-mysql

python3-tornado-fastrpc

Provides almost the same interface you know from synchronous Python code. Don't forget your yields.

    @tornado.gen.coroutine
    def get(self):
        try:
            res = yield proxy.getData(123)
        except Exception as e:
            self.write('Error: {}'.format(e))
        else:
            self.write('Data: {}'.format(res.value))

python3-tornado-mysql

Repack of Tornado-MySQL. Current status: frozen with discontinued support (author lost motivation because of reasons).

But it was the only async library for MySQL when our project started, so we didn't check the library's status and used it in our server.

What could be worse?

Use an unsupported library and make a mistake in your core code

with (yield cursor_for(self.context.db_master)) as cursor:
    entity = yield EntityModel.by_id(cursor, eid)

That code looks great and readable, but let's look deeper into its implementation

@coroutine 
def cursor_for(pool):
  transaction = yield pool.begin()
  @contextmanager
  def cursor_manager(transaction):
    try:
      yield transaction._conn.cursor() 
    except:
      transaction.rollback()
      raise
    finally: 
      transaction.commit()
  raise Return(cursor_manager(transaction))

That code contains two mistakes

Smaller: the transaction is committed in every case, even after a rollback (because of finally).

Bigger: neither commit nor rollback is ever yielded (both are coroutines), so nothing waits for them to finish.

Side problem: Python 3.4 has no asynchronous versions of __enter__ and __exit__.
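The smaller mistake can be reproduced with a purely synchronous sketch. FakeTransaction is a hypothetical stand-in that only records which methods were called; the real library's methods are coroutines, which this sketch deliberately ignores in order to isolate the finally problem.

```python
from contextlib import contextmanager


class FakeTransaction:
    """Hypothetical stand-in that records calls instead of touching a database."""

    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")


@contextmanager
def buggy_manager(transaction):
    try:
        yield transaction
    except BaseException:
        transaction.rollback()
        raise
    finally:
        # Runs on success AND on failure, so a rollback is followed by a commit.
        transaction.commit()


@contextmanager
def fixed_manager(transaction):
    try:
        yield transaction
    except BaseException:
        transaction.rollback()
        raise
    else:
        # Runs only when the body finished without an exception.
        transaction.commit()


def calls_after_failure(manager):
    transaction = FakeTransaction()
    try:
        with manager(transaction):
            raise ValueError("query failed")
    except ValueError:
        pass
    return transaction.calls
```

With the buggy manager a failing block records a rollback followed by a commit; with else instead of finally it records only the rollback.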

How the f*ck does this sh~ code even work?

self.options["autocommit"] = True

Houston, we have a PROBLEM

In Python earlier than 3.5 you can't use asynchronous context managers (even @tornado.gen.coroutine won't help).

We had 4 MRs that might have fixed the problem, but all of them were rejected after discussion

And what now?

There are a few workable options. All of them have issues and side effects:

  1. stop using context managers (and rewrite a major part of your handler code)
  2. replace the unsupported MySQL library with the recommended one (backport it, rewrite your cursor and hope for sync commit/rollback methods)
  3. upgrade your project to Stretch and Python 3.5
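A rough sketch of what option 1 might look like: an explicit helper instead of a context manager, with commit and rollback actually waited for. It is written with the standard library's asyncio and async/await only so that it runs without extra packages; on Python 3.4 with Tornado the same shape would use @tornado.gen.coroutine and yield. FakeTransaction, run_in_transaction and the two work functions are hypothetical names, not part of any library mentioned here.

```python
import asyncio


class FakeTransaction:
    """Hypothetical stand-in for a transaction with asynchronous methods."""

    def __init__(self):
        self.calls = []

    async def commit(self):
        self.calls.append("commit")

    async def rollback(self):
        self.calls.append("rollback")


async def run_in_transaction(transaction, work):
    """Run `work`, then commit on success or roll back on failure."""
    try:
        result = await work(transaction)
    except Exception:
        await transaction.rollback()  # waited for, so it really happens
        raise
    else:
        await transaction.commit()  # only after a successful body
        return result


async def ok(transaction):
    return "entity"


async def failing(transaction):
    raise RuntimeError("query failed")


def demo(work):
    """Return the recorded calls after running `work` in a fresh transaction."""
    transaction = FakeTransaction()
    try:
        asyncio.run(run_in_transaction(transaction, work))
    except RuntimeError:
        pass
    return transaction.calls
```

The price is the one named in option 1: every handler that used the with block has to be rewritten to pass its body as a function.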

Ehm, you said Schematic Tornado, didn't you?

First of all, why not gRPC?!

The major problem with gRPC for Python is the Java-like Python server and client you get out of the box

Also it's completely sync and brutally slow in Python

OK, why a static schema?

  • automatic type/attribute validation
  • pre-built client libraries
  • schema versioning
  • self-documenting
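To make the first point concrete, here is a hand-rolled sketch of type/attribute validation driven by a static schema. It uses only the standard library and is not how any of the libraries considered below works internally; DISTRIBUTION_SCHEMA and load are hypothetical names chosen for illustration.

```python
# Hypothetical schema: external (wire) name -> (internal name, expected type).
DISTRIBUTION_SCHEMA = {
    "webId": ("web_id", int),
    "installationCount": ("installation_count", float),
}


def load(schema, payload):
    """Validate a decoded JSON object and return it under internal names."""
    errors = {}
    result = {}
    for external, (internal, expected) in schema.items():
        if external not in payload:
            errors[external] = "missing"
            continue
        value = payload[external]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors[external] = "expected " + expected.__name__
        else:
            result[internal] = value
    for name in sorted(set(payload) - set(schema)):
        errors[name] = "unknown attribute"
    if errors:
        raise ValueError(errors)
    return result
```

Because the schema is plain data, the same definition can in principle drive validation, documentation and client generation.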

Decision time!

cerberus, schema, colander,
marshmallow/webargs, voluptuous

What we have

  1. shared marshmallow models for client and server
  2. client and server request/response model validation out of the box
  3. automatic response serialization
  4. automatic request to kwargs processing
  5. system.* like methods from metaserver
  6. method help and CLI

class DistributionSchema(Schema):
  web_id = Integer(
    dump_to="webId", 
    load_from="webId", 
    help="Web ID")
  installation_count = Float(
    dump_to="installationCount", 
    load_from="installationCount", 
    help="Average installation count")

class DistributionResponse(Response):
  distributions = List(
    Nested(DistributionSchema), 
    help="Known distributions")

How to use?

class WebDistributionHandler(RequestHandler): 
  @coroutine
  @schema(
    "web.distributions", uri="/web/distributions", 
    request=GetWebDistributionRequest, 
    response=GetWebDistributionResponse)
  def get(self, watcher_id, from_date, to_date, web_ids=None):
    ...
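The slides do not show how the schema decorator is implemented. The sketch below only illustrates the general idea of "automatic request to kwargs processing" with the standard library: validated_kwargs is a hypothetical, synchronous stand-in, the handler is a plain class rather than a Tornado RequestHandler, and the response shape is made up.

```python
import functools


def validated_kwargs(required, optional=()):
    """Hypothetical decorator: check request arguments, then pass them as kwargs."""

    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, arguments):
            missing = [name for name in required if name not in arguments]
            unknown = [
                name for name in arguments
                if name not in required and name not in optional
            ]
            if missing or unknown:
                raise ValueError({"missing": missing, "unknown": unknown})
            return method(self, **arguments)

        return wrapper

    return decorator


class WebDistributionHandler:
    @validated_kwargs(
        required=("watcher_id", "from_date", "to_date"),
        optional=("web_ids",))
    def get(self, watcher_id, from_date, to_date, web_ids=None):
        # The method body sees only checked, named arguments.
        return {"watcherId": watcher_id, "webIds": web_ids or []}
```

The handler method keeps an ordinary Python signature, and the checks live in one place instead of being repeated in every handler.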

Do you also think everything is too smooth now?

Marshmallow disadvantages

  • marshmallow on client side sucks
  • external vs internal attribute name chaos
  • create vs update model problems
  • you can't mix different model types in response lists

So, cake is a lie?

Actually not

  1. we started with old unsupported legacy tools, but...
  2. now we're using open source community supported technologies
  3. we have a lot of experience with debugging async code
  4. we've prepared code and infrastructure base for next projects (more or less test environment, metrics, docker containers)
  5. it's actually a really big step for Sklik, given its #endif code of a few years ago.

Next project bootstrap?

tornado (python 3.5)

swagger

amysql

If you want to change something, you have to break many things. Even working things... somehow working.

Schematic Tornado

By Alex Rembish
