Boris Burkov,

web-developer,

bioinformaticist

e-mail: BurkovBA@gmail.com

CV

  • 2004-2009 student, Bioinformatics and Bioengineering Faculty, Lomonosov Moscow State University.
  • 2009-2012 Ph.D. student, Bioinformatics and Bioengineering, Lomonosov MSU
  • 2013-2014 Research associate, Laboratory of Mathematical Methods in Biology, Belozersky Institute of Phys.-Chem. Biology / Engelhardt Institute of Molecular Biology
  • 2014-2015 Backend -> full-stack web programmer, Tegra
  • 2016-current Full-stack web programmer, BostonGene

Selected projects

  • BostonGene Universe

  • Django REST Framework Mongoengine

  • Knackit.com

  • Allpy multiple alignment editor

  • ... and lots of minor projects

BostonGene

Universe

  • Bioinformatical pipelines engine and UI (Galaxy clone)
  • late 2015-current, currently in private beta
  • 1.5 developers initially, now a team of 4
  • My role: Frontend, Backend, DevOps

...except by one maybe

Architecture

Common Workflow Language

To describe workflows and tools in the Universe project, we use the Common Workflow Language (CWL), which is slightly reminiscent of JSON Schema.

Each workflow is represented by a JSON document.
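As a rough illustration of a workflow expressed as JSON, here is a minimal sketch of a two-step CWL v1.0 workflow built as a Python dict and round-tripped through the standard `json` module. The step names, tool files (`align.cwl`, `call.cwl`) and input names are hypothetical and are not taken from Universe.

```python
import json

# Hypothetical two-step workflow: every ID and file name here is illustrative.
workflow = {
    "cwlVersion": "v1.0",
    "class": "Workflow",
    "inputs": {"reads": "File", "reference": "File"},
    "outputs": {
        "variants": {"type": "File", "outputSource": "call/vcf"},
    },
    "steps": {
        "align": {
            "run": "align.cwl",
            "in": {"reads": "reads", "reference": "reference"},
            "out": ["bam"],
        },
        "call": {
            "run": "call.cwl",
            # "align/bam" wires the output of the first step into the second.
            "in": {"alignment": "align/bam", "reference": "reference"},
            "out": ["vcf"],
        },
    },
}

# The whole workflow serializes to, and restores from, a single JSON document.
serialized = json.dumps(workflow, indent=2, sort_keys=True)
restored = json.loads(serialized)
```

Because the description is plain JSON, it can be stored, validated and rendered by both the backend and the frontend without a custom parser.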

Backend

  • Django and MongoDB are worst friends.
  • Still, bioinformatical data are often well-described with deeply nested JSONs.
  • Bioinformaticists and data scientists love Python, and it is often convenient for them to serialize intermediate data as Mongo JSON documents. Unlike PyMongo, Django ORM is not easy to use as a standalone tool for serialization in your scripts.
  • Our bottleneck is not the REST API, so we're not that concerned with request optimization (though there are ways to mitigate, e.g., the N+1 problem in Mongo).
  • Document-level transaction isolation is sufficient for many operations in our project.

Why Mongo?

  • Django ORM is NOT loosely coupled with the rest of Django; to me, that's bad design.
  • A complete RDBMS-ectomy in a Django project can get tricky, so currently we're keeping Postgres in the project just to keep Django happy and to use some third-party packages that assume an RDBMS. We might as well use it for operations that require strict transaction isolation (e.g. billing).
  • We love Django REST Framework, but how do we integrate it with MongoDB?

Mongo + Django: howto?

  • Mongoengine is a MongoDB ODM with an interface that aims to replicate Django ORM API.

Mongoengine

from mongoengine import *                           # To define a schema for a
                                                    # document, we create a
class Metadata(EmbeddedDocument):                   # class that inherits from
    tags = ListField(StringField())                 # Document.
    revisions = ListField(IntField())
                                                    # Fields are specified by
class WikiPage(Document):                           # adding field objects as
    title = StringField(required=True)              # class attributes to the
    text = StringField()                            # document class.
    metadata = EmbeddedDocumentField(Metadata)
                                                    # Querying is achieved by
>>> page = WikiPage(title="Hello, World!").save()   # calling the objects
>>> for page in WikiPage.objects:                   # attribute on a document
...     print(page.title)                           # class.

Due to the similarity of their interfaces, Mongoengine can be integrated with Django REST Framework in just ~2-3 thousand lines of glue code.

That code is known as Django REST Framework-Mongoengine.

DRF-Mongoengine

  • Just as ModelSerializers in DRF automatically generate serializers for Django Models, DocumentSerializers in DRF-Mongoengine automatically generate them for your Mongoengine Documents.
  • Interestingly, Django REST Framework with an RDBMS creates an N+1 problem with its nested serializers, so out of the box Mongoengine IMPROVES performance! (Things are not that simple if you dig deeper, though.)
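The N+1 point can be illustrated without any database. The sketch below uses a hypothetical in-memory `CountingStore` (not part of Django, DRF or Mongoengine) that simply counts lookups: serializing pages whose tags live in a separate table costs 1 + N lookups, while serializing pages with embedded tags costs one. It only covers embedded documents; documents linked by reference can bring the extra lookups back.

```python
class CountingStore:
    """Hypothetical in-memory stand-in for a database that counts round trips."""

    def __init__(self, collections):
        self.collections = collections
        self.queries = 0

    def find(self, collection, **criteria):
        self.queries += 1
        return [
            row for row in self.collections[collection]
            if all(row.get(key) == value for key, value in criteria.items())
        ]


def serialize_relational(store):
    # One lookup for the pages, then one more per page for its tags: 1 + N.
    return [
        {
            "title": page["title"],
            "tags": [tag["name"] for tag in store.find("tags", page_id=page["id"])],
        }
        for page in store.find("pages")
    ]


def serialize_embedded(store):
    # Tags are stored inside each page document, so a single lookup is enough.
    return [
        {"title": page["title"], "tags": list(page["tags"])}
        for page in store.find("pages")
    ]


relational = CountingStore({
    "pages": [{"id": 1, "title": "A"}, {"id": 2, "title": "B"}, {"id": 3, "title": "C"}],
    "tags": [
        {"page_id": 1, "name": "x"},
        {"page_id": 1, "name": "y"},
        {"page_id": 2, "name": "z"},
    ],
})
embedded = CountingStore({
    "pages": [
        {"title": "A", "tags": ["x", "y"]},
        {"title": "B", "tags": ["z"]},
        {"title": "C", "tags": []},
    ],
})

relational_result = serialize_relational(relational)
embedded_result = serialize_embedded(embedded)
```

Both functions produce the same output; only the number of lookups recorded in `queries` differs.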


Frontend

+ 50 other plugins, libraries and tools

Build system

JSON Schema

CWL -> JSON Schema -> Angular schema form
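The first arrow in this chain could look roughly like the sketch below, which turns a map of CWL input types into a JSON Schema object that a form generator can render. The function name and the type mapping are assumptions made for illustration (only primitive types and the `?` optional suffix are handled); this is not the converter used in Universe.

```python
# Assumed mapping from CWL primitive types to JSON Schema fragments.
CWL_TO_JSON_SCHEMA = {
    "string": {"type": "string"},
    "int": {"type": "integer"},
    "long": {"type": "integer"},
    "float": {"type": "number"},
    "double": {"type": "number"},
    "boolean": {"type": "boolean"},
    # Assumption: a File input is entered in the form as a path or URL string.
    "File": {"type": "string"},
}


def cwl_inputs_to_json_schema(inputs):
    """Convert a {name: cwl_type} map into a JSON Schema object (hypothetical helper)."""
    properties = {}
    required = []
    for name, cwl_type in inputs.items():
        optional = cwl_type.endswith("?")  # "int?" marks an optional int in CWL
        base_type = cwl_type[:-1] if optional else cwl_type
        properties[name] = dict(CWL_TO_JSON_SCHEMA[base_type])
        if not optional:
            required.append(name)
    schema = {"type": "object", "properties": properties}
    if required:
        schema["required"] = sorted(required)
    return schema


schema = cwl_inputs_to_json_schema(
    {"reads": "File", "sample_name": "string", "threads": "int?"}
)
```

The resulting `schema` lists `reads` and `sample_name` as required and leaves `threads` optional, which is the information a schema-driven form needs to render and validate its fields.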

JointJS

CWL -> Dagre -> JointJS
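To give an idea of what happens between the workflow JSON and the drawn diagram, the sketch below extracts step-to-step edges from CWL `in` sources and assigns each step a layer with a simple longest-path ranking. This is a stand-in for the layering a layout library performs, not Dagre's actual code; the helper names and the example workflow are hypothetical, and only plain string sources in an acyclic workflow are handled.

```python
def step_edges(workflow):
    """Return (upstream, downstream) pairs for sources of the form 'step/output'."""
    edges = set()
    for step_id, step in workflow["steps"].items():
        for source in step["in"].values():
            if "/" in source:  # sources without '/' are workflow-level inputs
                edges.add((source.split("/", 1)[0], step_id))
    return edges


def longest_path_ranks(nodes, edges):
    """Layer of a node = length of the longest path leading to it (DAG only)."""
    ranks = {}

    def rank(node):
        if node not in ranks:
            parents = [up for up, down in edges if down == node]
            ranks[node] = 1 + max(rank(p) for p in parents) if parents else 0
        return ranks[node]

    for node in nodes:
        rank(node)
    return ranks


# Hypothetical workflow: only the parts needed for the graph are shown.
workflow = {
    "steps": {
        "trim": {"in": {"reads": "reads"}},
        "align": {"in": {"reads": "trim/trimmed", "reference": "reference"}},
        "qc": {"in": {"reads": "trim/trimmed"}},
        "call": {"in": {"alignment": "align/bam"}},
    }
}

edges = step_edges(workflow)
ranks = longest_path_ranks(workflow["steps"], edges)
```

Steps with the same rank end up in the same layer of the diagram, and the edges become the links drawn between the step shapes.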

Angular 1 -> Angular 1.5 -> Angular 2

DevOps

Jenkins CI pipeline

Code quality

Docker private registry

Django REST Framework Mongoengine

  • Essentially Django REST Framework, where Django ORM is replaced by Mongoengine ODM
  • Open-source project on GitHub, started in 2014 by Umut Bozkurt, currently maintained by Maxim Vasiliev
  • Current state: stable with somewhat limited functionality, ~170 stars on GitHub
  • My role: contributor (joined in 2016)

DRF architecture

TravisCI

Codecov

Knackit.com

Tegra

  • Basically, a Foursquare clone
  • 2014 - mid-2015, 3 versions, went to production, was closed due to business reasons
  • 3 developers

Technologies: All-Star-2012

  • Backbone.js, jQuery, Underscore, Grunt, Bower, AMD/Require.js, SASS/Compass, QUnit, JSHint.
  • Django, MongoDB, Nginx (+Elasticsearch, Logstash, OpenStreetMap+Leaflet)
  • Google Compute Engine -> Hetzner + Docker

Allpy

Multiple alignment editor

  • Centered around detection of reliable blocks within multiple alignments of protein sequences
  • 2013-2014
  • A library with domain logic in Python
  • Application logic, implementing several algorithms
  • GUI initially in PyGTK, later migrated to PyQt

Trac

Domain-Driven Design

GTK+/Qt

controller functions

classes

serialization methods

Qt vs Gtk+

Domain-driven design paid off: it allowed me to switch from Gtk+ to Qt pretty easily. Only the presentation logic layer had to be redone.
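The layering that makes such a switch cheap can be sketched as follows. All names here (`Alignment`, `conserved_columns`, `AlignmentView`, `TextView`) are hypothetical and are not Allpy's API, and the "conserved column" check is a toy stand-in for the real block-detection algorithms. The point is only that the domain class and the controller function never import a GUI toolkit, so replacing the view touches nothing else.

```python
from typing import Protocol  # available since Python 3.8


class Alignment:
    """Domain layer: data and serialization, no knowledge of any GUI toolkit."""

    def __init__(self, rows):
        self.rows = dict(rows)  # sequence name -> gapped sequence

    def columns(self):
        return list(zip(*self.rows.values()))

    def to_fasta(self):
        return "".join(f">{name}\n{seq}\n" for name, seq in self.rows.items())


def conserved_columns(alignment):
    """Application layer: indices of gap-free columns where all residues match."""
    return [
        index for index, column in enumerate(alignment.columns())
        if "-" not in column and len(set(column)) == 1
    ]


class AlignmentView(Protocol):
    """The only thing the controller expects from the presentation layer."""

    def highlight(self, column_indices): ...


class TextView:
    """Stand-in for a toolkit-specific widget (a Gtk+ or Qt view would go here)."""

    def __init__(self):
        self.highlighted = []

    def highlight(self, column_indices):
        self.highlighted = list(column_indices)


def show_conserved(alignment, view: AlignmentView):
    """Controller function: connects domain logic to whichever view is plugged in."""
    view.highlight(conserved_columns(alignment))


alignment = Alignment({"seq1": "MKT-A", "seq2": "MKTLA", "seq3": "MRTLA"})
view = TextView()
show_conserved(alignment, view)
```

Porting to another toolkit then means writing a new class with a `highlight` method; `Alignment`, `conserved_columns` and `show_conserved` stay as they are.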

 

Qt has a better architecture, better documentation and better performance. For instance, TableView in Gtk has a performance limit of 100 columns, and the bottleneck function starts with a comment like "this is a scary function, I don't understand how it's working, better don't touch it". :(


By vasjaforutube1
