Started programming games in 5th grade
Fight legacy php/mysql applications by day
Python freelancer by afternoon
Adserver requirements
Why python
* in this case
After a lot of searching...
Looking at open-source dbs, found 2 cases of index intersection
PostgreSQL
Lucene / Elasticsearch
Found a pattern; searched for a module that keeps a set of integers as a bitset and provides fast functions to intersect, subtract, etc.: intbitset!
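The index-intersection idea can be sketched with the standard library alone. This is only a stand-in for intbitset, not its API: a plain Python int plays the role of the bitset, and the index names (`country_us`, `device_mobile`, `paused`) and ad ids are made up for illustration.

```python
# Stand-in for intbitset: a Python int used as a bitset (bit i set <=> id i present).

def to_bitset(ids):
    """Pack an iterable of non-negative integer ids into one int."""
    bits = 0
    for i in ids:
        bits |= 1 << i
    return bits


def from_bitset(bits):
    """Unpack a bitset back into a sorted list of ids."""
    ids = []
    i = 0
    while bits:
        if bits & 1:
            ids.append(i)
        bits >>= 1
        i += 1
    return ids


# Hypothetical targeting indexes: each maps one attribute value to the ads that accept it.
country_us = to_bitset([1, 2, 3, 5, 8])
device_mobile = to_bitset([2, 3, 4, 8, 9])
paused = to_bitset([3])

# Intersect the indexes that match the request, then subtract the excluded ads.
candidates = (country_us & device_mobile) & ~paused
matching_ads = from_bitset(candidates)
print(matching_ads)
```

Each targeting rule costs one bitwise operation over the whole ad set, which is why a compact bitset matters here.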
When ad ids are incremental, the intbitsets become less efficient after a couple of months: they grow sparse, since valid ids then start from e.g. 50 000. Some ads complete fast (and are removed from the intbitset) while others stay active for a long time.
Keep an internal ad-id -> smallest-free-positive-integer mapping in lmdb, with free lists, so the intbitsets don't become sparse (lower memory + faster computation).
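A minimal sketch of the id remapping, assuming an in-memory dict and heap in place of the lmdb-backed mapping and free list. The class name `DenseIdAllocator` and its methods are hypothetical.

```python
import heapq


class DenseIdAllocator:
    """Map sparse ad ids to the smallest free positive integer."""

    def __init__(self):
        self._dense_by_ad = {}  # ad_id -> dense id (would live in lmdb)
        self._free = []         # min-heap of released dense ids (the free list)
        self._next = 1          # next never-used dense id

    def acquire(self, ad_id):
        """Return the dense id for ad_id, allocating one if needed."""
        if ad_id in self._dense_by_ad:
            return self._dense_by_ad[ad_id]
        if self._free:
            dense = heapq.heappop(self._free)  # reuse the smallest released id
        else:
            dense = self._next
            self._next += 1
        self._dense_by_ad[ad_id] = dense
        return dense

    def release(self, ad_id):
        """Free the dense id of a completed ad so a later ad can reuse it."""
        dense = self._dense_by_ad.pop(ad_id)
        heapq.heappush(self._free, dense)
        return dense


alloc = DenseIdAllocator()
a = alloc.acquire(50001)
b = alloc.acquire(50007)
c = alloc.acquire(61234)
alloc.release(50007)      # ad 50007 completed
d = alloc.acquire(70000)  # reuses the slot freed by 50007
```

Because released ids are reused smallest-first, the highest dense id stays close to the number of active ads, no matter how large the master ids grow.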
Scan the master db at a regular interval, and update the local ads db + all the indexes
Use lmdb for storing all the data
Transactions keep everything consistent
Use capnproto to serialize/deserialize ad objects
Keep most of the indexes in intbitsets
Some of the indexes must be btrees, e.g.:
frequency_capping --> ad_id
Only use cheap read-transactions from local ads db
Use maxmind c extension for geoip
Use 51degrees c extension for device detection
Before serving the ad, do a very small write transaction to log the impression (in a separate statistics db)
Wake up every x seconds to synchronize the logged impressions from local db with master db
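The log-locally, sync-later flow might look roughly like this. Two `Counter` objects stand in for the local statistics db and the master db; `log_impression` and `sync_to_master` are hypothetical names.

```python
from collections import Counter

local_stats = Counter()   # stand-in for the local statistics db
master_stats = Counter()  # stand-in for the master db


def log_impression(ad_id):
    """The very small write done before the ad is served."""
    local_stats[ad_id] += 1


def sync_to_master():
    """Move all locally logged impressions to the master and clear the local counts."""
    pending = dict(local_stats)
    master_stats.update(pending)  # Counter.update adds counts instead of replacing them
    local_stats.clear()
    return pending


for ad in (7, 7, 9):
    log_impression(ad)

synced = sync_to_master()  # would run every x seconds in a background process
```

In the real setup the read-and-clear step would sit inside one write transaction, so an impression logged during the sync is neither lost nor counted twice.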
Using lmdb with no durability: if the server crashes, the data is lost
Adding more durability slows things down because disk-access is slow
Extra: Have another process do a group-commit to hdd (e.g. every 0.5 seconds)
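Group commit can be sketched with an append-only file as a stand-in for the database. Records are buffered in memory and written with a single `fsync` once the interval has passed. The class `GroupCommitLog` and the record format are assumptions, and the clock is passed in explicitly to keep the example deterministic.

```python
import os
import tempfile


class GroupCommitLog:
    """Buffer records in memory and fsync them to disk in batches."""

    def __init__(self, path, interval=0.5):
        self._file = open(path, "ab")
        self._interval = interval  # seconds between commits
        self._buffer = []
        self._last_commit = 0.0

    def append(self, record):
        """Cheap in-memory write; nothing touches the disk yet."""
        self._buffer.append(record)

    def maybe_commit(self, now):
        """Flush the batch if the interval has elapsed; return records written."""
        if not self._buffer or now - self._last_commit < self._interval:
            return 0
        self._file.write(b"".join(r + b"\n" for r in self._buffer))
        self._file.flush()
        os.fsync(self._file.fileno())  # one disk sync for the whole batch
        written = len(self._buffer)
        self._buffer.clear()
        self._last_commit = now
        return written

    def close(self):
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "impressions.log")
log = GroupCommitLog(path)
log.append(b"ad=7")
log.append(b"ad=9")
first = log.maybe_commit(now=0.2)   # too early, nothing is written
second = log.maybe_commit(now=0.6)  # interval elapsed, both records are synced
log.close()
```

The trade-off is explicit: a crash loses at most one interval of records, in exchange for one disk sync per batch instead of one per impression.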
Cache – key-value memory-mapped cache.
Used to cache zones and other non transactional data.
Increments used to keep statistics on each server.
Mules – Managed python processes that do not serve http-requests but can be contacted from workers
Programmed Mules - Managed python processes that just do an infinite loop
Used to run the Local Db Updater + Statistics Process
Http – Async http server. Fewer features and ~10% slower than nginx, but enough.
@cron() decorator, cron-like, but for python functions.
Geoip module: The webserver does the geoip.
Most of the modules are actually wrappers of c/c++ extensions. If the app has immense success and a c/c++ port becomes inevitable, the initial translation will be easier *
* maybe.
Questions ?
https://github.com/ddorian
dorian.hoxha@gmail.com