Popular Database ORM Libraries

ChiPy ~ Tanya Schlusser ~ September 2016

Object Relational Mapping

Python objects

Relational Database

mapping

ORM libraries provide a way to map between class instances in an object-oriented language and the content of a database.

Ways to do ORM

Patterns of Enterprise Architecture, by Martin Fowler, is the de facto reference for data engineers.

  • All of the ORM architectures in this talk are modeled on stuff in his book (2002)*.
     
  • He's famous. Visit his webpage if his name is new to you. It's good.

* Michael Bayer, in an article about how he designed SQLAlchemy, said he modeled the design after Fowler's DataMapper architecture.

The backend

The Data Warehouse Toolkit series, by Ralph Kimball and collaborators, is the de facto reference for data architects.

 

(I was given this book while at an IT consulting firm in 2013.)

Data Engineer: Best at designing the database-to-application interface.

Data Architect: Best at architecting the database tables and their interconnections.

Typical database design breaks apart tables to eliminate duplication of unique data.

Don't Repeat Yourself

(DRY)

(database normalization)

Example: storing a receipt

ID Name
1 John Cleese
Receipt          11/30/1972
John Cleese               19:00

 

Wensleydale    1 @ 2.00£  2.00£
Cheddar        2 @ 1.50£  3.00£
R. Leicester   1 @ 1.00£  1.00£
                          6.00£
ID Type
1 Cheddar
2 Wensleydale
3 Red Leicester
ID Receipt Cheese Qty Price_ea
1 1 1 2 1.50
2 1 2 1 2.00
3 1 3 1 1.00
ID Date_time Customer
1 11/30/1972 19:00 1

Customer

Cheese

Line_Item

Receipt

One-to-many relations

Customer
id
name
Cheese
id
name
Line_Item
id
receipt_id
cheese_id
qty
price_ea
Receipt
id
date_time
customer_id

One cheese type can be on many line items.

One customer can make many purchases.

One receipt can have many line items.

Many-to-many relations

Customer
id
name
Cheese
id
name
Line_Item
id
receipt_id
cheese_id
qty
price_ea
Receipt
id
date_time
customer_id

...and many customers can purchase the same cheese many times.

The same type of cheese can appear on many receipts...

(You need multiple joins.)

Compare ORM APIs

See the difference by performing a many-to-many join in each API

* Other considerations are: which backends they support, whether they have change management built in (Django), and whether they can do database introspection (SQLAlchemy).

Many-to-many join

import sqlite3

query = """
SELECT
    cheese.name as cheese,
    sum(qty) as purchases
FROM line_item AS li
    JOIN receipt ON li.receipt_id = receipt.id
    JOIN cheese ON li.cheese_id = cheese.id
GROUP BY cheese.id
ORDER BY purchases DESC
"""

conn = sqlite3.connect('sqlite3.db')
c = conn.cursor()

for row in c.execute(query):
    print row

The output:

(u'Cheddar', 3)
(u'Wensleydale', 3)
(u'Red Leicester', 2)
(u'Camembert', 1)
  • Typos in the SQL (hard to debug in Python)
  • Must remember column name order in the SQL output
  • Must implement business logic (like set memberships) by hand

Caveats:

Object/Relational Mapping solves some of these problems

sqlite3 (Yay PEP 249!)

We'll use sqlite3 because

  • It's in Python's Standard Library
  • It can support a site with ~100k hits/day (says their site)
  • The database API it follows is the Python standard (PEP 249)

Also called a database driver; most ORM libraries let you choose from a set of drivers: at least for PostgreSQL, MySQL, and SQLite.

Active Record

* Images are from Martin Fowler's webpage

The Python object contains a row of data and can directly interact with the database.

 

(so it's an Active Record)

Examples

  • Ruby on Rails ORM*
  • Django ORM
  • Pony ORM
  • Peewee

* Not Python, but the reason ActiveRecord is so popular right now.

Django ORM

Django's ORM is integrated with its web framework and can't be installed separately. To use it without a web app, copy Mick Thompson's repo and start with:

import os
import django

# You define the settings.py module
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "settings")
django.setup()
from django.conf import settings

# You define the contents of the orm_only app
from django.db import models
from orm_only.models import *

Django ORM

Define the tables in models.py like this:

from django.db import models

class Cheese(models.Model):
    name = models.CharField(max_length=30)

class Customer(models.Model):
    name = models.CharField(max_length=50)

class Purchase(models.Model):
    purchase_date = models.DateField()
    customer = models.ForeignKey(Customer)
    cheeses = models.ManyToManyField(Cheese)

Django ORM

Create objects:

leicester = Cheese.objects.create(name='Red Leicester')
leicester.save()

doug = Customer.objects.create(name='Douglas')
doug.save()
 
# dates is a vector of time constants
p = Purchase(purchase_date=dates[0], customer=doug)
p.save()
p.cheeses.add(camembert, leicester)

Django ORM

Query:

# Count the number of purchases by cheese
from django.db.models import Count
sales_counts = (
    queryset.values('cheeses__name')
    .annotate(total=Count('cheeses'))
    .order_by('cheeses__name')
)

for sc in sales_counts:
    print(sc)

Django ORM

The tables in the SQLite database:

import sqlite3

con = sqlite3.connect('try_django.db')
result = con.execute("""
    SELECT name FROM sqlite_master WHERE type='table';
    """).fetchall()

for r in result:
    print(r)

# ('django_migrations',)
# ('sqlite_sequence',)
# ('orm_only_cheese',)
# ('orm_only_customer',)
# ('orm_only_purchase',)
# ('orm_only_purchase_cheeses',)

Implicitly created table (for the many-to-many relation)

Pony ORM

Pony ORM's main attraction is its generator-style query syntax. Touch the filename for the database, then:

import os

from pony import orm

cwd = os.getcwd()
db = orm.Database()
db.bind('sqlite', os.path.join(cwd, 'try_pony.db'))

Pony ORM

Create tables:

class Cheese(db.Entity):
    type = orm.Required(str)
    purchases = orm.Set(lambda: Purchase)

class Customer(db.Entity):
    name = orm.Required(str)
    purchases = orm.Set(lambda: Purchase)

class Purchase(db.Entity):
    purchase_date = orm.Required(datetime.date)
    customer = orm.Required(Customer)
    cheeses = orm.Set(Cheese)

db.generate_mapping(create_tables=True)

Pony ORM

Insert data:

leicester = Cheese(type='Red Leicester')

doug = Customer(name='Douglas')

# time constants
d = datetime.datetime(1971, 12, 18, 20)
day = datetime.timedelta(1)
dates = [d - pct * day for pct in (1, 1.5, 0.01)]
Purchase(
    purchase_date=dates[0],
    customer=doug,
    cheeses={camembert, leicester}
)

orm.commit()

Pony ORM

Query:

for chz in orm.select(
        p.cheeses for p in Purchase
        if p.purchase_date > d - 1.1 * day):
    print(chz.type)

cheddar = Cheese(type='Cheddar')
orm.commit()

for chz, purchases in orm.left_join(
        (c, orm.count(p)) 
        for c in Cheese
        for p in c.purchases):
    print(chz.type, purchases)

# Prints:
# Red Leicester 2
# Wensleydale 1
# Camembert 2
# Cheddar 0

Pony ORM

The tables in the SQLite database:

import sqlite3
con = sqlite3.connect('try_pony.db')
result = con.execute("""
    SELECT name FROM sqlite_master WHERE type='table';
    """).fetchall()

for r in result:
    print(r)

# Prints:
# ('Cheese',)
# ('sqlite_sequence',)
# ('Customer',)
# ('Purchase',)
# ('Cheese_Purchase',)

Peewee

Peewee is intended to be similar to SQLAlchemy so we're holding off for a sec.

Data Mapper

* Images are from Martin Fowler's webpage

An intermediate object performs the mapping between the Python object and the database.

Examples

  • SQLAlchemy
  • Maybe peewee

SQLAlchemy

SQLAlchemy was created by a database administrator. The ORM-style features were added later because of the popularity of Django ORM and other libraries.

SQLAlchemy

Create tables like this:

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Date, Integer, String
from sqlalchemy import Table, ForeignKey
from sqlalchemy.orm import relationship

Base = declarative_base()

# This table implements the many-to-many mapping
purchases_cheeses = Table(
    'purchases_cheeses',
    Base.metadata,
    Column(
        'purch_id',
        Integer,
        ForeignKey('purchases.id', primary_key=True)),
    Column(
        'cheese_id',
        Integer,
        ForeignKey('cheeses.id', primary_key=True))
)

SQLAlchemy

Create tables (the top-level ones) like this:

class Customer(Base):
    __tablename__ = 'customers'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    def __repr__(self):
       return "<Customer(name='%s')>" % (self.name)

class Purchase(Base):
    __tablename__ = 'purchases'
    id = Column(Integer, primary_key=True)
    customer_id = Column(
        Integer, ForeignKey('customers.id', primary_key=True))
    purchase_date = Column(Date, nullable=False)
    customer = relationship('Customer')
    cheeses = relationship(
        'Cheese',
        secondary='purchases_cheeses',
        back_populates='purchases')
    def __repr__(self):
       return "<Purchase(customer='%s', dt='%s')>" % (
              self.customer.name, self.purchase_date)

SQLAlchemy

And to actually create them in the database:

from sqlalchemy import create_engine
engine = create_engine('sqlite://')
Base.metadata.create_all(engine)


# Like the python manage.py migrate
# in Django...

SQLAlchemy

To query / insert:

from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine)
sess = Session()

leicester = Cheese(kind='Red Leicester')
camembert = Cheese(kind='Camembert')
sess.add(camembert)
sess.add(leicester)
sess.commit()

engine.execute('SELECT * FROM cheeses;').fetchall()

doug = Customer(name='Douglas')
cat = Customer(name='Cat')
sess.add_all((doug, cat))
sess.query(Customer).all()

SQLAlchemy

To insert:

from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine)
sess = Session()

camembert = Cheese(kind='Camembert')
sess.add(leicester)
sess.commit()

doug = Customer(name='Douglas')
sess.commit()

# d is a datetime object
p1 = Purchase(purchase_date=(d - 1 * day), customer=doug)
p1.cheeses.append(camembert)
p1.cheeses.append(leicester)
sess.add(p1)
sess.commit()

SQLAlchemy

To query:

sess.query(Customer).all()

from sqlalchemy import func
sess.query(Purchase).filter(Purchase.purchase_date > (d - 1 * day)).all()

( sess.query(Cheese, Purchase)
    .filter(Purchase.purchase_date > (d - 1 * day))
    .from_self(Cheese.kind, func.count(Purchase.id))
    .group_by(Cheese.kind)
).all()

for row in sess.query(Purchase,Cheese).filter(Purchase.cheeses):
    print(row)

(sess.query(Purchase,Cheese)
    .filter(Purchase.cheeses)
    .from_self(Cheese.kind, func.count(Purchase.id))
    .group_by(Cheese.kind)
).all()

SQLAlchemy

And the SQLite database looks like:


result = engine.execute("""
     SELECT name FROM sqlite_master WHERE type='table';
     """).fetchall()
 
for r in result:
    print(r)


# Prints
# ('cheeses',)
# ('customers',)
# ('purchases',)
# ('purchases_cheeses',)

Peewee

Peewee was created by a SQLAlchemy admirer who thought that it was overkill for his tiny web page. The idea is there isn't any magical table creation behind-the-scenes, except for adding an index column if you don't.

 

You implement the many-to-many join manually.

Peewee

To create tables:

import peewee
database = peewee.SqliteDatabase('peewee.db')

class BaseModel(peewee.Model):
    class Meta:
        database = database

class Customer(BaseModel):
    name = peewee.TextField()

class Purchase(BaseModel):
    purchase_date = peewee.DateField()
    customer = peewee.ForeignKeyField(Customer, related_name='purchases')

class PurchaseCheese(BaseModel):
    """A simple "through" table for many-to-many relationship."""
    purchase = peewee.ForeignKeyField(Purchase)
    cheese = peewee.ForeignKeyField(Cheese)
    
database.create_tables((Customer, Purchase, Cheese, PurchaseCheese))

Peewee

To insert data:

leicester = Cheese.create(kind='Red Leicester')
camembert = Cheese.create(kind='Camembert')
doug = Customer.create(name='Douglas')
cat = Customer.create(name='Cat')

import datetime
d = datetime.date(1971, 12, 18)
day = datetime.timedelta(1)

p1 = Purchase.create(purchase_date=d, customer=doug)
PurchaseCheese.create(purchase=p1, cheese=camembert)
PurchaseCheese.create(purchase=p1, cheese=leicester)

p2 = Purchase.create(purchase_date=d+day, customer=cat)
PurchaseCheese.create(purchase=p2, cheese=camembert)

Peewee

To query:

for p in Purchase.select().where(
        Purchase.purchase_date > d - 1 * day):
    print(p.customer.name, p.purchase_date)

from peewee import fn
q = (Cheese
    .select(
        Cheese.kind,
        fn.COUNT(Purchase.id).alias('num_purchased')
    )
    .join(PurchaseCheese)
    .join(Purchase)
    .group_by(Cheese.kind)
 )
for chz in q:
    print(chz.kind, chz.num_purchased)

Peewee

And the SQLite database looks like:

import sqlite3
con = sqlite3.connect('peewee.db')
result = con.execute("""
    SELECT name FROM sqlite_master WHERE type='table';
    """).fetchall()

for r in result:
    print(r)


# Prints
# ('cheese',)
# ('customer',)
# ('purchase',)
# ('purchasecheese',)

Peewee

To insert data:

import peewee
database = peewee.SqliteDatabase('peewee.db')

class BaseModel(peewee.Model):
    class Meta:
        database = database

class Customer(BaseModel):
    name = peewee.TextField()

class Purchase(BaseModel):
    purchase_date = peewee.DateField()
    customer = peewee.ForeignKeyField(Customer, related_name='purchases')

class Cheese(BaseModel):
    kind = peewee.TextField()

class PurchaseCheese(BaseModel):
    """A simple "through" table for many-to-many relationship."""
    purchase = peewee.ForeignKeyField(Purchase)
    cheese = peewee.ForeignKeyField(Cheese)
    
database.create_tables((Customer, Purchase, Cheese, PurchaseCheese))

leicester = Cheese.create(kind='Red Leicester')
camembert = Cheese.create(kind='Camembert')
doug = Customer.create(name='Douglas')
cat = Customer.create(name='Cat')

import datetime
d = datetime.date(1971, 12, 18)
day = datetime.timedelta(1)

p1 = Purchase.create(purchase_date=d, customer=doug)
PurchaseCheese.create(purchase=p1, cheese=camembert)
PurchaseCheese.create(purchase=p1, cheese=leicester)

p2 = Purchase.create(purchase_date=d+day, customer=cat)
PurchaseCheese.create(purchase=p2, cheese=camembert)

Also try

pyDAL

  • Locally invented (by Massimo)
  • No specific ORM classes
  • Nice abstract query language
  • I'll put it on the GitHub repo
  • Out next week
     
  • Same as online, with all missing sections filled in
     
  • (including a section on ORMs that is a lot like this)
     
  • All royalties to Django Girls
Made with Slides.com