ChiPy ~ Tanya Schlusser ~ September 2016
Python objects
Relational Database
mapping
ORM libraries provide a way to map between class instances in an object-oriented language and the content of a database.
Patterns of Enterprise Architecture, by Martin Fowler, is the de facto reference for data engineers.
* Michael Bayer, in an article about how he designed SQLAlchemy, said he modeled the design after Fowler's DataMapper architecture.
The Data Warehouse Toolkit series, by Ralph Kimball and collaborators, is the de facto reference for data architects.
(I was given this book while at an IT consulting firm in 2013.)
Data Engineer: Best at designing the database-to-application interface.
Data Architect: Best at architecting the database tables and their interconnections.
(database normalization)
ID | Name |
---|---|
1 | John Cleese |
Receipt 11/30/1972 John Cleese 19:00
Wensleydale 1 @ 2.00£ 2.00£
Cheddar 2 @ 1.50£ 3.00£
R. Leicester 1 @ 1.00£ 1.00£
6.00£
ID | Type |
---|---|
1 | Cheddar |
2 | Wensleydale |
3 | Red Leicester |
ID | Receipt | Cheese | Qty | Price_ea |
---|---|---|---|---|
1 | 1 | 1 | 2 | 1.50 |
2 | 1 | 2 | 1 | 2.00 |
3 | 1 | 3 | 1 | 1.00 |
ID | Date_time | Customer |
---|---|---|
1 | 11/30/1972 19:00 | 1 |
Customer
Cheese
Line_Item
Receipt
Customer |
---|
id name |
Cheese |
---|
id name |
Line_Item |
---|
id receipt_id cheese_id qty price_ea |
Receipt |
---|
id date_time customer_id |
One cheese type can be on many line items.
One customer can make many purchases.
One receipt can have many line items.
Customer |
---|
id name |
Cheese |
---|
id name |
Line_Item |
---|
id receipt_id cheese_id qty price_ea |
Receipt |
---|
id date_time customer_id |
...and many customers can purchase the same cheese many times.
The same type of cheese can appear on many receipts...
(You need multiple joins.)
* Other considerations are: which backends they support, whether they have change management built in (Django), and whether they can do database introspection (SQLAlchemy).
import sqlite3
query = """
SELECT
cheese.name as cheese,
sum(qty) as purchases
FROM line_item AS li
JOIN receipt ON li.receipt_id = receipt.id
JOIN cheese ON li.cheese_id = cheese.id
GROUP BY cheese.id
ORDER BY purchases DESC
"""
conn = sqlite3.connect('sqlite3.db')
c = conn.cursor()
for row in c.execute(query):
print row
(u'Cheddar', 3)
(u'Wensleydale', 3)
(u'Red Leicester', 2)
(u'Camembert', 1)
We'll use sqlite3 because
Also called a database driver; most ORM libraries let you choose from a set of drivers: at least for PostgreSQL, MySQL, and SQLite.
* Images are from Martin Fowler's webpage
The Python object contains a row of data and can directly interact with the database.
(so it's an Active Record)
* Not Python, but the reason ActiveRecord is so popular right now.
Django's ORM is integrated with its web framework and can't be installed separately. To use it without a web app, copy Mick Thompson's repo and start with:
import os
import django
# You define the settings.py module
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "settings")
django.setup()
from django.conf import settings
# You define the contents of the orm_only app
from django.db import models
from orm_only.models import *
Define the tables in models.py like this:
from django.db import models
class Cheese(models.Model):
name = models.CharField(max_length=30)
class Customer(models.Model):
name = models.CharField(max_length=50)
class Purchase(models.Model):
purchase_date = models.DateField()
customer = models.ForeignKey(Customer)
cheeses = models.ManyToManyField(Cheese)
Create objects:
leicester = Cheese.objects.create(name='Red Leicester')
leicester.save()
doug = Customer.objects.create(name='Douglas')
doug.save()
# dates is a vector of time constants
p = Purchase(purchase_date=dates[0], customer=doug)
p.save()
p.cheeses.add(camembert, leicester)
Query:
# Count the number of purchases by cheese
from django.db.models import Count
sales_counts = (
queryset.values('cheeses__name')
.annotate(total=Count('cheeses'))
.order_by('cheeses__name')
)
for sc in sales_counts:
print(sc)
The tables in the SQLite database:
import sqlite3
con = sqlite3.connect('try_django.db')
result = con.execute("""
SELECT name FROM sqlite_master WHERE type='table';
""").fetchall()
for r in result:
print(r)
# ('django_migrations',)
# ('sqlite_sequence',)
# ('orm_only_cheese',)
# ('orm_only_customer',)
# ('orm_only_purchase',)
# ('orm_only_purchase_cheeses',)
Implicitly created table (for the many-to-many relation)
Pony ORM's main attraction is its generator-style query syntax. Touch the filename for the database, then:
import os
from pony import orm
cwd = os.getcwd()
db = orm.Database()
db.bind('sqlite', os.path.join(cwd, 'try_pony.db'))
Create tables:
class Cheese(db.Entity):
type = orm.Required(str)
purchases = orm.Set(lambda: Purchase)
class Customer(db.Entity):
name = orm.Required(str)
purchases = orm.Set(lambda: Purchase)
class Purchase(db.Entity):
purchase_date = orm.Required(datetime.date)
customer = orm.Required(Customer)
cheeses = orm.Set(Cheese)
db.generate_mapping(create_tables=True)
Insert data:
leicester = Cheese(type='Red Leicester')
doug = Customer(name='Douglas')
# time constants
d = datetime.datetime(1971, 12, 18, 20)
day = datetime.timedelta(1)
dates = [d - pct * day for pct in (1, 1.5, 0.01)]
Purchase(
purchase_date=dates[0],
customer=doug,
cheeses={camembert, leicester}
)
orm.commit()
Query:
for chz in orm.select(
p.cheeses for p in Purchase
if p.purchase_date > d - 1.1 * day):
print(chz.type)
cheddar = Cheese(type='Cheddar')
orm.commit()
for chz, purchases in orm.left_join(
(c, orm.count(p))
for c in Cheese
for p in c.purchases):
print(chz.type, purchases)
# Prints:
# Red Leicester 2
# Wensleydale 1
# Camembert 2
# Cheddar 0
The tables in the SQLite database:
import sqlite3
con = sqlite3.connect('try_pony.db')
result = con.execute("""
SELECT name FROM sqlite_master WHERE type='table';
""").fetchall()
for r in result:
print(r)
# Prints:
# ('Cheese',)
# ('sqlite_sequence',)
# ('Customer',)
# ('Purchase',)
# ('Cheese_Purchase',)
Peewee is intended to be similar to SQLAlchemy so we're holding off for a sec.
* Images are from Martin Fowler's webpage
An intermediate object performs the mapping between the Python object and the database.
SQLAlchemy was created by a database administrator. The ORM-style features were added later because of the popularity of Django ORM and other libraries.
Create tables like this:
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Date, Integer, String
from sqlalchemy import Table, ForeignKey
from sqlalchemy.orm import relationship
Base = declarative_base()
# This table implements the many-to-many mapping
purchases_cheeses = Table(
'purchases_cheeses',
Base.metadata,
Column(
'purch_id',
Integer,
ForeignKey('purchases.id', primary_key=True)),
Column(
'cheese_id',
Integer,
ForeignKey('cheeses.id', primary_key=True))
)
Create tables (the top-level ones) like this:
class Customer(Base):
__tablename__ = 'customers'
id = Column(Integer, primary_key=True)
name = Column(String, nullable=False)
def __repr__(self):
return "<Customer(name='%s')>" % (self.name)
class Purchase(Base):
__tablename__ = 'purchases'
id = Column(Integer, primary_key=True)
customer_id = Column(
Integer, ForeignKey('customers.id', primary_key=True))
purchase_date = Column(Date, nullable=False)
customer = relationship('Customer')
cheeses = relationship(
'Cheese',
secondary='purchases_cheeses',
back_populates='purchases')
def __repr__(self):
return "<Purchase(customer='%s', dt='%s')>" % (
self.customer.name, self.purchase_date)
And to actually create them in the database:
from sqlalchemy import create_engine
engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
# Like the python manage.py migrate
# in Django...
To query / insert:
from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine)
sess = Session()
leicester = Cheese(kind='Red Leicester')
camembert = Cheese(kind='Camembert')
sess.add(camembert)
sess.add(leicester)
sess.commit()
engine.execute('SELECT * FROM cheeses;').fetchall()
doug = Customer(name='Douglas')
cat = Customer(name='Cat')
sess.add_all((doug, cat))
sess.query(Customer).all()
To insert:
from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine)
sess = Session()
camembert = Cheese(kind='Camembert')
sess.add(leicester)
sess.commit()
doug = Customer(name='Douglas')
sess.commit()
# d is a datetime object
p1 = Purchase(purchase_date=(d - 1 * day), customer=doug)
p1.cheeses.append(camembert)
p1.cheeses.append(leicester)
sess.add(p1)
sess.commit()
To query:
sess.query(Customer).all()
from sqlalchemy import func
sess.query(Purchase).filter(Purchase.purchase_date > (d - 1 * day)).all()
( sess.query(Cheese, Purchase)
.filter(Purchase.purchase_date > (d - 1 * day))
.from_self(Cheese.kind, func.count(Purchase.id))
.group_by(Cheese.kind)
).all()
for row in sess.query(Purchase,Cheese).filter(Purchase.cheeses):
print(row)
(sess.query(Purchase,Cheese)
.filter(Purchase.cheeses)
.from_self(Cheese.kind, func.count(Purchase.id))
.group_by(Cheese.kind)
).all()
And the SQLite database looks like:
result = engine.execute("""
SELECT name FROM sqlite_master WHERE type='table';
""").fetchall()
for r in result:
print(r)
# Prints
# ('cheeses',)
# ('customers',)
# ('purchases',)
# ('purchases_cheeses',)
Peewee was created by a SQLAlchemy admirer who thought that it was overkill for his tiny web page. The idea is there isn't any magical table creation behind-the-scenes, except for adding an index column if you don't.
You implement the many-to-many join manually.
To create tables:
import peewee
database = peewee.SqliteDatabase('peewee.db')
class BaseModel(peewee.Model):
class Meta:
database = database
class Customer(BaseModel):
name = peewee.TextField()
class Purchase(BaseModel):
purchase_date = peewee.DateField()
customer = peewee.ForeignKeyField(Customer, related_name='purchases')
class PurchaseCheese(BaseModel):
"""A simple "through" table for many-to-many relationship."""
purchase = peewee.ForeignKeyField(Purchase)
cheese = peewee.ForeignKeyField(Cheese)
database.create_tables((Customer, Purchase, Cheese, PurchaseCheese))
To insert data:
leicester = Cheese.create(kind='Red Leicester')
camembert = Cheese.create(kind='Camembert')
doug = Customer.create(name='Douglas')
cat = Customer.create(name='Cat')
import datetime
d = datetime.date(1971, 12, 18)
day = datetime.timedelta(1)
p1 = Purchase.create(purchase_date=d, customer=doug)
PurchaseCheese.create(purchase=p1, cheese=camembert)
PurchaseCheese.create(purchase=p1, cheese=leicester)
p2 = Purchase.create(purchase_date=d+day, customer=cat)
PurchaseCheese.create(purchase=p2, cheese=camembert)
To query:
for p in Purchase.select().where(
Purchase.purchase_date > d - 1 * day):
print(p.customer.name, p.purchase_date)
from peewee import fn
q = (Cheese
.select(
Cheese.kind,
fn.COUNT(Purchase.id).alias('num_purchased')
)
.join(PurchaseCheese)
.join(Purchase)
.group_by(Cheese.kind)
)
for chz in q:
print(chz.kind, chz.num_purchased)
And the SQLite database looks like:
import sqlite3
con = sqlite3.connect('peewee.db')
result = con.execute("""
SELECT name FROM sqlite_master WHERE type='table';
""").fetchall()
for r in result:
print(r)
# Prints
# ('cheese',)
# ('customer',)
# ('purchase',)
# ('purchasecheese',)
To insert data:
import peewee
database = peewee.SqliteDatabase('peewee.db')
class BaseModel(peewee.Model):
class Meta:
database = database
class Customer(BaseModel):
name = peewee.TextField()
class Purchase(BaseModel):
purchase_date = peewee.DateField()
customer = peewee.ForeignKeyField(Customer, related_name='purchases')
class Cheese(BaseModel):
kind = peewee.TextField()
class PurchaseCheese(BaseModel):
"""A simple "through" table for many-to-many relationship."""
purchase = peewee.ForeignKeyField(Purchase)
cheese = peewee.ForeignKeyField(Cheese)
database.create_tables((Customer, Purchase, Cheese, PurchaseCheese))
leicester = Cheese.create(kind='Red Leicester')
camembert = Cheese.create(kind='Camembert')
doug = Customer.create(name='Douglas')
cat = Customer.create(name='Cat')
import datetime
d = datetime.date(1971, 12, 18)
day = datetime.timedelta(1)
p1 = Purchase.create(purchase_date=d, customer=doug)
PurchaseCheese.create(purchase=p1, cheese=camembert)
PurchaseCheese.create(purchase=p1, cheese=leicester)
p2 = Purchase.create(purchase_date=d+day, customer=cat)
PurchaseCheese.create(purchase=p2, cheese=camembert)