SQLAlchemy
beyond ORM
Me!
- @willengler
- I write Python all day at the University of Chicago
- Lead engineer of plenar.io
What is SQLAlchemy?
- "The database toolkit for Python"
- Introduced in 2005
What I'm Going to Talk About
- SQLAlchemy's layers: ORM and Core
- Why SQLAlchemy's ORM feels funny
Peeling the Onion
ORM and Core Layers
Wait, What's an ORM?
from django.db import models
class Fruit(models.Model):
name = models.CharField(max_length=100, primary_key=True)
>>> fruit = Fruit.objects.create(name='Apple')
>>> fruit.name = 'Pear'
>>> fruit.save()
>>> Fruit.objects.values_list('name', flat=True)
['Apple', 'Pear']
ORM == Object Relational Mapper
Broadly speaking, a tool that associates object oriented classes with database tables.
Layers of SQLAlchemy
Layers of SQLAlchemy
Engine and Connection
- Engines handle connection pooling and your "dialect"
- Then you can grab connections from your engine
Metadata and Reflection
- Create Table objects with associated Columns
- Metadata is a registry for Tables
- You can use Metadata to reflect Tables from your DB
SQL Expressions
- SQL expressions are represented as objects
- Compose expressions to form statements
- Execute statements with your engine
In the Wild
A Monster
# Create a CTE to represent every time bucket in the timeseries
# with a default count of 0
day_generator = func.generate_series(func.date_trunc(agg_unit, start),
func.date_trunc(agg_unit, end),
'1 ' + agg_unit)
defaults = select([sa.literal_column("0").label('count'),
day_generator.label('time_bucket')])\
.alias('defaults')
# Create a CTE that grabs the number of records contained in each time bucket.
# Will only have rows for buckets with records.
actuals = select([func.count(t.c.point_id).label('count'), # Count unique records
func.date_trunc(agg_unit, t.c.point_date).label('time_bucket')])\
.where(sa.and_(t.c.point_date >= start, # Only include records in time window
t.c.point_date <= end))\
.group_by('time_bucket')
# Also filter by geometry if requested
if geom:
actuals = actuals.where(t.c.geom.ST_Within(func.ST_GeomFromGeoJSON(geom)))
# Need to alias to make it usable in a subexpression
actuals = actuals.alias('actuals')
# Outer join the default and observed values to create the timeseries select statement.
# If no observed value in a bucket, use the default.
ts = select([sa.literal_column("'{}'".format(self.dataset_name)).label('dataset_name'),
defaults.c.time_bucket.label('time_bucket'),
func.coalesce(actuals.c.count, defaults.c.count).label('count')]).\
select_from(defaults.outerjoin(actuals, actuals.c.time_bucket == defaults.c.time_bucket))
on Power
a digression
Why SQLAlchemy's ORM Feels Funny
Data Mapper vs. Active Record
Active Record
- Domain objects handle their own persistence.
- Seen in:
- Rails
- Django
- Laravel
Data Mapper
- Persistence is separate from domain objects.
- Seen in:
- SQLAlchemy
- Hibernate
Flavors of ORM
Active Record
from blog.models import Blog
b = Blog(name="Will's Cool Blog")
b.save()
Data Mapper
from my_models import User
# Session is a SQLAlchemy session factory
session = Session()
ed_user = User(name='ed')
session.add(ed_user)
session.commit()
session is a Unit of Work
- Keeps track of what's been modified in a transaction
- When you're ready to commit(), persists changes to DB in the right order
Using the Lower Layers
- Attributes of a mapped object are the same primitives we used when constructing our own expressions
- We can mix in the lower-lever expression API as needed
- Flexible, but can be confusing
Credits and Resources
Thanks!
- @willengler
- U Chicago Urban Center for Computation and Data is hiring a data systems engineer. Talk to me!
SQLAlchemy
By Will Engler
SQLAlchemy
ChiPy 12/9/15
- 820