SQLAlchemy

beyond ORM

Me!

  • @willengler
  • I write Python all day at the University of Chicago
  • Lead engineer of plenar.io

What is SQLAlchemy?

  • "The database toolkit for Python"
  • Introduced in 2005

What I'm Going to Talk About

  • SQLAlchemy's layers: ORM and Core
  • Why SQLAlchemy's ORM feels funny

Peeling the Onion

ORM and Core Layers

Wait, What's an ORM?

from django.db import models

class Fruit(models.Model):
    name = models.CharField(max_length=100, primary_key=True)

>>> fruit = Fruit.objects.create(name='Apple')
>>> fruit.name = 'Pear'
>>> fruit.save()
>>> Fruit.objects.values_list('name', flat=True)
['Apple', 'Pear']

ORM == Object Relational Mapper

 

Broadly speaking, a tool that associates object oriented classes with database tables.

Layers of SQLAlchemy

Layers of SQLAlchemy

Engine and Connection

  • Engines handle connection pooling and your "dialect"
  • Then you can grab connections from your engine

Metadata and Reflection

  • Create Table objects with associated Columns
  • Metadata is a registry for Tables
  • You can use Metadata to reflect Tables from your DB

SQL Expressions

  • SQL expressions are represented as objects
  • Compose expressions to form statements
  • Execute statements with your engine

In the Wild

A Monster

# Create a CTE to represent every time bucket in the timeseries
# with a default count of 0
day_generator = func.generate_series(func.date_trunc(agg_unit, start),
                                     func.date_trunc(agg_unit, end),
                                     '1 ' + agg_unit)
defaults = select([sa.literal_column("0").label('count'),
                   day_generator.label('time_bucket')])\
    .alias('defaults')

# Create a CTE that grabs the number of records contained in each time bucket.
# Will only have rows for buckets with records.
actuals = select([func.count(t.c.point_id).label('count'),  # Count unique records
                  func.date_trunc(agg_unit, t.c.point_date).label('time_bucket')])\
    .where(sa.and_(t.c.point_date >= start,            # Only include records in time window
                   t.c.point_date <= end))\
    .group_by('time_bucket')

# Also filter by geometry if requested
if geom:
    actuals = actuals.where(t.c.geom.ST_Within(func.ST_GeomFromGeoJSON(geom)))

# Need to alias to make it usable in a subexpression
actuals = actuals.alias('actuals')

# Outer join the default and observed values to create the timeseries select statement.
# If no observed value in a bucket, use the default.
ts = select([sa.literal_column("'{}'".format(self.dataset_name)).label('dataset_name'),
             defaults.c.time_bucket.label('time_bucket'),
             func.coalesce(actuals.c.count, defaults.c.count).label('count')]).\
    select_from(defaults.outerjoin(actuals, actuals.c.time_bucket == defaults.c.time_bucket))

on Power

a digression

Why SQLAlchemy's ORM Feels Funny

Data Mapper vs. Active Record

Active Record

  • Domain objects handle their own persistence.
  • Seen in:
    • Rails
    • Django
    • Laravel

Data Mapper

  • Persistence is separate from domain objects.
  • Seen in:
    • SQLAlchemy
    • Hibernate

Flavors of ORM

Active Record

from blog.models import Blog
b = Blog(name="Will's Cool Blog")
b.save()

Data Mapper

from my_models import User

# Session is a SQLAlchemy session factory
session = Session()

ed_user = User(name='ed')
session.add(ed_user)
session.commit()

session is a Unit of Work

  • Keeps track of what's been modified in a transaction
  • When you're ready to commit(), persists changes to DB in the right order

Using the Lower Layers

  • Attributes of a mapped object are the same primitives we used when constructing our own expressions
  • We can mix in the lower-lever expression API as needed
  • Flexible, but can be confusing

Credits and Resources

Thanks!

  • @willengler
  • U Chicago Urban Center for Computation and Data is hiring a data systems engineer. Talk to me!

SQLAlchemy

By Will Engler

SQLAlchemy

ChiPy 12/9/15

  • 820