ORM Fine Tuning
Kalil de Lima
Rootstrap Week 2019

kalil@rootstrap.com
github.com/kaozdl

About Me
About Me
Backend developer
About Me
Backend developer
#hay-cafe
About Me
Backend developer
#hay-cafe
#open-source
About Me
Backend developer
#hay-cafe
#open-source
Hobbies
#python
About Me
Backend developer
#hay-cafe
#open-source
Hobbies
Video Games
#python
About Me
Backend developer
#hay-cafe
#open-source
Hobbies
Video Games
Martial arts
#python
About Me
Backend developer
#hay-cafe
#open-source
Hobbies
Video Games
Martial arts
Programming
#python
About Me
Backend developer
#hay-cafe
#open-source
Hobbies
Video Games
Martial arts
Programming
Especially clever hacks
#python
What is an ORM?

SQL
Python
SELECT a.nombre_autor
FROM obra_autor oa
NATURAL JOIN obras
INNER JOIN autores a on oa.cod_autor = a.cod_autor
NATURAL JOIN obra_editorial oe
WHERE 1 < ALL
(SELECT COUNT(DISTINCT oe.cod_editorial)
FROM obra_autor oa
NATURAL JOIN obra_editorial oe
WHERE oa.cod_autor = a.cod_autor
GROUP BY oa.cod_obra)
GROUP BY a.cod_autor,
a.nombre_autor
ORDER BY a.nombre_autor;
q1 = Autores.objects.filter(
obras__cod_obra__in=Obras.objects.annotate(
cant_editoriales=Count('publicado_por')
).filter(
cant_editoriales__gt=1
).values_list(
'cod_obra',
flat=True)
).values_list(
'nombre_autor',
flat=True)
q2 = Autores.objects.filter(
obras__cod_obra__in=Obras.objects.annotate(
cant_editoriales=Count('publicado_por')
).filter(
cant_editoriales=1
).values_list(
'cod_obra',
flat=True)
).values_list(
'nombre_autor',
flat=True)
query3 = q1.difference(q2)
SQL
Table "public.obras"
Column | Type | Collation | Nullable | Default
-------------+------------------------+-----------+----------+---------
cod_obra | integer | | not null |
titulo | character varying(500) | | |
ubicacion | character varying(30) | | |
edicion | character varying(100) | | |
pais | character(3) | | |
fecha | character(10) | | |
paginacion | character(20) | | |
serie | character varying(100) | | |
notas | text | | |
isbn | character(20) | | |
cod_dewey | character(25) | | |
supervisada | smallint | | |
Indexes:
"obras_pkey" PRIMARY KEY, btree (cod_obra)
Foreign-key constraints:
"obras_cod_dewey_fkey" FOREIGN KEY (cod_dewey) REFERENCES dewey(cod_dewey)
"obras_pais_fkey" FOREIGN KEY (pais) REFERENCES paises(cod_pais)
Referenced by:
TABLE "inventario" CONSTRAINT "inventario_cod_obra_fkey" FOREIGN KEY (cod_obra) REFERENCES obras(cod_obra)
TABLE "obra_autor" CONSTRAINT "obra_autor_cod_obra_fkey" FOREIGN KEY (cod_obra) REFERENCES obras(cod_obra)
TABLE "obra_editorial" CONSTRAINT "obra_editorial_cod_obra_fkey" FOREIGN KEY (cod_obra) REFERENCES obras(cod_obra)
TABLE "obra_tema" CONSTRAINT "obra_tema_cod_obra_fkey" FOREIGN KEY (cod_obra) REFERENCES obras(cod_obra)Python
class Obras(models.Model):
cod_obra = models.IntegerField(primary_key=True)
titulo = models.CharField(max_length=500, blank=True, null=True)
ubicacion = models.CharField(max_length=30, blank=True, null=True)
edicion = models.CharField(max_length=100, blank=True, null=True)
pais = models.ForeignKey('Paises', models.DO_NOTHING, db_column='pais', blank=True, null=True)
fecha = models.CharField(max_length=10, blank=True, null=True)
paginacion = models.CharField(max_length=20, blank=True, null=True)
serie = models.CharField(max_length=100, blank=True, null=True)
notas = models.TextField(blank=True, null=True)
isbn = models.CharField(max_length=20, blank=True, null=True)
cod_dewey = models.ForeignKey(Dewey, models.DO_NOTHING, db_column='cod_dewey', blank=True, null=True)
supervisada = models.SmallIntegerField(blank=True, null=True)
temas = models.ManyToManyField('Temas', through='ObraTema', related_name='obras')
autores = models.ManyToManyField('Autores', through='ObraAutor', related_name='obras')
publicado_por = models.ManyToManyField('Editoriales', through='ObraEditorial', related_name='obras')
class Meta:
managed = False
db_table = 'obras'
Main Features
- Interact with the database in one consistent language
- Useful built-in features
- Uniform DB access in all your application
Some Examples
- Django ORM - Python
- SQLAlchemy - Python
- Rails ORM - Ruby
- ROM - Ruby
- TypeORM - TypeScript
Active Record
Classes are a wrapper for a database table mapping properties in the class with columns in the table.
Each object instance is a row
Data Mapper
Classes provide create, update, delete, retrieve interfaces.
These methods provide the interface with the database.
Two Patterns
Active Record
Main design pattern used in Rails and Django.
Suitable for quick prototyping due to the standard interfaces for relatively simple objects.
Even tough is very easy to use it has some potential pitfalls that can slow down our application
The N+1 Problem
# list all the users with their purchases
info = ''
for user in User.objects.all(): # One query for retrieving all the users
info += f'Username: {user,username} \n'
for purchase in user.purchases.all():# For every user one query retrieving all purchases
info += f'|> item: {purchase.item} date: {purchase.date}}\n'
return info
# Total queries: n + 1
# 1 query for fetching n users,
# n querys for fetching purchases
info = ''
for user in User.objects.prefetch_related('purchases'): # Retrieve all users
info += f'Username: {user,username} \n' # with them purchases
for purchase in user.purchases.all(): # Purchases are now preloaded
info += f'|> item: {purchase.item} date: {purchase.date}}\n'
return info
# Total queries: 1Index Only Scans
In [1]: from biblioteca.models import Obras
In [2]: overfetch = Obras.objects.all()
In [3]: fetch = Obras.objects.only('pais','edicion','fecha')SELECT
"obras"."cod_obra",
"obras"."titulo",
"obras"."ubicacion",
"obras"."edicion",
"obras"."pais",
"obras"."fecha",
"obras"."paginacion",
"obras"."serie",
"obras"."notas",
"obras"."isbn",
"obras"."cod_dewey",
"obras"."supervisada"
FROM "obras";
SELECT
"obras"."cod_obra",
"obras"."edicion",
"obras"."pais",
"obras"."fecha"
FROM "obras";Django ORM
Fine Tuning
Indexes
class Obras(models.Model):
cod_obra = models.IntegerField(primary_key=True) #Automatic Index
titulo = models.CharField(max_length=500)
ubicacion = models.CharField(max_length=30)
edicion = models.CharField(max_length=100, db_index=True) # Manually added
fecha = models.DateField(max_length=10)
paginas = models.SmallIntegerField(max_length=20)
serie = models.CharField(max_length=100) # Manually added
notas = models.TextField(blank=True, null=True)
isbn = models.CharField(max_length=20)
supervisada = models.SmallIntegerField()
cod_dewey = models.ForeignKey(
Dewey,
models.DO_NOTHING,
db_column='cod_dewey',
blank=True,
null=True)
pais = models.ForeignKey(
'Paises',
models.DO_NOTHING,
db_column='pais',
blank=True,
null=True)
temas = models.ManyToManyField('Temas', through='ObraTema', related_name='obras')
autores = models.ManyToManyField('Autores', through='ObraAutor', related_name='obras')
publicado_por = models.ManyToManyField('Editoriales', through='ObraEditorial', related_name='obras')
class Meta:
managed = False
db_table = 'obras'
indexes = [
models.Index(fields=['cod_obra', 'pais', 'edicion', 'fecha']),
models.Index(fields=['paginas'],condition=Q(pages__lte=200)),
]
Multi Attribute Keys
class Obras(models.Model):
cod_obra = models.IntegerField(primary_key=True) #Automatic Index
titulo = models.CharField(max_length=500)
ubicacion = models.CharField(max_length=30)
edicion = models.CharField(max_length=100, db_index=True) # Manually added
fecha = models.DateField(max_length=10)
paginas = models.SmallIntegerField(max_length=20)
serie = models.CharField(max_length=100) # Manually added
notas = models.TextField(blank=True, null=True)
isbn = models.CharField(max_length=20)
supervisada = models.SmallIntegerField()
cod_dewey = models.ForeignKey(
Dewey,
models.DO_NOTHING,
db_column='cod_dewey',
blank=True,
null=True)
pais = models.ForeignKey(
'Paises',
models.DO_NOTHING,
db_column='pais',
blank=True,
null=True)
temas = models.ManyToManyField('Temas', through='ObraTema', related_name='obras')
autores = models.ManyToManyField('Autores', through='ObraAutor', related_name='obras')
publicado_por = models.ManyToManyField('Editoriales', through='ObraEditorial', related_name='obras')
class Meta:
db_table = 'obras'
unique_toghether = (('serie','edicion','titulo'),) # A tuple with tuples of unique together attributes
indexes = [
models.Index(fields=['serie', 'edicion']),
models.Index(fields=['paginas'],condition=Q(pages__lte=200)),
]
Ordering
class Obras(models.Model):
cod_obra = models.IntegerField(primary_key=True) #Automatic Index
titulo = models.CharField(max_length=500)
ubicacion = models.CharField(max_length=30)
edicion = models.CharField(max_length=100, db_index=True) # Manually added
fecha = models.DateField(max_length=10)
paginas = models.SmallIntegerField(max_length=20)
serie = models.CharField(max_length=100) # Manually added
notas = models.TextField(blank=True, null=True)
isbn = models.CharField(max_length=20)
supervisada = models.SmallIntegerField()
cod_dewey = models.ForeignKey(
Dewey,
models.DO_NOTHING,
db_column='cod_dewey',
blank=True,
null=True)
pais = models.ForeignKey(
'Paises',
models.DO_NOTHING,
db_column='pais',
blank=True,
null=True)
temas = models.ManyToManyField('Temas', through='ObraTema', related_name='obras')
autores = models.ManyToManyField('Autores', through='ObraAutor', related_name='obras')
publicado_por = models.ManyToManyField('Editoriales', through='ObraEditorial', related_name='obras')
class Meta:
db_table = 'obras'
ordering = ['-fecha'] # Several criteria can be added at the same time
unique_toghether = ['serie','edicion','titulo']
indexes = [
models.Index(fields=['serie', 'edicion']),
models.Index(fields=['paginas'],condition=Q(pages__lte=200)),
]
Making Queries
F - Expressions
Q - Objects
Annotations
Aggregations
query1 = Obras.objects.annotate(
cantidad_temas=Count('temas')
).filter(
pais='URY',
cantidad_temas__lte=3
).only(
'titulo',
'edicion',
'cantidad_temas')
query4 = Obras.objects.annotate(
cant_autores=Count(
'autores',
distinct=True)
).annotate(editoriales=Count(
'publicado_por',
distinct=True)
).filter(
editoriales__gt=F('cant_autores')
).only('titulo',
'isbn')
query5 = Editoriales.objects.annotate(
cant_primeras=Count(
'obras',
filter=Q(
Q(obras__edicion__contains='1a. ed')|
Q(obras__edicion__contains='1a ed') |
Q(obras__edicion__contains='1a.ed') |
Q(obras__edicion__contains='1ª ed') |
Q(obras__edicion__contains='1ªed') |
Q(obras__edicion__contains='1° ed'))
)
).order_by('-cant_primeras').values_list(
'nombre_editorial',
'cod_editorial')[:1]
query2 = Obra.objects.aggregate(
price_diff=Max(
'precio',
output_field=FloatField()
) - Avg('precio')
)Take advantage of lazy evaluation
q1 = Autores.objects.filter(
obras__cod_obra__in=Obras.objects.annotate(
cant_editoriales=Count('publicado_por')
).filter(
cant_editoriales__gt=1
).values_list(
'cod_obra',
flat=True)
).values_list(
'nombre_autor',
flat=True)
q2 = Autores.objects.filter(
obras__cod_obra__in=Obras.objects.annotate(
cant_editoriales=Count('publicado_por')
).filter(
cant_editoriales=1
).values_list(
'cod_obra',
flat=True)
).values_list(
'nombre_autor',
flat=True)
query3 = q1.difference(q2)
for autor in in query3: # DB gets hit here
print(autor.nombre)What should I do with all this?
Always ask why
print(query4.explain())
GroupAggregate (cost=323.43..363.86 rows=482 width=276)
Group Key: obras.cod_obra
Filter: (count(DISTINCT obra_editorial.cod_editorial) > count(DISTINCT obra_autor.cod_autor))
-> Sort (cost=323.43..329.02 rows=2234 width=282)
Sort Key: obras.cod_obra
-> Hash Right Join (cost=140.26..199.16 rows=2234 width=282)
Hash Cond: (obra_autor.cod_obra = obras.cod_obra)
-> Seq Scan on obra_autor (cost=0.00..29.59 rows=1859 width=8)
-> Hash (cost=118.52..118.52 rows=1739 width=278)
-> Hash Right Join (cost=88.56..118.52 rows=1739 width=278)
Hash Cond: (obra_editorial.cod_obra = obras.cod_obra)
-> Seq Scan on obra_editorial (cost=0.00..25.39 rows=1739 width=6)
-> Hash (cost=70.47..70.47 rows=1447 width=276)
-> Seq Scan on obras (cost=0.00..70.47 rows=1447 width=276)A few general tips
- Postgres is usually WAY faster than Python / Ruby / JS.
- Round trips to the DB are expensive, try to avoid those.
- If you query an attribute often, that should be an index.
- You can run RAW sql as a last resort but only as that.
Questions?
Thanks for listening!
Orm Fine Tuning
By Kalil De Lima
Orm Fine Tuning
- 184