ORM Fine Tuning

Kalil de Lima

Rootstrap Week 2019

 

kalil@rootstrap.com

github.com/kaozdl

 

About Me

About Me

Backend developer

About Me

Backend developer

#hay-cafe

About Me

Backend developer

#hay-cafe

#open-source

About Me

Backend developer

#hay-cafe

#open-source

Hobbies

#python

About Me

Backend developer

#hay-cafe

#open-source

Hobbies

Video Games

#python

About Me

Backend developer

#hay-cafe

#open-source

Hobbies

Video Games

Martial arts

#python

About Me

Backend developer

#hay-cafe

#open-source

Hobbies

Video Games

Martial arts

Programming

#python

About Me

Backend developer

#hay-cafe

#open-source

Hobbies

Video Games

Martial arts

Programming
Especially clever hacks

#python

What is an ORM?

SQL

Python

SELECT a.nombre_autor
FROM obra_autor oa
NATURAL JOIN obras
INNER JOIN autores a on oa.cod_autor = a.cod_autor
NATURAL JOIN obra_editorial oe
WHERE 1 < ALL
    (SELECT COUNT(DISTINCT oe.cod_editorial)
     FROM obra_autor oa
     NATURAL JOIN obra_editorial oe
     WHERE oa.cod_autor = a.cod_autor
     GROUP BY oa.cod_obra)
GROUP BY a.cod_autor,
         a.nombre_autor
ORDER BY a.nombre_autor;

q1 = Autores.objects.filter(
        obras__cod_obra__in=Obras.objects.annotate(
            cant_editoriales=Count('publicado_por')
            ).filter(
                cant_editoriales__gt=1
                ).values_list(
                    'cod_obra',
                    flat=True)
                ).values_list(
                        'nombre_autor',
                        flat=True)

q2 = Autores.objects.filter(
        obras__cod_obra__in=Obras.objects.annotate(
            cant_editoriales=Count('publicado_por')
            ).filter(
                cant_editoriales=1
                ).values_list(
                    'cod_obra',
                    flat=True)
                ).values_list(
                        'nombre_autor',
                        flat=True)

query3 = q1.difference(q2)

SQL

                         Table "public.obras"
   Column    |          Type          | Collation | Nullable | Default
-------------+------------------------+-----------+----------+---------
 cod_obra    | integer                |           | not null |
 titulo      | character varying(500) |           |          |
 ubicacion   | character varying(30)  |           |          |
 edicion     | character varying(100) |           |          |
 pais        | character(3)           |           |          |
 fecha       | character(10)          |           |          |
 paginacion  | character(20)          |           |          |
 serie       | character varying(100) |           |          |
 notas       | text                   |           |          |
 isbn        | character(20)          |           |          |
 cod_dewey   | character(25)          |           |          |
 supervisada | smallint               |           |          |
Indexes:
   "obras_pkey" PRIMARY KEY, btree (cod_obra)
Foreign-key constraints:
   "obras_cod_dewey_fkey" FOREIGN KEY (cod_dewey) REFERENCES dewey(cod_dewey)
   "obras_pais_fkey" FOREIGN KEY (pais) REFERENCES paises(cod_pais)
Referenced by:
   TABLE "inventario" CONSTRAINT "inventario_cod_obra_fkey" FOREIGN KEY (cod_obra) REFERENCES obras(cod_obra)
   TABLE "obra_autor" CONSTRAINT "obra_autor_cod_obra_fkey" FOREIGN KEY (cod_obra) REFERENCES obras(cod_obra)
   TABLE "obra_editorial" CONSTRAINT "obra_editorial_cod_obra_fkey" FOREIGN KEY (cod_obra) REFERENCES obras(cod_obra)
   TABLE "obra_tema" CONSTRAINT "obra_tema_cod_obra_fkey" FOREIGN KEY (cod_obra) REFERENCES obras(cod_obra)

Python

class Obras(models.Model):
    cod_obra = models.IntegerField(primary_key=True)
    titulo = models.CharField(max_length=500, blank=True, null=True)
    ubicacion = models.CharField(max_length=30, blank=True, null=True)
    edicion = models.CharField(max_length=100, blank=True, null=True)
    pais = models.ForeignKey('Paises', models.DO_NOTHING, db_column='pais', blank=True, null=True)
    fecha = models.CharField(max_length=10, blank=True, null=True)
    paginacion = models.CharField(max_length=20, blank=True, null=True)
    serie = models.CharField(max_length=100, blank=True, null=True)
    notas = models.TextField(blank=True, null=True)
    isbn = models.CharField(max_length=20, blank=True, null=True)
    cod_dewey = models.ForeignKey(Dewey, models.DO_NOTHING, db_column='cod_dewey', blank=True, null=True)
    supervisada = models.SmallIntegerField(blank=True, null=True)

    temas = models.ManyToManyField('Temas', through='ObraTema', related_name='obras')
    autores = models.ManyToManyField('Autores', through='ObraAutor', related_name='obras')
    publicado_por = models.ManyToManyField('Editoriales', through='ObraEditorial', related_name='obras')

    class Meta:
        managed = False
        db_table = 'obras'

Main Features

  • Interact with the database in one consistent language
  • Useful built-in features
  • Uniform DB access in all your application

Some Examples

  • Django ORM - Python
  • SQLAlchemy - Python
  • Rails ORM - Ruby
  • ROM - Ruby
  • TypeORM - TypeScript

Active Record

Classes are a wrapper for a database table mapping properties in the class with columns in the table.

Each object instance is a row

Data Mapper

Classes provide create, update, delete, retrieve interfaces.
These methods provide the interface with the database.

Two Patterns

Active Record

Main design pattern used in Rails and Django.
Suitable for quick prototyping due to the standard interfaces for relatively simple objects.

Even tough is very easy to use it has some potential pitfalls that can slow down our application

The N+1 Problem

# list all the users with their purchases
info = ''
for user in User.objects.all(): # One query for retrieving all the users
	info += f'Username: {user,username} \n'
    for purchase in user.purchases.all():# For every user one query retrieving all purchases
    	info += f'|> item: {purchase.item} date: {purchase.date}}\n'
return info

# Total queries: n + 1
# 1 query for fetching n users,
# n querys for fetching purchases

info = ''
for user in User.objects.prefetch_related('purchases'): # Retrieve all users 
	info += f'Username: {user,username} \n'          # with them purchases
    	for purchase in user.purchases.all():           # Purchases are now preloaded
    		info += f'|> item: {purchase.item} date: {purchase.date}}\n'
return info

# Total queries: 1

Index Only Scans

In [1]: from biblioteca.models import Obras

In [2]: overfetch = Obras.objects.all()

In [3]: fetch = Obras.objects.only('pais','edicion','fecha')
SELECT 
  "obras"."cod_obra",
  "obras"."titulo",
  "obras"."ubicacion",
  "obras"."edicion",
  "obras"."pais",
  "obras"."fecha",
  "obras"."paginacion",
  "obras"."serie",
  "obras"."notas",
  "obras"."isbn",
  "obras"."cod_dewey",
  "obras"."supervisada"
FROM "obras";

SELECT 
  "obras"."cod_obra",
  "obras"."edicion",
  "obras"."pais",
  "obras"."fecha"
FROM "obras";

Django ORM

Fine Tuning

Indexes

class Obras(models.Model):
    cod_obra = models.IntegerField(primary_key=True) #Automatic Index
    titulo = models.CharField(max_length=500)
    ubicacion = models.CharField(max_length=30)
    edicion = models.CharField(max_length=100, db_index=True) # Manually added
    fecha = models.DateField(max_length=10)
    paginas = models.SmallIntegerField(max_length=20)
    serie = models.CharField(max_length=100) # Manually added
    notas = models.TextField(blank=True, null=True)
    isbn = models.CharField(max_length=20)
    supervisada = models.SmallIntegerField()

    cod_dewey = models.ForeignKey(
    	Dewey,
        models.DO_NOTHING,
        db_column='cod_dewey',
        blank=True,
        null=True)

	pais = models.ForeignKey(
    	'Paises',
        models.DO_NOTHING,
        db_column='pais',
        blank=True,
        null=True)
    
    temas = models.ManyToManyField('Temas', through='ObraTema', related_name='obras')
    autores = models.ManyToManyField('Autores', through='ObraAutor', related_name='obras')
    publicado_por = models.ManyToManyField('Editoriales', through='ObraEditorial', related_name='obras')

    class Meta:
        managed = False
        db_table = 'obras'
        indexes = [
          models.Index(fields=['cod_obra', 'pais', 'edicion', 'fecha']),
          models.Index(fields=['paginas'],condition=Q(pages__lte=200)),
        ]

Multi Attribute Keys

class Obras(models.Model):
    cod_obra = models.IntegerField(primary_key=True) #Automatic Index
    titulo = models.CharField(max_length=500)
    ubicacion = models.CharField(max_length=30)
    edicion = models.CharField(max_length=100, db_index=True) # Manually added
    fecha = models.DateField(max_length=10)
    paginas = models.SmallIntegerField(max_length=20)
    serie = models.CharField(max_length=100) # Manually added
    notas = models.TextField(blank=True, null=True)
    isbn = models.CharField(max_length=20)
    supervisada = models.SmallIntegerField()

    cod_dewey = models.ForeignKey(
    	Dewey,
        models.DO_NOTHING,
        db_column='cod_dewey',
        blank=True,
        null=True)

    pais = models.ForeignKey(
      'Paises',
      models.DO_NOTHING,
      db_column='pais',
      blank=True,
      null=True)
    
    temas = models.ManyToManyField('Temas', through='ObraTema', related_name='obras')
    autores = models.ManyToManyField('Autores', through='ObraAutor', related_name='obras')
    publicado_por = models.ManyToManyField('Editoriales', through='ObraEditorial', related_name='obras')

    class Meta:
        db_table = 'obras'
        unique_toghether = (('serie','edicion','titulo'),) # A tuple with tuples of unique together attributes
        indexes = [
          models.Index(fields=['serie', 'edicion']),
          models.Index(fields=['paginas'],condition=Q(pages__lte=200)),
        ]

Ordering

class Obras(models.Model):
    cod_obra = models.IntegerField(primary_key=True) #Automatic Index
    titulo = models.CharField(max_length=500)
    ubicacion = models.CharField(max_length=30)
    edicion = models.CharField(max_length=100, db_index=True) # Manually added
    fecha = models.DateField(max_length=10)
    paginas = models.SmallIntegerField(max_length=20)
    serie = models.CharField(max_length=100) # Manually added
    notas = models.TextField(blank=True, null=True)
    isbn = models.CharField(max_length=20)
    supervisada = models.SmallIntegerField()

    cod_dewey = models.ForeignKey(
    	Dewey,
        models.DO_NOTHING,
        db_column='cod_dewey',
        blank=True,
        null=True)

    pais = models.ForeignKey(
      'Paises',
      models.DO_NOTHING,
      db_column='pais',
      blank=True,
      null=True)
    
    temas = models.ManyToManyField('Temas', through='ObraTema', related_name='obras')
    autores = models.ManyToManyField('Autores', through='ObraAutor', related_name='obras')
    publicado_por = models.ManyToManyField('Editoriales', through='ObraEditorial', related_name='obras')

    class Meta:
        db_table = 'obras'
        ordering = ['-fecha'] # Several criteria can be added at the same time
        unique_toghether = ['serie','edicion','titulo']
        indexes = [
          models.Index(fields=['serie', 'edicion']),
          models.Index(fields=['paginas'],condition=Q(pages__lte=200)),
        ]

Making Queries

F - Expressions

Q - Objects

Annotations

Aggregations

query1 = Obras.objects.annotate(
        cantidad_temas=Count('temas')
        ).filter(
            pais='URY',
            cantidad_temas__lte=3
            ).only(
                   'titulo',
                   'edicion',
                   'cantidad_temas')
query4 = Obras.objects.annotate(
        cant_autores=Count(
          'autores',
          distinct=True)
        ).annotate(editoriales=Count(
  'publicado_por',
  distinct=True)
            ).filter(
                    editoriales__gt=F('cant_autores')
                    ).only('titulo',
                            'isbn')
query5 = Editoriales.objects.annotate(
            cant_primeras=Count(
                'obras',
                filter=Q(
                    Q(obras__edicion__contains='1a. ed')|
                    Q(obras__edicion__contains='1a ed') |
                    Q(obras__edicion__contains='1a.ed') |
                    Q(obras__edicion__contains='1ª ed') |
                    Q(obras__edicion__contains='1ªed')  |
                    Q(obras__edicion__contains='1° ed'))
                )
    ).order_by('-cant_primeras').values_list(
            'nombre_editorial',
            'cod_editorial')[:1]
query2 = Obra.objects.aggregate(
  price_diff=Max(
    'precio',
    output_field=FloatField()
  ) - Avg('precio')
)

Take advantage of lazy evaluation

q1 = Autores.objects.filter(
        obras__cod_obra__in=Obras.objects.annotate(
            cant_editoriales=Count('publicado_por')
            ).filter(
                cant_editoriales__gt=1
                ).values_list(
                    'cod_obra',
                    flat=True)
                ).values_list(
                        'nombre_autor',
                        flat=True)

q2 = Autores.objects.filter(
        obras__cod_obra__in=Obras.objects.annotate(
            cant_editoriales=Count('publicado_por')
            ).filter(
                cant_editoriales=1
                ).values_list(
                    'cod_obra',
                    flat=True)
                ).values_list(
                        'nombre_autor',
                        flat=True)

query3 = q1.difference(q2)

for autor in in query3: # DB gets hit here
	print(autor.nombre)

What should I do with all this?

Always ask why

print(query4.explain())
GroupAggregate  (cost=323.43..363.86 rows=482 width=276)
  Group Key: obras.cod_obra
  Filter: (count(DISTINCT obra_editorial.cod_editorial) > count(DISTINCT obra_autor.cod_autor))
  ->  Sort  (cost=323.43..329.02 rows=2234 width=282)
        Sort Key: obras.cod_obra
        ->  Hash Right Join  (cost=140.26..199.16 rows=2234 width=282)
              Hash Cond: (obra_autor.cod_obra = obras.cod_obra)
              ->  Seq Scan on obra_autor  (cost=0.00..29.59 rows=1859 width=8)
              ->  Hash  (cost=118.52..118.52 rows=1739 width=278)
                    ->  Hash Right Join  (cost=88.56..118.52 rows=1739 width=278)
                          Hash Cond: (obra_editorial.cod_obra = obras.cod_obra)
                          ->  Seq Scan on obra_editorial  (cost=0.00..25.39 rows=1739 width=6)
                          ->  Hash  (cost=70.47..70.47 rows=1447 width=276)
                                ->  Seq Scan on obras  (cost=0.00..70.47 rows=1447 width=276)

A few general tips

  • Postgres is usually WAY faster than Python / Ruby / JS.
  • Round trips to the DB are expensive, try to avoid those.
  • If you query an attribute often, that should be an index.
  • You can run RAW sql as a last resort but only as that.

Questions?

Thanks for listening!

Made with Slides.com