Rules of thumb for queries optimization

About me

  • Name: Ivaylo Donchev

  • Age: 24

  • Experience with Django: 4 years

  • Full stack developer at HackSoft

Tooling

FE

Django

Database

sql queries

HTTP request

HTTP response

The main job of the list / detail view is to forward data from DB to the client with as minimum job from the application as possible.

A thought.. 

class Demo0Api(APIView):
    class ArticleSerializer(serializers.Serializer):
        id = serializers.IntegerField()
        title = serializers.CharField(max_length=255)

    def get(self, request):
        """
        Get first 200 articles
        """
        queryset = list(Article.objects.all()[:200])

        serializer = self.ArticleSerializer(queryset, many=True)

        return Response(data=serializer.data)

DRF API

Debug toolbar

  1. Too many queries caused by:
    • logic around related objects
    • redundant queries executions
  2. Too slow queries caused by:
    • too huge DB tables
    • too complex relations
  3. Too slow application logic

Common performance problems with Fetch APIs:

The n + 1 SELECT

queries problem

class Foo(Model):
    name = CharField(max_length=255)      ###########################
                                          #                         #
                                          #     Foo   Foo   Foo     #
class Bar(Model):                         #       \    |    /       #
    name = CharField(max_length=255)      #        \   |   /        #
                                          #         \  |  /         #
    foo = ForeignKey(                     #          \ | /          #
      Foo,                                #           Bar           #
      related_name='bars',                #                         #
      on_delete=CASCADE                   ###########################
    )                                     

# Case 1
for bar in Bar.objects.all():
    print(bar.foo.name)
 
    
# Case 2
for foo in Foo.objects.all():
    for bar in foo.bars.all():
        print(bar.name)

The n + 1 SELECT

queries solution

class Foo(Model):
    name = CharField(max_length=255)      ###########################
                                          #                         #
                                          #     Foo   Foo   Foo     #
class Bar(Model):                         #       \    |    /       #
    name = CharField(max_length=255)      #        \   |   /        #
                                          #         \  |  /         #
    foo = ForeignKey(                     #          \ | /          #
      Foo,                                #           Bar           #
      related_name='bars',                #                         #
      on_delete=CASCADE                   ###########################
    )                                     

# Case 1
for bar in Bar.objects.select_related('foo'):
    print(bar.foo.name)
 
    
# Case 2
for foo in Foo.objects.prefetch_related('bars'):
    for bar in foo.bars.all():
        print(bar.name)

DEMO

Relations ? Use select_related and prefetch_related 

Rule of thumb #1

class Invoice(Model):
    date = DateField()
    description = TextField()

    @property
    def total_price(self):
        price = 0

        for item in self.items.all():
            price += item.price

        return price


class InvoiceItem(Model):
    invoice = ForeignKey(
        Invoice,
        on_delete=CASCADE,
        related_name='items',
    )

    quantity = IntegerField()
    unit_price = DecimalField(max_digits=10, decimal_places=2)
    tax = DecimalField(
        max_digits=10,
        decimal_places=2,
        default=Decimal('0.2')
    )

    @property
    def price(self):
        return self.quantity * self.unit_price * (1 + self.tax)
class Demo2Api(APIView):
    class InvoiceSerializer(Serializer):
        id = IntegerField()
        total_price = DecimalField(max_digits=10, decimal_places=2)

    def get(self, request):
        queryset = Invoice.objects.prefetch_related('items')

        serializer = self.InvoiceSerializer(
            queryset,
            many=True
        )

        return Response(data=serializer.data)

~1.5 seconds to load (~700 Invoices)

class Demo2Api(APIView):
    class InvoiceSerializer(serializers.Serializer):
        id = serializers.IntegerField()
        total_price = serializers.DecimalField(max_digits=10, decimal_places=2)

    def get(self, request):
        price_expression = ExpressionWrapper(
            expression=F('quantity') * F('unit_price') * (F('tax') + Value(1)),
            output_field=DecimalField()
        )

        queryset = Invoice.objects \
            .annotate(
                _total_price=Coalesce(
                    Subquery(
                        queryset=InvoiceItem.objects
                            .annotate(_price=price_expression)
                            .filter(invoice=OuterRef('id'))
                            .values_list('invoice__id')
                            .values_list(Sum('_price'))[:1],
                        output_field=DecimalField()
                    ),
                    0.0
                )
            )

        serializer = self.InvoiceSerializer(
            queryset,
            many=True
        )

        return Response(data=serializer.data)

~0.2 seconds to load (~700 Invoices)

DEMO

Use DB annotations whenever you can

Rule of thumb #2

ORM SELECT  lifecycle:

  1. Construct the ORM data structure for the query
  2. Compile the query (SQL compiler)
  3. Execute the query + Load the low level DB data into python memory
  4. Load the DB data into ORM data structures

.filter() & .exclude()

.annotate()

.only() & .defer()

CRUD operations

  1. Construct
  2. Compile
  3. Execute
  4. Load

Query lifecycle

How many queries does this code make?

QuerySet

sql.Query

Q(age__gt=20)

Q(name_startswith='I')

BaseUser.objects \
    .filter(age__gt=20) \
    .filter(name__startswith='I')

QuerySet is an immutable math tree

What happens

when you chain a new method

  1. The whole math tree is cloned
  2. The change is made in the new tree

User.objects.all() != User.objects.all()

"Django Source" time

.update() == UPDATE ... WHERE ...

.delete() == DELETE .... WHERE ...

.create() == INSERT ... VALUES ...

? == SELECT ... FROM ...

CRUD operations

  1. Construct
  2. Compile
  3. Execute
  4. Load

Query lifecycle

"Django Source" & Demo

QuerySets are also generators with cache!

Rule of thumb #3

What is the difference between:

  • .all()
  • .values()
  • .values_list()

.all() - model instance

.values() - dictionary

.values_list() - tuple

.values_list(..., named=True) - named tuple

Data structures

  1. Construct
  2. Compile
  3. Execute
  4. Load

Query lifecycle

Heavier data == lighter data structure

Rule of thumb #4

Objects count

Data structure

orm object

dictionary

tuple

named tuple*

def attach_is_expired_dynamically(invoices):
    expiration_days = 2  # 2 days
    now = datetime.now().date()

    for invoice in invoices:
        is_expired = bool(invoice.date + timedelta(days=expiration_days) > now)
        invoice.is_expired = is_expired
        
    return invoices
class InvoiceItemQuerySet(QuerySet):
    def map(self, func):
        class MyIterableClass(self._iterable_class):
            def __iter__(self):
                for obj in super(self.__class__, self).__iter__():
                    yield func(obj)

        qs = self._clone()
        qs._iterable_class = MyIterableClass

        return qs


class InvoiceItem(Model):
    objects = InvoiceItemQuerySet.as_manager()

    invoice = ForeignKey(
        Invoice,
        on_delete=CASCADE,
        related_name='items',
    )

    quantity = IntegerField()
    unit_price = DecimalField(max_digits=10, decimal_places=2)
    tax = DecimalField(
        max_digits=10,
        decimal_places=2,
        default=Decimal('0.2')
    )

    @property
    def price(self):
        return self.quantity * self.unit_price * (1 + self.tax)

If you need a custom iterable just create a new one.

Rule of thumb #5

Warning!

High memory consumption

 

(source sample)

High memory consumption on .all() ?

Use .iterator()

Rule of thumb #6

Extra query != slow View

Rule of thumb #7

Check our Django Styleguide :)

https://github.com/HackSoftware/Django-Styleguide

Q & A

  • No1: Relations ? Use select_related and prefetch_related
  • No2: Use DB annotations whenever you can 
  • No3: QuerySets are (also) generators with cache!
  • No4: Heavier data == lighter data structure
  • No5: If you need a custom iterable just create a new one.
  • No6: Use iterator to save some memory
  • No7:  Extra query != slow View

Rules of thumb for queries optimization

By Hack Bulgaria

Rules of thumb for queries optimization

  • 1,064