ORM

 Caveats, Tips, Optimialization

Getting the SQL behind a Queryset

queryset = Blog.objects.all()
print(queryset.query)

SELECT "blog_entry"."id", "blog_entry"."title", 
"blog_entry"."body", "blog_entry"."slug", "blog_entry"."publish",
 "blog_entry"."created", "blog_entry"."modified", 
"blog_entry"."featured_image" FROM "blog_entry" ORDER BY "blog_entry"."created" DESC

Getting the SQL behind a Queryset

queryset = Blog.objects.all()
print(queryset.query)

SELECT "blog_entry"."id", "blog_entry"."title", 
"blog_entry"."body", "blog_entry"."slug", "blog_entry"."publish",
 "blog_entry"."created", "blog_entry"."modified", 
"blog_entry"."featured_image" FROM "blog_entry" ORDER BY "blog_entry"."created" DESC

Getting the SQL behind a Queryset

from django.db import connection
connections.queries    # only with DEBUG=TRUE

[{'sql': 'SELECT polls_polls.id, polls_polls.question,
 polls_polls.pub_date FROM polls_polls',
'time': '0.002'}]

Managers

class AuthorManager(models.Manager):
    def get_queryset(self):
        return super(AuthorManager, self).get_queryset().filter(role='A')

class EditorManager(models.Manager):
    def get_queryset(self):
        return super(EditorManager, self).get_queryset().filter(role='E')

class Person(models.Model):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    role = models.CharField(max_length=1, choices=(('A', _('Author')), ('E', _('Editor'))))
    people = models.Manager()
    authors = AuthorManager()
    editors = EditorManager()

Person.authors.all() # vs Person.filter(role='A')
Person.editors.all()
Person.people.all()

ok

Managers

class AuthorManager(models.Manager):
    def get_queryset(self):
        return super(AuthorManager, self).get_queryset().filter(role='A')

class EditorManager(models.Manager):
    def get_queryset(self):
        return super(EditorManager, self).get_queryset().filter(role='E')

class Person(models.Model):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    role = models.CharField(max_length=1, choices=(('A', _('Author')), ('E', _('Editor'))))
    people = models.Manager()
    authors = AuthorManager()
    editors = EditorManager()

Person.authors.all() # vs Person.filter(role='A')
Person.editors.all()
Person.people.all()

Person.people.filter(last_name__startswith='A').authors() # Exception

problem (cannot chain)

Managers

class PersonQuerySet(models.QuerySet):
    def authors(self):
        return self.filter(role='A')

    def editors(self):
        return self.filter(role='E')

class PersonManager(models.Manager):
    def get_queryset(self):
        return PersonQuerySet(self.model, using=self._db)

    def authors(self):
        return self.get_queryset().authors()

    def editors(self):
        return self.get_queryset().editors()

class Person(models.Model):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    role = models.CharField(max_length=1, choices=(('A', _('Author')), ('E', _('Editor'))))
    people = PersonManager()

Person.people.authos()
Person.people.editors()
Person.people.filter(last_name__startswith='A').authors() # OK

better (but verbose)

Managers

class PersonQuerySet(models.QuerySet):
    def authors(self):
        return self.filter(role='A')

    def editors(self):
        return self.filter(role='E')

class Person(models.Model):
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    role = models.CharField(max_length=1, choices=(('A', _('Author')), ('E', _('Editor'))))
    people = PersonQuerySet.as_manager()

Person.people.authos()
Person.people.editors()
Person.people.filter(last_name__startswith='A').authors() # OK

best

Managers

  • Adding extra Manager methods is the preferred way to add “table-level” functionality to your models. (For “row-level” functionality – i.e., functions that act on a single instance of a model object – use Model methods, not custom Manager methods.)
  • A custom Manager method can return anything you want. It doesn’t have to return a QuerySet (perform a complex aggregation, hide a raw sql)

Overriding Base Managers

Don’t filter away any results in this type of manager subclass.

 

If you override the get_queryset() method and filter out any rows, Django will return incorrect results. Don’t do that. A manager that filters results in get_queryset() is not appropriate for use as a default manager.

Overriding Base Managers

Scenario

For a Post model an inactive flag is requested.

The development is an advanced stage.

The inactive posts won't be involved in 99% of cases.

Overriding Base Managers

Scenario

For a Post model an inactive flag is requested.

The development is an advanced stage.

The inactive posts won't participate in 99% of cases.

posts_to_publish = Post.objects.filter(...)
for post in posts_to_publish:
    post.publish()

posts_to_publish = Post.objects.active().filter(...)
for post in posts_to_publish:
    post.publish()

Overriding Base Managers

Scenario

For a Post model an inactive flag is requested.

The development is an advanced stage.

The inactive posts won't participate in 99% of cases.

posts_to_publish = Post.objects.filter(...)
for post in posts_to_publish:
    post.publish()

posts_to_publish = Post.objects.active().filter(...)
for post in posts_to_publish:
    post.publish()

What if we need to do that 100 times ?

Overriding Base Managers

 

Solution : Replace the custom Manager...

 

class ActiveBlogManager(models.Manager):
    def get_queryset(self):
        return super(ActiveBlogManager, self).get_queryset().\
        filter(status='ACTIVE')

class Blog(models.Model):
    ...
    objects = ActiveBlogManager()

Overriding Base Managers

Problems (and solutions):

No way to access inactive Posts!

 

Overriding Base Managers

Problems (and solutions):

No way to access inactive Posts!

Create a second manager which returns inactive Post

 

Overriding Base Managers

Problems (and solutions):

No way to access inactive Posts!

Create a second manager which returns inactive Post

Django Admin uses the default queryset (won't list inactive Posts)

 

Overriding Base Managers

Problems (and solutions):

No way to access inactive Posts!

Create a second manager which returns inactive Post

Django Admin uses the default queryset (won't list inactive Posts)

Override the get_queryset Admin model to include all Posts

 

Overriding Base Managers

Problems (and solutions):

No way to access inactive Posts!

Create a second manager which returns inactive Post

Django Admin uses the default queryset (won't list inactive Posts)

Override the get_queryset Admin model to include all Posts

The reverse manager uses the default manager ( blog.post_set.all() )

 

Overriding Base Managers

Problems (and solutions):

No way to access inactive Posts!

Create a second manager which returns inactive Post

Django Admin uses the default queryset (won't list inactive Posts)

Override the get_queryset Admin model to include all Posts

The reverse manager uses the default manager ( blog.post_set.all() )

blog.post_set(manager='inactive').all()

Tips

Use fk keys directly

 

 

 

entry.blog_id
# vs
entry.blog.id

Tips

The with templatetag

 

 

{% with total=business.employees.count %}
    {% if total < 10 %}
        <p>Only {{total}} employees left! Hire more!</p>
    {% endif %}
{% endwith %}

Tips

Don't retrieve things you don't need

 

Blog.objects.all()

#vs

Blog.objects.values('id', 'name')
<QuerySet [{'id': 1, 'name': 'My Blog'}]>
Person.objects.only("name")

Don't retrieve things you don't need (assume a model with hundreds of columns)

 

Tips

Take advantage of query expressions and database functions

len(Entry.objects.all())
#vs
Entry.objects.count()

name = name.lower()
Blog.objects.get(name=name)
#vs
Blog.objects.get(name__iexact='beatles blog')


for author in Author.objects.iterator():

    if author.alias:
        name = author.alias:
    if author.goes_by:
        name = name
    name = author.name

    setattr(author,'screen_name', name)
#vs
author = Author.objects.annotate(
    screen_name=Coalesce('alias', 'goes_by', 'name'))

Tips

Take advantage of query expressions and database functions

# getting all unique values for a model

# don't do it in Python...


class Transaction(models.Model):
    currency = models.CharField(u'Waluta', max_length=3, validators=[CurrencyValidator])

Tips

Take advantage of query expressions and database functions

# getting all unique values for a model

# don't do it in Python...

Transaction.objects.values('currency')

[{'currency': u'PLN'}, {'currency': u'PLN'}, {'currency': u'USD'}, {'currency': u'USD'},
 {'currency': u'HUF'}]

Transaction.objects.filter().order_by().values('currency').distinct()

[{'currency': u'PLN'}, {'currency': u'USD'}, {'currency': u'HUF'}]

Transaction.objects.filter().order_by().values_list('currency').distinct()

[(u'HUF',), (u'USD',), (u'GBP',), (u'PLN',), (u'CHF',), (u'EUR',)]

Transaction.objects.filter().order_by().values_list('currency', flat=True).distinct()

[u'HUF', u'USD', u'GBP', u'PLN', u'CHF', u'EUR']

Tips

Take advantage of query expressions and database functions

#referencing fields in the same model

#lectures_with_not_enough_seats
Lecture.objects.filter(room__seats_lt=F('attendees')


#avoiding race conditions

post = Post.objects.get(id=5434)
post.likes += 1
post.save()


from django.db.models import F

post = Post.objects.get(id=5434)
post.likes = F('likes') + 1
post.save()

Various caveats

Blog.objects.filter(entry__headline__contains='Lennon', entry__pub_date__year=2008)

# vs

Blog.objects.filter(entry__headline__contains='Lennon').filter(entry__pub_date__year=2008)

Spanning multi-valued relationships

Various caveats

Blog.objects.filter(entry__headline__contains='Lennon', entry__pub_date__year=2008)

# vs

Blog.objects.filter(entry__headline__contains='Lennon').filter(entry__pub_date__year=2008)

Spanning multi-valued relationships

Suppose there is only one blog that had both entries containing “Lennon” and entries from 2008, but that none of the entries from 2008 contained “Lennon”. The first query would not return any blogs, but the second query would return that one blog.

Various caveats

queryset = Entry.objects.all()
print queryset[5]
print queryset[5]

   Lazy QuerySet evaluation

Various caveats

queryset = Entry.objects.all()
print queryset[5] # Queries the database
print queryset[5] # Queries the database

   Lazy QuerySet evaluation

Various caveats

queryset = Entry.objects.all()
[entry for entry in queryset]
print queryset[5]
print queryset[5]

   Lazy QuerySet evaluation

Various caveats

queryset = Entry.objects.all()
[entry for entry in queryset] # Queries the database
print queryset[5] # Uses cache
print queryset[5] # Uses cache

   Lazy QuerySet evaluation

Various caveats

# Average price across all books.
>>> from django.db.models import Avg
>>> Book.objects.all().aggregate(Avg('price'))
{'price__avg': 34.35}

'sql': u'SELECT AVG("app_book"."price") AS "price__avg" FROM "app_book"'


# Each publisher, each with a count of books as a "num_books" attribute.
>>> from django.db.models import Count
>>> pubs = Publisher.objects.annotate(num_books=Count('book'))
>>> pubs
<QuerySet [<Publisher: BaloneyPress>, <Publisher: SalamiPress>, ...]>
>>> pubs[0].num_books

'SELECT *, COUNT("app_publisher"."book") 
AS "num_books" 
FROM "app_publisher" 
GROUP BY "app_publisher"."id"

Annotate(returns a QuerySet)

Aggregate(returns a dict)

Various caveats

# Average price across all books.
>>> from django.db.models import Avg
>>> Book.objects.all().aggregate(Avg('price'))
{'price__avg': 34.35}

'sql': u'SELECT AVG("app_book"."price") AS "price__avg" FROM "app_book"'


# Each publisher, each with a count of books as a "num_books" attribute.
>>> from django.db.models import Count
>>> pubs = Publisher.objects.annotate(num_books=Count('book'))
>>> pubs
<QuerySet [<Publisher: BaloneyPress>, <Publisher: SalamiPress>, ...]>
>>> pubs[0].num_books

'SELECT *, COUNT("app_publisher"."book") 
AS "num_books" 
FROM "app_publisher" 
GROUP BY "app_publisher"."id"

Annotate(returns a QuerySet)

Aggregate(returns a dict)

Various caveats

>>> a, b = Publisher.objects.annotate(avg_rating=Avg('book__rating')).\
filter(book__rating__gt=3.0)

>>> a, b = Publisher.objects.filter(book__rating__gt=3.0).\
annotate(avg_rating=Avg('book__rating'))


The first query asks for the average rating of all a publisher’s books for publisher’s that
 have at least one book with a rating exceeding 3.0. 

The second query asks for the average of a publisher’s book’s ratings for only
 those ratings exceeding 3.0.

The order of annotate() and filter() clauses matters

Various caveats

class Transaction(models.Model):
    ...

    class Meta:
        ordering = ["id"]


Transaction.objects.order_by().filter().values('currency').query
'SELECT "vault_transaction"."currency" FROM "vault_transaction'

Default Ordering

  • don't  use where you dont' need it

  • clear the default ordering

Other Topic

  • prefetch_related, select_related, Prefetch
  • Postgresql Full text search in Django
  • PostgreSQL specific model fields (ArrayField, JSONField)
  • Caching in Django

26.10.2016

orm

By zqzak

orm

  • 378