Full-Text Search in Django

Marco Alabruzzo

Lead Developer @ We Got Pop

http://marcoala.com - marco.alabruzzo@gmail.com

Django Day Copenhagen

25 September 2020

👋

Hello I'm Marco

🐣 Italian born 🇮🇹

🏠 London based 🇬🇧

📐 Lead developer 👨‍💻

We Got POP

❤️ Co-organiser @ Django London ❤️

https://www.meetup.com/djangolondon/

I play

Dungeons & Dragons

Full Text Search + DnD = 🤯

Why is search so important?

🤔

Different kinds of search in Django

🛡
The Fighter
Standard textual queries
🧙‍♀️
The Wizard
External services
🎸
The Bard
PostgreSQL Full-Text Search
⚔️
The Barbarian
PostgreSQL Trigram

Standard textual queries

Pros

The simplest form of search

Zero setup: comes with the database

Good for an MVP

Cons

It's binary: a match is either there or it is not

No ranking

The Fighter

Intuitive, easy to get started, effective at low levels

Potential at high levels is low

Good use case

Filters in the admin interface

Bad use case

Search for content in the homepage
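A standard textual query can be sketched with nothing but the database. The example below uses Python's built-in sqlite3 module and a SQL LIKE pattern; the table name and rows are made up for illustration. In Django the equivalent lookup would be something like Article.objects.filter(title__icontains="dragon").

```python
import sqlite3

# Hypothetical table and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO article (title) VALUES (?)",
    [
        ("Dungeons & Dragons for beginners",),
        ("Django deadlines",),
        ("Dragon lore",),
    ],
)


def search_titles(term):
    """Return the titles containing `term`; a row matches or it does not."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT title FROM article WHERE title LIKE ? ORDER BY id", (pattern,)
    )
    return [title for (title,) in rows]


matches = search_titles("dragon")   # two titles contain "dragon"
no_matches = search_titles("dragno")  # a typo finds nothing
```

Note that the results come back in id order: there is no notion of one match being better than another, and a misspelled term simply returns nothing.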

External services

Pros

Can manage millions of records

Search-related features (spell checking, faceting, weighted search, special ranking, etc.)

Cons

Initial development investment

Increases the complexity of the architecture

You will have to manage a copy of your data

The Wizard

Very hard to get started: you need to know all the mechanics

Once you master it, your possibilities are limitless

Elasticsearch, Apache Solr, Algolia

💰 💸

This is the high-cost/high-value solution

Hard to implement, but offers the best solution

Usual implementation

Define a schema of the search index

Index all the existing records of your database

Keep the index up to date with the DB

(post_save and post_delete signals)
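The three steps above can be sketched in plain Python. InMemoryIndex is a hypothetical stand-in for the external service, not a real client library, and the records are invented. In a Django project the two handlers would typically be registered as receivers of the post_save and post_delete signals.

```python
# Hypothetical stand-in for an external search service; not a real client library.
class InMemoryIndex:
    # Step 1: the schema, i.e. the fields the index stores and searches.
    SCHEMA = ("title", "body")

    def __init__(self):
        self.documents = {}

    def upsert(self, record_id, record):
        # Copy only the fields declared in the schema.
        self.documents[record_id] = {field: record[field] for field in self.SCHEMA}

    def delete(self, record_id):
        self.documents.pop(record_id, None)

    def search(self, term):
        term = term.lower()
        return sorted(
            record_id
            for record_id, doc in self.documents.items()
            if any(term in value.lower() for value in doc.values())
        )


index = InMemoryIndex()

# Hypothetical rows standing in for the database.
database = {
    1: {"title": "Dragon lore", "body": "Red and gold dragons", "author_id": 7},
    2: {"title": "Bard songs", "body": "Ballads for taverns", "author_id": 9},
}

# Step 2: index every record that already exists.
for record_id, record in database.items():
    index.upsert(record_id, record)


# Step 3: handlers that mirror later changes into the index.
def handle_post_save(record_id, record):
    index.upsert(record_id, record)


def handle_post_delete(record_id):
    index.delete(record_id)


# Simulate an edit and a deletion in the database.
database[2]["title"] = "Bard songs about dragons"
handle_post_save(2, database[2])
del database[1]
handle_post_delete(1)
```

This is the "copy of your data" you have to manage: every write path has to reach the index. In Django, bulk operations such as QuerySet.update() do not send these signals, so they need their own reindexing step.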

Good use case

Big dataset, specific requirements on ranking, heavy request load

Bad use case

Having constraints around development time and/or architecture complexity

PostgreSQL Full-Text Search

Cons

Language dependent

Pros

Semantic(ish) search system

Advanced ranking system

Search in more than one field

Highly configurable

The Bard

Diplomacy and Deception

How does it work?

Normalise words: deadlines => deadlin

Remove stop words: The, for, with

Apply a frequency score: Deadline (7) is less common than Web (3)
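The three steps can be imitated with a toy pipeline in plain Python. The stop-word list and the suffix-stripping rule below are assumptions made for illustration; PostgreSQL relies on per-language dictionaries and a proper stemmer, and the final count is only a stand-in for its frequency scoring. In Django the real feature is exposed through SearchVector, SearchQuery and SearchRank in django.contrib.postgres.search.

```python
from collections import Counter

# Toy stop-word list: an assumption for illustration, not PostgreSQL's dictionary.
STOP_WORDS = {"the", "for", "with", "a", "of", "and", "is"}


def normalise(word):
    """Lowercase a word and crudely strip a plural or trailing 'e' ending."""
    word = word.lower()
    for suffix in ("es", "s", "e"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def to_lexemes(text):
    """Split text into words, drop stop words, and normalise what is left."""
    words = [w.strip(".,!?") for w in text.split()]
    return [normalise(w) for w in words if w and w.lower() not in STOP_WORDS]


# Count how often each normalised word appears in a made-up sentence.
frequencies = Counter(to_lexemes("Web deadlines, web frameworks and the deadline"))
```

Because "deadlines" and "deadline" normalise to the same form, a search for either one finds both, which is where the semantic(ish) behaviour comes from.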

Ranking system

Multiple matches in the same record

Proximity of matches in the same record

Importance of the field
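A rough sketch of two of these ranking factors, multiple matches and field importance, in plain Python. The A to D weight values are PostgreSQL's documented defaults, but the scoring formula, the field-to-label mapping and the records are simplifications invented for this example; proximity is not modelled at all. In Django, field importance is set with SearchVector("title", weight="A") and the score is computed with SearchRank.

```python
# PostgreSQL's default weights for the labels D, C, B, A are 0.1, 0.2, 0.4, 1.0.
WEIGHTS = {"A": 1.0, "B": 0.4, "C": 0.2, "D": 0.1}

# Hypothetical choice: the title matters more than the body.
FIELD_LABELS = {"title": "A", "body": "B"}


def toy_rank(record, term):
    """Sum weight * number of exact word matches per field (a toy formula)."""
    term = term.lower()
    score = 0.0
    for field, label in FIELD_LABELS.items():
        occurrences = record[field].lower().split().count(term)
        score += WEIGHTS[label] * occurrences
    return score


# Made-up records for illustration.
records = [
    {"title": "Cooking with fruit", "body": "dragon fruit recipes"},
    {"title": "Dragon lore", "body": "a dragon met another dragon"},
]

ranked = [
    record["title"]
    for record in sorted(records, key=lambda r: toy_rank(r, "dragon"), reverse=True)
]
```

The second record wins because it matches in the heavier title field and matches more than once in the body.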

Good use case

Search in the articles' content

Bad use case

Search for a list of movie titles

PostgreSQL Trigram

Cons

On large texts it can be less precise than Full-Text Search

Pros

Language-independent

Will find results even when the query is misspelled

The Barbarian

Big, Fast, Strong, Angry

They can't read or write

(but they don't need to...)

How does it work?

Spaces are prepended and appended to the string

The string is divided into groups of 3 characters (trigrams)

The list of trigrams is deduplicated and sorted

Misspellings
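The trigram steps, and the resulting tolerance to misspellings, can be sketched in plain Python. This is an approximation of what the pg_trgm extension does, assuming ASCII text: each word gets two spaces in front and one behind (the padding pg_trgm documents), and similarity is taken as shared trigrams over all distinct trigrams. In Django the real thing is available as TrigramSimilarity in django.contrib.postgres.search.

```python
import re


def trigrams(text):
    """Return the sorted, deduplicated trigrams of `text`."""
    found = set()
    # Lowercase and treat anything that is not a letter or digit as a separator.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two spaces before, one after
        found.update(padded[i : i + 3] for i in range(len(padded) - 2))
    return sorted(found)


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams, from 0.0 to 1.0."""
    ta, tb = set(trigrams(a)), set(trigrams(b))
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


cat_trigrams = trigrams("cat")
typo_score = similarity("dragon", "dragno")
```

The misspelled "dragno" still shares four of its trigrams with "dragon", which gives a score above pg_trgm's default similarity threshold of 0.3, so the record would still be found.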

Good use case

Search in a list of unique words

(names, movie titles, song titles, band names)

Bad use case

Search in the articles' content

🎸⚔️ Performance of Full-Text and Trigram search

Use a GIN index

Apply a GIN index to your database before performing searches to optimise these two types of queries.

Enjoy having different options

🛡
The Fighter
Standard textual queries
🧙‍♀️
The Wizard
External services
🎸
The Bard
PostgreSQL Full-Text Search
⚔️
The Barbarian
PostgreSQL Trigram

🙏 Thank you 🙏

Questions? 🤔
