Full-Text Search in Django

Marco Alabruzzo

Lead Developer @ We Got Pop

http://marcoala.com - marco.alabruzzo@gmail.com

London Django Meetup Group

11 August 2020

Why is search so important?

🤔

Different kinds of search in Django

Standard textual queries
🔍

External services
🌍

PostgreSQL Full-Text Search
📄

PostgreSQL Trigram
📝

Standard textual queries 🔍

Pros

The simplest form of search

Zero Setup: comes with the database

Cons

It's binary: a record either matches or it doesn't

No ranking
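
The two cons above show up with a plain SQL substring match, which is roughly what Django's `contains` / `icontains` lookups do (a SQL `LIKE` match). A minimal sketch using Python's built-in sqlite3 module; the `article` table, its rows and the `title_contains` helper are made up for illustration:

```python
import sqlite3

# Hypothetical data: an in-memory table standing in for a Django model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO article (title) VALUES (?)",
    [
        ("Django deadlines",),
        ("Meeting the deadline for the Django web release",),
        ("Cooking with cheese",),
    ],
)


def title_contains(term):
    """Return titles that contain `term` as a substring."""
    rows = conn.execute(
        "SELECT title FROM article WHERE title LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return [title for (title,) in rows]


matches = title_contains("deadline")   # matching rows come back unranked
no_match = title_contains("deadlnie")  # a typo in the query matches nothing
print(matches)
print(no_match)
```

Every matching row is returned with no notion of which one is the better answer, and a single swapped letter in the query returns nothing at all.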

Standard textual queries 🔍

Good use case

Filter articles by title in the admin interface

Bad use case

Search for content on the homepage

External services
🌍

Elasticsearch, Apache Solr, Algolia

Pros

Can manage millions of records

Search-related features

(spell checking, faceting, weighted search, special ranking, etc.)

Cons

Increases the complexity of the architecture

You will have to manage a copy of your data

External services
🌍

This is the high-cost, high-value solution

Hard to implement, but offers the best results

💰 💸

External services
🌍

Usual implementation

Define a schema of the search index

Index all the existing records in your database

Keep the index up to date with the DB

(post_save and post_delete signals)
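
The three steps above can be sketched without any real search service. In the toy below, `ToyIndex`, `Article`, `on_save` and `on_delete` are all hypothetical names. The dictionary stands in for an external index such as Elasticsearch, and the two handlers play the role that Django `post_save` and `post_delete` signal receivers would play in a real project.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical record standing in for a Django model instance.
    pk: int
    title: str
    body: str


class ToyIndex:
    """In-memory stand-in for an external search index."""

    # Step 1: the schema lists which fields are copied into the index.
    schema = ("title", "body")

    def __init__(self):
        self.documents = {}

    def to_document(self, article):
        return {field: getattr(article, field) for field in self.schema}

    # Step 2: index every record that already exists.
    def index_all(self, articles):
        for article in articles:
            self.documents[article.pk] = self.to_document(article)

    # Step 3: keep the copy in sync with the database
    # (the job of post_save / post_delete receivers).
    def on_save(self, article):
        self.documents[article.pk] = self.to_document(article)

    def on_delete(self, article):
        self.documents.pop(article.pk, None)


index = ToyIndex()
first = Article(1, "Full-text search", "Searching article content")
second = Article(2, "Trigrams", "Matching misspelled names")
index.index_all([first, second])

first.title = "Full-text search in Django"
index.on_save(first)     # an update overwrites the stale copy
index.on_delete(second)  # a delete removes the copy
print(index.documents)
```

The sketch also makes the "copy of your data" cost visible: every write path that bypasses the two handlers leaves the index out of date.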

External services
🌍

Good use case

Big dataset, specific requirements on ranking, heavy request load

Bad use case

Tight constraints on development time and/or architecture complexity

PostgreSQL Full-Text Search
📄

Pros

Semantic(ish) search system

Advanced Ranking System

Search in more than one field

Highly configurable

Cons

Language dependent

PostgreSQL Full-Text Search
📄

How does it work?

Normalise words: deadlines => deadlin

Remove stop words: the, for, with

Apply a frequency score: Deadline (7) is less common than Web (3)
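
The three steps can be imitated in a few lines of plain Python. This is only a toy: PostgreSQL relies on per-language dictionaries and a proper stemmer, whereas the `naive_stem` function and the `STOP_WORDS` set below are simplified assumptions made up for this sketch. In a Django project the database does this work, and it is reached through `SearchVector` and `SearchQuery` from `django.contrib.postgres.search`.

```python
import re
from collections import Counter

# Assumption: a tiny stop-word list; PostgreSQL ships a much longer one per language.
STOP_WORDS = {"the", "for", "with", "a", "is", "of"}


def naive_stem(word):
    """Very rough stand-in for a real stemmer: strip a few common English suffixes."""
    for suffix in ("ing", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_lexemes(text):
    """Lowercase, tokenise, drop stop words, stem, and count occurrences."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(naive_stem(w) for w in words if w not in STOP_WORDS)


lexemes = to_lexemes("The deadlines for the web app: a deadline with web, web, web")
print(lexemes)
```

Both "deadlines" and "deadline" collapse to the same lexeme, so a query for one finds the other, and the stop words never reach the index.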

PostgreSQL Full-Text Search
📄

Ranking system

Multiple matches in the same record

Proximity of matches in the same record

Importance of the field
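
A rough illustration of how the first and third factors can combine into one score. It is a toy, not PostgreSQL's ranking function: `toy_rank`, `FIELD_WEIGHTS` and the sample records are assumptions made up here, and proximity is not modelled at all. In Django, field importance is expressed with the `weight` argument of `SearchVector`, and the score comes from `SearchRank`.

```python
# Assumption: title matches count more than body matches. The numbers mirror
# PostgreSQL's default weights for the A (1.0) and B (0.4) labels.
FIELD_WEIGHTS = {"title": 1.0, "body": 0.4}


def toy_rank(record, term):
    """Weighted count of how often `term` appears in each field of `record`."""
    term = term.lower()
    return sum(
        weight * record[field].lower().split().count(term)
        for field, weight in FIELD_WEIGHTS.items()
    )


# Hypothetical records, made up for illustration.
records = [
    {"title": "Pasta recipes", "body": "pasta with cheese"},
    {"title": "Bread recipes", "body": "flour water salt"},
    {"title": "Cheese recipes", "body": "pasta with cheese and more cheese"},
]

ranked = sorted(records, key=lambda r: toy_rank(r, "cheese"), reverse=True)
ranked_titles = [r["title"] for r in ranked]
print(ranked_titles)
```

The record that mentions the term in its title and twice in its body comes first, the one with a single body match second, and the record without a match last.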

PostgreSQL Full-Text Search
📄

Good use case

Search in the articles' content

Bad use case

Search in a list of movie titles

PostgreSQL Trigram
📝

Pros

Language-Independent

Finds results even when the query is misspelled

Cons

On long texts it can be less precise than Full-Text Search

PostgreSQL Trigram
📝

How does it work?

Spaces are prepended and appended to the string

The string is divided into groups of 3 characters (trigrams)

The list of trigrams is deduplicated and sorted
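
The steps above fit in one small function. This is a pure-Python imitation, not the pg_trgm extension itself. The `trigrams` helper is a hypothetical name, and it assumes ASCII letters and digits only. Like pg_trgm, it pads each word with two spaces in front and one behind.

```python
import re


def trigrams(text):
    """Sorted, de-duplicated trigrams of `text`, one padded word at a time."""
    found = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two spaces in front, one behind
        found.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return sorted(found)


cat_trigrams = trigrams("cat")
print(cat_trigrams)  # ['  c', ' ca', 'at ', 'cat']
```

The padding is what lets the first and last letters of a word take part in as many trigrams as the letters in the middle.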

PostgreSQL Trigram
📝

Misspellings
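
A misspelled query still shares most of its trigrams with the intended word, which is what makes the match possible. A minimal sketch, assuming single-word inputs. `trigram_set`, `similarity` and the title list are made up for illustration, and the score is the number of shared trigrams divided by the number of distinct trigrams in both words. In Django the same idea is available as `TrigramSimilarity` in `django.contrib.postgres.search`.

```python
def trigram_set(word):
    """Trigrams of a single word, padded with two leading and one trailing space."""
    padded = f"  {word.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both words."""
    left, right = trigram_set(a), trigram_set(b)
    return len(left & right) / len(left | right)


titles = ["Casablanca", "Cabaret", "Gladiator"]  # hypothetical data
query = "Casablnca"  # misspelled on purpose

best = max(titles, key=lambda title: similarity(query, title))
best_score = similarity(query, best)
print(best, round(best_score, 2))
```

A substring match on "Casablnca" would return nothing, while the trigram score still singles out the intended title.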

PostgreSQL Trigram
📝

Good use case

Search in a list of unique words

(names, movie titles, song titles, band names)

Bad use case

Search in the articles' content

Different kinds of search in Django

Standard textual queries
🔍

External services
🌍

PostgreSQL Full-Text Search
📄

PostgreSQL Trigram
📝

🙏 Thank you 🙏

Questions? 🤔