Marco Alabruzzo
Lead Developer @ We Got Pop
http://marcoala.com - marco.alabruzzo@gmail.com
Django Day Copenhagen
25 September 2020
🐣 Italian born 🇮🇹
🏠 London based 🇬🇧
📐 Lead developer 👨‍💻
We Got POP
❤️ Co-organiser @ Django London ❤️
https://www.meetup.com/djangolondon/
🛡 The Fighter: Standard textual queries
🧙‍♀️ The Wizard: External services
🎸 The Bard: PostgreSQL Full-Text Search
⚔️ The Barbarian: PostgreSQL Trigram
Pros
The simplest form of search
Zero setup: comes with the database
Good for an MVP
Cons
It's binary: something is there or it is not
No ranking
Intuitive, easy to get started, effective at low levels
Potential at high levels is low
Good use case
Filters in the admin interface
Bad use case
Search for content on the homepage
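A minimal sketch of what a standard textual query looks like, assuming a hypothetical `article` table and using Python's built-in `sqlite3` so it runs anywhere. The `LIKE '%term%'` pattern is roughly comparable to what a Django `icontains` lookup produces; the table, column, and function names are illustrative only.

```python
import sqlite3


def search_titles(conn, term):
    """Return titles containing `term`; a row either matches or it does not."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT title FROM article WHERE title LIKE ? ORDER BY id",
        (pattern,),
    )
    return [row[0] for row in rows]


# Hypothetical sample data, in memory only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO article (title) VALUES (?)",
    [("Django deadlines",), ("Web search",), ("Searching with Django",)],
)

matches = search_titles(conn, "django")   # substring match, no ranking
no_matches = search_titles(conn, "djagno")  # a misspelling finds nothing
```

The results come back in table order, not in order of relevance, which is why this fits admin filters better than a public search box.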
Pros
Can manage millions of records
Search-related features
(spell checking, faceting, weighted search, special ranking, etc.)
Cons
Initial development investment
Increases the complexity of the architecture
You will have to manage a copy of your data
Very hard to get started: you need to know all the mechanics
Once you master it, your possibilities are limitless
Elasticsearch, Apache Solr, Algolia
💰 💸
This is the high-cost/high-value solution
Hard to implement, but offers the best results
Usual implementation
Define a schema of the search index
Index all the existing records of your database
Keep the index up to date with the DB
(post_save and post_delete signals)
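The three steps above can be sketched in plain Python. This is only an illustration of the pattern: `SearchIndex` is a hypothetical in-memory stand-in for an external service, not a real client library, and `on_post_save` / `on_post_delete` are plain functions that mimic what handlers connected to Django's `post_save` and `post_delete` signals would do.

```python
class SearchIndex:
    """Hypothetical stand-in for an external search index."""

    def __init__(self, fields):
        self.fields = fields      # step 1: the schema of the index
        self.documents = {}       # the copy of the data you have to manage

    def upsert(self, record):
        # Store only the fields declared in the schema.
        self.documents[record["id"]] = {f: record[f] for f in self.fields}

    def delete(self, record_id):
        self.documents.pop(record_id, None)


def reindex_all(index, records):
    """Step 2: index every record that already exists in the database."""
    for record in records:
        index.upsert(record)


def on_post_save(index, record):
    """Step 3a: what a post_save handler would do."""
    index.upsert(record)


def on_post_delete(index, record):
    """Step 3b: what a post_delete handler would do."""
    index.delete(record["id"])


# Hypothetical records standing in for database rows.
index = SearchIndex(fields=["title"])
rows = [
    {"id": 1, "title": "Django deadlines", "draft": False},
    {"id": 2, "title": "Web search", "draft": True},
]
reindex_all(index, rows)
on_post_save(index, {"id": 2, "title": "Web search, revised", "draft": False})
on_post_delete(index, rows[0])
```

One thing to keep in mind with the signal approach in Django: bulk operations such as `QuerySet.update()` do not send `post_save`, so the index can drift unless those paths are handled separately.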
Good use case
Big dataset, specific requirements on ranking, heavy request load
Bad use case
Having constraints around development time and/or architecture complexity
Cons
Language dependent
Pros
Semantic(ish) search system
Advanced ranking system
Search in more than one field
Highly configurable
Diplomacy and Deception
How does it work?
Normalise words: deadlines => deadlin
Remove stop words: The, for, with
Apply a frequency score: Deadline (7) is less common than Web (3)
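The three steps above can be illustrated with a toy pipeline. This is a sketch under loud assumptions: the suffix-stripping `normalise` function is a crude stand-in and not the stemmer PostgreSQL uses, the stop-word list contains only the three words from the slide, and the "score" is just a per-document count of each remaining lexeme. In PostgreSQL the real work is done by `to_tsvector`.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption: real lists are much longer).
STOP_WORDS = {"the", "for", "with"}


def normalise(word):
    """Lowercase and strip a few suffixes; a toy stand-in for a real stemmer."""
    word = word.lower()
    for suffix in ("es", "e", "s"):
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lexeme_counts(text):
    """Normalise words, drop stop words, then count what is left."""
    words = re.findall(r"[A-Za-z]+", text)
    stems = [normalise(word) for word in words]
    return Counter(stem for stem in stems if stem not in STOP_WORDS)


counts = lexeme_counts("Deadlines for the web, with the deadline")
```

Because "Deadlines" and "deadline" collapse to the same lexeme, a query for either form matches both, which is where the semantic(ish) behaviour comes from.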
Ranking system
Multiple matches in the same record
Proximity of matches in the same record
Importance of the field
Good use case
Search in the articles' content
Bad use case
Search for a list of movie titles
Cons
On large texts it can be less precise than Full-Text Search
Pros
Language-independent
Will find results even in case of misspelling
Big, Fast, Strong, Angry
They can't read or write
(but they don't need to...)
How does it work?
Spaces are prepended and appended to the string
The string is divided into groups of 3 characters (trigrams)
The list of trigrams is deduplicated and sorted
Misspells
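The trigram steps, and why they tolerate misspellings, can be sketched in plain Python. The padding follows the `pg_trgm` convention of two spaces before and one space after each word; the similarity here is the number of shared trigrams divided by the number of distinct trigrams in both strings. Splitting on whitespace only is a simplification, and the function names are illustrative.

```python
def trigrams(text):
    """Return the sorted, deduplicated trigrams of every word in `text`."""
    result = set()
    for word in text.lower().split():
        padded = f"  {word} "  # two spaces prepended, one appended
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return sorted(result)


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    set_a, set_b = set(trigrams(a)), set(trigrams(b))
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


cat = trigrams("cat")
exact = similarity("django", "Django")
typo = similarity("django", "djagno")    # misspelled, still shares trigrams
unrelated = similarity("django", "web")  # nothing in common
```

A misspelled query still shares its leading trigrams with the intended word, so it scores above zero, while a binary substring match would return nothing at all.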
Good use case
Search in a list of unique words
(names, movie titles, song titles, band names)
Bad use case
Search in the articles' content
Use a GIN Index
Add a GIN index to the searched columns before performing searches to optimise these two types of queries (full-text and trigram).
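As a rough sketch, the helper below only builds the PostgreSQL statements for the two index flavours so they can be read or pasted into a migration; it does not talk to a database. The helper name, index names, table, and column are hypothetical. In a Django project the same result is usually reached with `GinIndex` from `django.contrib.postgres.indexes`.

```python
def gin_index_statements(table, column, language="english"):
    """Return illustrative DDL for a full-text and a trigram GIN index.

    Identifiers are interpolated directly, so only pass trusted names.
    """
    return [
        # The trigram operator class lives in the pg_trgm extension.
        "CREATE EXTENSION IF NOT EXISTS pg_trgm;",
        # Full-text search: index the tsvector computed from the column.
        f"CREATE INDEX {table}_{column}_fts_idx ON {table} "
        f"USING GIN (to_tsvector('{language}', {column}));",
        # Trigram search: index the column with the trigram operator class.
        f"CREATE INDEX {table}_{column}_trgm_idx ON {table} "
        f"USING GIN ({column} gin_trgm_ops);",
    ]


statements = gin_index_statements("article", "title")
```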
🛡 The Fighter: Standard textual queries
🧙‍♀️ The Wizard: External services
🎸 The Bard: PostgreSQL Full-Text Search
⚔️ The Barbarian: PostgreSQL Trigram