Marco Alabruzzo
Lead Developer @ We Got Pop
http://marcoala.com - marco.alabruzzo@gmail.com
London Django Meetup Group
11 August 2020
Standard textual queries
External services
PostgreSQL Full-Text Search
PostgreSQL Trigram
Pros
The simplest form of search
Zero Setup: comes with the database
Cons
It's binary: a record either matches or it doesn't
No ranking
Good use case
Filter article titles in the admin interface
Bad use case
Search for content on the homepage
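A minimal sketch of a standard textual query, using SQLite from the Python standard library as a stand-in for the real database. The `article` table and the `filter_titles` helper are assumptions made for illustration. In Django the same idea is the `icontains` lookup, for example `Article.objects.filter(title__icontains=term)` on a hypothetical `Article` model.

```python
import sqlite3

# In-memory database with a hypothetical "article" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO article (title) VALUES (?)",
    [
        ("Django search basics",),
        ("Searching with PostgreSQL",),
        ("Deploying Django",),
    ],
)


def filter_titles(term):
    """Return titles containing `term` as a substring.

    SQLite's LIKE is case-insensitive for ASCII text by default.
    Wildcards inside `term` are not escaped in this sketch.
    """
    rows = conn.execute(
        "SELECT title FROM article WHERE title LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return [title for (title,) in rows]


print(filter_titles("search"))  # two matching titles, in insertion order
print(filter_titles("serch"))   # a misspelling matches nothing
```

The result is a plain list in table order: every row either matches or it does not, and nothing says which match is better.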
Pros
Can manage millions of records
Search related features
(spell checking, faceting, weighted search, special ranking, etc.)
Cons
Increases the complexity of the architecture
You will have to manage a copy of your data
This is the high-cost high-value solution
Hard to implement, but offers the best results
Usual implementation
Define a schema of the search index
Index all the existing records in your database
Keep the index up to date with the DB
(post_save and post_delete signals)
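The three steps above can be sketched without any real search service. Everything here is hypothetical: `SearchIndex` is an in-memory stand-in for an external service client, and `Article` is a plain dataclass where a real project would have a Django model. The receivers use the `(sender, instance, **kwargs)` signature that Django signals pass to their handlers.

```python
from dataclasses import dataclass


class SearchIndex:
    """Hypothetical stand-in for an external search service client."""

    def __init__(self):
        self.documents = {}

    def upsert(self, doc_id, document):
        self.documents[doc_id] = document

    def delete(self, doc_id):
        self.documents.pop(doc_id, None)


@dataclass
class Article:
    """Hypothetical record; a Django model in a real project."""
    pk: int
    title: str
    body: str


index = SearchIndex()


def to_document(article):
    # Step 1: the schema of the index, i.e. which fields get copied.
    return {"title": article.title, "body": article.body}


def reindex_all(articles):
    # Step 2: index every record that already exists.
    for article in articles:
        index.upsert(article.pk, to_document(article))


def on_article_saved(sender, instance, **kwargs):
    # Step 3a: keep the copy in sync on create and update.
    index.upsert(instance.pk, to_document(instance))


def on_article_deleted(sender, instance, **kwargs):
    # Step 3b: remove the copy when the record is deleted.
    index.delete(instance.pk)


# In a Django project the receivers would be connected with:
#   post_save.connect(on_article_saved, sender=Article)
#   post_delete.connect(on_article_deleted, sender=Article)
```

Note that the index holds a second copy of the data, which is the maintenance cost listed in the cons.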
Good use case
Big dataset, specific requirements on ranking, heavy request load
Bad use case
Tight constraints on development time and/or architecture complexity
Pros
Semantic(ish) search system
Advanced Ranking System
Search in more than one field
Highly configurable
Cons
Language dependent
How does it work?
Normalise words: deadlines => deadlin
Remove stop words: The, for, with
Apply a frequency score: Deadline (7) is less common than Web (3)
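A toy sketch of the three steps in plain Python. It is not how PostgreSQL implements them: the stop-word list is a small assumed sample, `toy_stem` is a two-rule stand-in for a real stemmer, and the frequency step is a simple occurrence count. In PostgreSQL the normalisation is done by `to_tsvector`, and in Django by `SearchVector` from `django.contrib.postgres.search`.

```python
import re
from collections import Counter

# Assumed sample; real stop-word lists are much longer.
STOP_WORDS = {"the", "for", "with", "a", "an", "and", "of", "to", "in"}


def toy_stem(word):
    """Toy rule only: drop a trailing "s", then a trailing "e"."""
    if word.endswith("s") and len(word) > 3:
        word = word[:-1]
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word


def to_lexemes(text):
    """Lowercase, split into words, drop stop words, stem the rest."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [toy_stem(w) for w in words if w not in STOP_WORDS]


def frequencies(text):
    """Count how often each normalised word occurs in the text."""
    return Counter(to_lexemes(text))


print(to_lexemes("The deadlines for the Web"))
print(frequencies("Web deadlines with web frameworks for the web"))
```

Because both the stored text and the query go through the same normalisation, a search for "deadline" can match a document that only contains "deadlines".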
Ranking system
Multiple matches in the same record
Proximity of matches in the same record
Importance of the field
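A toy ranking function that illustrates two of the three signals: multiple matches in the same record and the importance of the field. Proximity is left out to keep the sketch short. The field names and weights are assumptions chosen for illustration, and the formula is not the one PostgreSQL uses.

```python
# Assumed weights: a match in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 1.0, "body": 0.4}


def score(record, query_terms):
    """Sum weighted match counts over all fields of one record."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = record[field].lower().split()
        matches = sum(words.count(term) for term in query_terms)
        total += weight * matches
    return total


def rank(records, query):
    """Return matching records, best score first."""
    terms = query.lower().split()
    scored = [(score(record, terms), record) for record in records]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored]


records = [
    {"title": "Django tips", "body": "search in django"},
    {"title": "Search", "body": "search search"},
    {"title": "Other", "body": "nothing relevant"},
]
for record in rank(records, "search"):
    print(record["title"])
```

In Django the equivalent building blocks are `SearchVector` with a `weight` argument and `SearchRank`.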
Good use case
Search in the articles' content
Bad use case
Search in a list of movie titles
Pros
Language-Independent
Will find results even when the query is misspelled
Cons
On large texts it can be less precise than Full-Text Search
How does it work?
Spaces are prepended and appended to the string
The string is divided into groups of 3 characters (trigrams)
The list of trigrams is deduplicated and sorted
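The steps above can be sketched in a few lines of Python. This is an illustration, not the extension's code: it follows the padding described in the `pg_trgm` documentation (two spaces before each word, one after), and it simplifies word splitting to ASCII letters and digits. The `similarity` helper is an assumption of this sketch and computes shared trigrams divided by all distinct trigrams.

```python
import re


def trigrams(text):
    """Return the sorted, deduplicated trigrams of a string."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return sorted(result)


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = set(trigrams(a)), set(trigrams(b))
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(trigrams("cat"))
print(similarity("django", "djagno"))
```

A misspelled query such as "djagno" still shares 3 of the 11 distinct trigrams with "django", so it gets a non-zero score where a substring query would find nothing. In Django the database-side equivalent is `TrigramSimilarity` from `django.contrib.postgres.search`.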
Misspellings
Good use case
Search in a list of unique words
(names, movie titles, song titles, band names)
Bad use case
Search in the articles' content
Standard textual queries
External services
PostgreSQL Full-Text Search
PostgreSQL Trigram