Elasticsearch
Why do we need it?
Indexed columns in an RDBMS
- Work well for exact-match and prefix ("starts with") queries
- Usually implemented as a B-tree
- Visualisation demo
SELECT *
FROM user
WHERE name = 'John Doe'
AND user_id = 21
AND birth_date > '2007-08-02'
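To see why a B-tree index suits exact-match and prefix queries, here is a minimal sketch in which a sorted list searched with binary search stands in for the B-tree. Both keep keys in order, and that ordering is what these lookups rely on. The names, row ids and function names are hypothetical. A "contains a word" search gets no help from the ordering and has to scan every key.

```python
from bisect import bisect_left, bisect_right

# Hypothetical (name, row_id) pairs kept in key order, standing in for a B-tree on `name`.
names = sorted([
    ("Jane Roe", 7),
    ("John Doe", 21),
    ("John Smith", 3),
    ("Johnny Walker", 12),
    ("Mary Major", 5),
])
keys = [key for key, _ in names]

def exact_match(key):
    """Rows whose key equals `key`, like WHERE name = 'John Doe'."""
    lo = bisect_left(keys, key)
    hi = bisect_right(keys, key)
    return [row for _, row in names[lo:hi]]

def prefix_match(prefix):
    """Rows whose key starts with `prefix`, like WHERE name LIKE 'John%'."""
    lo = bisect_left(keys, prefix)  # binary search to the first candidate
    hi = lo
    while hi < len(keys) and keys[hi].startswith(prefix):
        hi += 1                     # matches are contiguous because keys are sorted
    return [row for _, row in names[lo:hi]]

def contains(word):
    """A word-anywhere search cannot use the ordering, so it scans every key."""
    return [row for key, row in names if word in key]

print(exact_match("John Doe"))  # [21]
print(prefix_match("John"))     # [21, 3, 12]
print(contains("Walker"))       # [12]
```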
What about searches like these?
- Search web pages for content on "blue sky"
- Search for "7th Sector, HSR layout, Bangalore" in an unstructured address registry
Apache Lucene
A high-performance, full-featured text search engine library written in Java
Inverted Index
- Break each document into tokens
- Keep a sorted dictionary of the unique tokens
- Map each token to the documents that contain it and to its positions within each document
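The three steps above can be sketched as a small positional inverted index. This is an illustration under simple assumptions, not Lucene's on-disk format: the tokenizer just lowercases and splits on non-alphanumeric characters, and the documents and function names are hypothetical. Storing positions is what makes phrase queries possible, since a phrase matches only where its tokens sit at consecutive positions.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (deliberately simple)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each token to {doc_id: [positions]}, with tokens in sorted order."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(tokenize(text)):
            index[token][doc_id].append(pos)
    # Return the term dictionary in sorted token order, mirroring the "sorted set of tokens".
    return {token: dict(postings) for token, postings in sorted(index.items())}

def phrase_search(index, phrase):
    """Ids of docs where the phrase tokens appear at consecutive positions."""
    tokens = tokenize(phrase)
    if not tokens or any(t not in index for t in tokens):
        return []
    # A candidate doc must contain every token of the phrase.
    candidates = set(index[tokens[0]])
    for t in tokens[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in sorted(candidates):
        for start in index[tokens[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(tokens)):
                hits.append(doc_id)
                break
    return hits

docs = {
    1: "The sky is blue",
    2: "A blue sky over the sea",
    3: "Blue water, grey sky",
}
index = build_index(docs)
print(index["sky"])                      # {1: [1], 2: [2], 3: [3]}
print(phrase_search(index, "blue sky"))  # [2]
```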
What if we need:
- A search for "UK" to match "United Kingdom"
- A search for "jump" to match "jumped", "jumps" and perhaps even "leap"
- A search for "johnny walker" to match "Johnnie Walker"
- A search for "fox news hunting" to return stories about hunting on Fox News, while "fox hunting news" returns news stories about fox hunting
Analysis
- Pre-tokenisation (character) filters: convert "&" to "and", strip HTML markup, etc.
- Tokenisation: split the text into individual tokens
- Post-tokenisation (token) filters:
  - Stemming: convert "bikes" => "bike"
  - Text normalisation: lowercase, strip accents, etc.
  - Stop-word filtering: remove words like "the", "and" and "a"
  - Synonym expansion: convert "UK" => "United Kingdom"
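The analysis chain above can be sketched as a pipeline of plain functions. This is an illustration only: the stop-word list, synonym table and function names are hypothetical, and the stemmer is naive suffix stripping used as a stand-in for a real algorithm such as Porter stemming. Elasticsearch configures the equivalent steps declaratively as character filters, a tokenizer and token filters.

```python
import re
import unicodedata

# Hypothetical, tiny configuration used only for this sketch.
STOP_WORDS = {"the", "and", "a", "an", "of"}
SYNONYMS = {"uk": ["united", "kingdom"]}

def char_filter(text):
    """Pre-tokenisation: strip HTML tags and turn '&' into 'and'."""
    text = re.sub(r"<[^>]+>", " ", text)
    return text.replace("&", " and ")

def tokenize(text):
    """Split into word tokens."""
    return re.findall(r"\w+", text)

def normalise(token):
    """Lowercase, then drop accents by removing combining marks after NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", token.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def stem(token):
    """Naive suffix stripping (a stand-in for a real stemmer such as Porter's)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def analyze(text):
    tokens = [normalise(t) for t in tokenize(char_filter(text))]
    out = []
    for t in tokens:
        if t in STOP_WORDS:
            continue                           # stop-word filtering
        out.extend(SYNONYMS.get(t, [stem(t)])) # synonym expansion, else stemming
    return out

print(analyze("<p>The UK & the bikes jumped at the Café</p>"))
# ['united', 'kingdom', 'bike', 'jump', 'at', 'cafe']
```

Running the same `analyze` at index time and at query time is what lets "jumps" find "jumped": both reduce to the token "jump".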
Scoring / Ranking Results
- Term frequency (TF): the more often a term appears in a document, the higher that document scores
- Inverse document frequency (IDF): the fewer documents a term appears in, the more a match on that term counts
- Boost: a query-time parameter that gives more weight to particular fields or query clauses
- Other factors (e.g. field-length normalisation)...
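The TF, IDF and boost factors can be combined in a minimal sketch using plain TF-IDF: the score is the sum over query terms of tf × log(N / df) × boost. This is not Lucene's exact formula; Elasticsearch's default similarity since 5.0 is BM25, which also saturates term frequency and normalises for field length. The documents are hypothetical, already-analysed token lists.

```python
import math

def tf_idf_scores(query_terms, docs, boosts=None):
    """Score each doc as the sum over query terms of tf * idf * boost.

    tf  = number of times the term occurs in the doc
    idf = log(N / df), where N = number of docs, df = docs containing the term
    """
    boosts = boosts or {}
    n = len(docs)
    idf = {}
    for term in query_terms:
        df = sum(1 for tokens in docs.values() if term in tokens)
        idf[term] = math.log(n / df) if df else 0.0
    scores = {}
    for doc_id, tokens in docs.items():
        scores[doc_id] = sum(
            tokens.count(term) * idf[term] * boosts.get(term, 1.0)
            for term in query_terms
        )
    return scores

def rank(scores):
    """Doc ids from best to worst score (ties keep insertion order)."""
    return sorted(scores, key=scores.get, reverse=True)

docs = {
    "a": ["fox", "news", "hunting", "news"],
    "b": ["fox", "hunting", "season"],
    "c": ["weather", "news"],
    "d": ["fox", "den"],
}
print(rank(tf_idf_scores(["fox", "news"], docs)))                      # ['a', 'c', 'b', 'd']
print(rank(tf_idf_scores(["fox", "news"], docs, boosts={"fox": 3.0}))) # ['a', 'b', 'd', 'c']
```

Doc "a" wins on term frequency ("news" twice). Doc "c" beats "b" because "news" is rarer than "fox" and so has the higher IDF. Boosting "fox" reverses that order.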
Features
- Fuzzy search: tolerate typos
- Phrase queries / proximity queries
- Highlighting of matched terms in results
- Facets / aggregations: drill down into results (e.g. category and price filters on e-commerce sites)
- Fielded search (e.g. blog title, author, tags)
- Data types: text, numbers, dates
- Range queries, including geolocation and distance filters
- More...
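Fuzzy search rests on edit distance. Here is a minimal sketch using plain Levenshtein distance (insertions, deletions, substitutions). Lucene's fuzzy queries differ in two ways: by default they also count a transposition as a single edit, and they avoid scanning every term by running Levenshtein automata over the term dictionary. The maximum of 2 edits matches Elasticsearch's upper limit for `fuzziness`. The vocabulary and function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]

def fuzzy_match(term, vocabulary, max_edits=2):
    """All vocabulary terms within `max_edits` edits of `term` (full scan)."""
    return sorted(w for w in vocabulary if levenshtein(term, w) <= max_edits)

vocab = {"johnnie", "walker", "johnson", "talker"}
print(fuzzy_match("johnny", vocab))  # ['johnnie']
print(fuzzy_match("walkr", vocab))   # ['talker', 'walker']
```

This is how a search for "johnny walker" can reach "Johnnie Walker": "johnny" is two edits away from "johnnie".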
What does Elasticsearch do?
- Cluster: a group of nodes
- Nodes: servers in the cluster, each holding some of the shards
- Shards: each one is either a primary (P) or a replica (R)
- Each shard is a Lucene index
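Two mechanics make the cluster layout above work: routing each document to a shard, and spreading primary and replica copies over nodes. Here is a minimal sketch under stated assumptions. It uses the simplified routing rule shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document id. Elasticsearch uses a Murmur3 hash (and a refinement of this formula in newer versions), so `zlib.crc32` here is only a stand-in and the shard numbers will not match a real cluster. The round-robin allocator is hypothetical. It only mirrors one real rule: a replica is never placed on the same node as its primary.

```python
import zlib

def shard_for(doc_id, num_primary_shards):
    # Stand-in hash (crc32 instead of Elasticsearch's Murmur3): same id, same shard.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

def allocate(num_primary_shards, num_replicas, nodes):
    """Hypothetical round-robin layout; copies of one shard land on distinct nodes."""
    if num_replicas >= len(nodes):
        # Elasticsearch would leave such replicas unassigned; this sketch just rejects it.
        raise ValueError("need more nodes than replicas per shard")
    layout = {node: [] for node in nodes}
    for shard in range(num_primary_shards):
        for copy in range(num_replicas + 1):
            node = nodes[(shard + copy) % len(nodes)]
            layout[node].append(f"P{shard}" if copy == 0 else f"R{shard}")
    return layout

print(shard_for("21", 3))
print(allocate(3, 1, ["node1", "node2", "node3"]))
# {'node1': ['P0', 'R2'], 'node2': ['R0', 'P1'], 'node3': ['R1', 'P2']}
```

Because the shard number depends on the primary shard count, that count cannot simply be changed on an existing index without re-routing its documents.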
Elasticsearch Architecture
API Demo
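For an API demo, several of the features listed earlier (fuzzy matching, phrase queries, boosting, highlighting, aggregations) fit into one search request body. The sketch below only builds and prints that JSON; it does not contact a server. The field names `title`, `content` and `tags` and the index name `blog` are hypothetical, and the terms aggregation assumes `tags` is mapped as a `keyword` field. The query DSL keys themselves (`bool`, `should`, `match`, `fuzziness`, `boost`, `match_phrase`, `highlight`, `aggs`, `terms`) are standard Elasticsearch. The printed body could then be sent to a local node on the default port, e.g. `curl -X POST "localhost:9200/blog/_search" -H 'Content-Type: application/json' -d @body.json`.

```python
import json

def search_body(text):
    """Hypothetical search request combining several features from the Features slide."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Fuzzy match on the title, weighted twice as much as the phrase clause.
                    {"match": {"title": {"query": text, "fuzziness": "AUTO", "boost": 2}}},
                    # Phrase query: the words must appear next to each other in the content.
                    {"match_phrase": {"content": text}},
                ]
            }
        },
        "highlight": {"fields": {"content": {}}},             # mark matched terms
        "aggs": {"by_tag": {"terms": {"field": "tags"}}},     # facet counts per tag
    }

print(json.dumps(search_body("blue sky"), indent=2))
```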
The End
Elasticsearch
By Deepak Narayana Rao