paws on Elasticsearch
15.12.2017
torsti @ Wunderdog
Elasticsearch is good full-text search infrastructure, used by
- Wikipedia
- The Guardian
- StackOverflow
- GitHub
- many others
powered by Apache Lucene
Lucene
queries and index
Elasticsearch
RESTful interface
scaling
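The RESTful interface means a search is just an HTTP request with a JSON body. The sketch below builds (but does not send) a `match` query to the `_search` endpoint. It assumes a node on `localhost:9200` (the default HTTP port), and the index name `articles` and field name `body` are hypothetical.

```python
import json
import urllib.request

# Hypothetical local node; 9200 is Elasticsearch's default HTTP port.
ES_URL = "http://localhost:9200"


def build_search_request(index: str, field: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a full-text `match` query against `index`."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "articles" and "body" are hypothetical index and field names.
req = build_search_request("articles", "body", "inverted index")
print(req.get_method(), req.full_url)

# Against a running node, sending it would look like:
# with urllib.request.urlopen(req) as resp:
#     hits = json.load(resp)["hits"]["hits"]
```

The same request works from `curl` or any HTTP client, which is much of the appeal compared to embedding Lucene directly.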
a word on hosting ES
you can get things done using
AWS Elasticsearch domains
Elastic Cloud (by elastic.co) on AWS or GCP
you may never hit the limitations described in
Index
inverted index
each token points to the documents that contain it
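A minimal in-memory inverted index, assuming a naive tokenizer that lowercases and splits on non-alphanumerics (Lucene's analyzers do much more). Each token maps to the set of document ids that contain it.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Naive analyzer: lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # token -> ids of the documents containing that token
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return dict(index)


docs = {
    1: "Elasticsearch is powered by Lucene",
    2: "Lucene builds an inverted index",
    3: "An index maps tokens to documents",
}
index = build_inverted_index(docs)
print(sorted(index["lucene"]))  # [1, 2]
print(sorted(index["index"]))   # [2, 3]
```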
Query
- make tokens (by some process)
- match tokens against all known tokens
- return documents where tokens match (by some algorithm)
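A toy version of these three query steps. It assumes the same naive tokenizer at index and query time, and ranks documents simply by how many distinct query tokens they match; that ranking is a stand-in for "some algorithm", not how Lucene actually scores.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> list[int]:
    matches: Counter = Counter()
    # 1. make tokens from the query, using the same analyzer as indexing
    for token in set(tokenize(query)):
        # 2. match each query token against the known tokens
        for doc_id in index.get(token, ()):
            matches[doc_id] += 1
    # 3. return matching docs: most matched tokens first, ties by id
    ranked = sorted(matches.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked]


docs = {
    1: "Elasticsearch is powered by Lucene",
    2: "Lucene builds an inverted index",
    3: "An index maps tokens to documents",
}
index = build_inverted_index(docs)
print(search(index, "Lucene inverted index"))  # [2, 1, 3]
print(search(index, "kibana"))                 # []
```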
TF-IDF, roughly
Term frequency
how often does this term appear in this document
Inverse document frequency
how rare this term is across all documents (the fewer documents contain it, the higher the idf)
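One common textbook form of TF-IDF, as a sketch: tf is the share of a document's tokens that are the term, and idf is `log(N / df)`, where `N` is the number of documents and `df` the number containing the term. Lucene's own classic scoring used a damped variant of both parts, so treat the exact formula here as an assumption.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tf(term: str, doc_tokens: list[str]) -> float:
    # Term frequency: share of this document's tokens that are `term`.
    return Counter(doc_tokens)[term] / len(doc_tokens)


def idf(term: str, all_docs_tokens: list[list[str]]) -> float:
    # Inverse document frequency: log(N / number of docs containing term).
    df = sum(1 for tokens in all_docs_tokens if term in tokens)
    return math.log(len(all_docs_tokens) / df) if df else 0.0


def tf_idf(term: str, doc_tokens: list[str], all_docs_tokens: list[list[str]]) -> float:
    return tf(term, doc_tokens) * idf(term, all_docs_tokens)


docs = ["the cat sat on the mat", "the dog ate the bone", "the cat chased the dog"]
all_tokens = [tokenize(d) for d in docs]
first = all_tokens[0]
print(tf_idf("the", first, all_tokens))  # 0.0: "the" is in every document
print(tf_idf("mat", first, all_tokens) > tf_idf("cat", first, all_tokens))  # True
```

"mat" and "cat" appear once each in the first document, so they have the same tf; "mat" wins because it occurs in fewer documents.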
stop words
a stop word is a word so common that it isn't indexed, even when it appears in the document
its idf is very small, so it contributes almost nothing to the score anyway
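A sketch of both points: a word found in every document gets an idf of zero under `log(N / df)`, and filtering a stop list at tokenization time keeps such words out of the index. The tiny `STOP_WORDS` set is hypothetical; real analyzers ship language-specific lists.

```python
import math
import re

# Hypothetical, deliberately tiny stop list.
STOP_WORDS = frozenset({"the", "a", "an", "on", "is"})


def tokenize(text: str, stop_words: frozenset = frozenset()) -> list[str]:
    # Lowercase, split on non-alphanumerics, drop any stop words.
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in stop_words]


def idf(term: str, all_docs_tokens: list[list[str]]) -> float:
    df = sum(1 for tokens in all_docs_tokens if term in tokens)
    return math.log(len(all_docs_tokens) / df) if df else 0.0


docs = ["the cat sat on the mat", "the dog ate the bone", "the cat chased the dog"]
all_tokens = [tokenize(d) for d in docs]
print(idf("the", all_tokens))  # 0.0
print([tokenize(d, STOP_WORDS) for d in docs])
# [['cat', 'sat', 'mat'], ['dog', 'ate', 'bone'], ['cat', 'chased', 'dog']]
```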