Clearing the Fuzzies on Fuzzy Search

About Me

Michael Born (@michaelborn_me)

  • Software Engineer
  • 3-year Ortusian
  • Central NY
  • Lover of all things open source

This Talk

  • intro to fuzzy search
  • theory of indexing
  • theory of text analysis
  • delve into the practical
    • cbElasticsearch installation
    • indexing documents on app start
    • configuring fuzzy search

What is Fuzzy Search?

the process of analyzing documents and search queries to find relevant results for the searcher's intent

the opposite of exact match searching

OR

Search: hptdog
Result: no results found for "hptdog".

Search: hotdog
Result: A hot dog (less commonly spelled hotdog) is a food consisting of a grilled or steamed sausage served in the slit of a partially sliced bun.

Indexing Theory

Concordance

  • an index of terms
  • each at least 3 letters long
  • omitting common words

Concordance

for, lord, god, your, who, hold, takes, says, right, hand, will, fear, help, you

For I am the Lord your God
who takes hold of your right hand
and says to you, Do not fear;
I will help you.

- Isaiah 41:13
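
The concordance above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: `build_concordance` and the small `COMMON_WORDS` set are hypothetical, chosen so the result reproduces the term list on the slide. Each term maps to the verse lines it appears on.

```python
import re
from collections import defaultdict

# Hypothetical "common words" list, just large enough for this verse.
COMMON_WORDS = {"the", "and", "not"}

def build_concordance(lines, min_length=3, common_words=COMMON_WORDS):
    """Map each term (>= min_length letters, not a common word) to the 1-based lines it occurs on."""
    index = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        for word in re.findall(r"[a-z]+", line.lower()):
            if len(word) < min_length or word in common_words:
                continue  # too short, or too common to be worth indexing
            if line_no not in index[word]:
                index[word].append(line_no)
    return dict(sorted(index.items()))

verse = [
    "For I am the Lord your God",
    "who takes hold of your right hand",
    "and says to you, Do not fear;",
    "I will help you.",
]

for term, line_numbers in build_concordance(verse).items():
    print(term, line_numbers)
```

A search for "your" is now a single dictionary lookup instead of a scan of the whole text, which is the core idea behind a search index.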

Text Analysis

Analyzer

  1. Character filters
  2. Tokenizer
  3. Token filters

Text Analysis: Filters

  • take some input (a stream of characters or a stream of tokens)
  • give some modified output
  • character filters
  • token filters

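Character Filters

Run on the raw text, before it is tokenized.

Examples:

  • HTML strip filter
  • mapping filter (replaces characters, e.g. "&" with "and")

Token Filters

Run on the tokens produced by the tokenizer.

Examples:

  • lowercase
  • stopword
  • synonym
  • phonetic
  • stemmer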
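
A rough Python sketch of the two kinds of filter, assuming nothing about Elasticsearch internals: a character filter takes and returns a string, a token filter takes and returns a list of tokens. Lowercasing is applied to tokens here, which is how Elasticsearch's `lowercase` filter works. All function names, the stopword set, the synonym table and the suffix-stripping "stemmer" are hypothetical stand-ins; the toy stemmer is far cruder than a real one such as Porter's, and phonetic filtering is not shown.

```python
import html
import re
from typing import List

# --- Character filter: operates on the raw text, before tokenizing ---
def html_strip_char_filter(text: str) -> str:
    """Replace HTML tags with spaces and decode entities such as &amp;."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text))

# --- Token filters: operate on the list of tokens a tokenizer produced ---
STOPWORDS = frozenset({"a", "an", "the", "of", "in"})       # hypothetical
SYNONYMS = {"frankfurter": "hotdog", "wiener": "hotdog"}    # hypothetical

def lowercase_filter(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]

def stopword_filter(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t not in STOPWORDS]

def synonym_filter(tokens: List[str]) -> List[str]:
    # Map each known synonym onto one canonical term.
    return [SYNONYMS.get(t, t) for t in tokens]

def toy_stemmer_filter(tokens: List[str]) -> List[str]:
    # Strip one common suffix, keeping at least 3 letters. Illustration only.
    stemmed = []
    for token in tokens:
        for suffix in ("ing", "es", "s"):
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                token = token[: -len(suffix)]
                break
        stemmed.append(token)
    return stemmed

text = "<p>Grilling the <b>Frankfurters</b></p>"
tokens = html_strip_char_filter(text).split()  # a plain whitespace split stands in for a tokenizer
for token_filter in (lowercase_filter, stopword_filter, toy_stemmer_filter, synonym_filter):
    tokens = token_filter(tokens)
print(tokens)  # ['grill', 'hotdog']
```

Filter order matters: the stemmer runs before the synonym filter here so that the plural "frankfurters" is reduced to "frankfurter" before the synonym lookup.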

Text Analysis: Tokenizers

Tokenizers “split” a piece of text into discrete, indexable chunks called tokens

Examples:

  • standard
    • splits on whitespace
    • splits on most punctuation
  • whitespace
    • splits on whitespace only
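
A minimal Python sketch contrasting the two tokenizers. `whitespace_tokenizer` behaves like a whitespace tokenizer; `standard_like_tokenizer` is only an approximation, since Elasticsearch's real standard tokenizer follows Unicode word-boundary rules rather than this simple regex. Both function names are hypothetical.

```python
import re
from typing import List

def whitespace_tokenizer(text: str) -> List[str]:
    """Split only on whitespace; punctuation stays attached to the tokens."""
    return text.split()

def standard_like_tokenizer(text: str) -> List[str]:
    """Approximation: keep runs of word characters, so whitespace AND punctuation split tokens."""
    return re.findall(r"\w+", text)

sentence = "Hot-dog stand: open 24/7!"
print(whitespace_tokenizer(sentence))     # ['Hot-dog', 'stand:', 'open', '24/7!']
print(standard_like_tokenizer(sentence))  # ['Hot', 'dog', 'stand', 'open', '24', '7']
```

With the whitespace tokenizer, a search for "stand" would not find the token "stand:", which is one reason the choice of tokenizer affects search results.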

Text Analysis: Analyzers

A configuration package combining zero or more character filters, exactly one tokenizer, and zero or more token filters
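
The "package" idea can be sketched as function composition: character filters run first, then the tokenizer, then the token filters, in order. This is a hypothetical Python model (`make_analyzer` and the helper functions are illustrative names, not an Elasticsearch API), using a regex tokenizer and a tiny stopword set as assumptions.

```python
import html
import re
from typing import Callable, List

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]

def make_analyzer(char_filters: List[CharFilter], tokenizer: Tokenizer,
                  token_filters: List[TokenFilter]) -> Tokenizer:
    """Bundle character filters, one tokenizer and token filters into one analyze(text) function."""
    def analyze(text: str) -> List[str]:
        for char_filter in char_filters:    # 1. clean up the raw text
            text = char_filter(text)
        tokens = tokenizer(text)            # 2. split it into tokens
        for token_filter in token_filters:  # 3. transform the token list
            tokens = token_filter(tokens)
        return tokens
    return analyze

def strip_html(text: str) -> str:
    return html.unescape(re.sub(r"<[^>]+>", " ", text))

def word_tokenizer(text: str) -> List[str]:
    return re.findall(r"\w+", text)

def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]

STOPWORDS = frozenset({"a", "an", "the", "in", "of"})  # hypothetical

def remove_stopwords(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t not in STOPWORDS]

analyzer = make_analyzer([strip_html], word_tokenizer, [lowercase, remove_stopwords])
print(analyzer("<h1>The Best Hot Dog in Town</h1>"))  # ['best', 'hot', 'dog', 'town']
```

Because a full-text query is typically run through the same analyzer as the indexed documents, a search for "HOT DOG" becomes the tokens `hot` and `dog` and lines up with the indexed terms.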

Indexing in Elasticsearch

By default (the standard analyzer):

  • all terms (tokens) included
  • no stopword removal
  • little manipulation of tokens beyond lowercasing
  • no stemming
  • no synonyms
  • no fun 😢
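
As a sketch of what a "tuned" index might look like, here is an Elasticsearch create-index request body built as a Python dict. The analyzer name `tuned_text` and the filter names `english_stop` and `english_stemmer` are hypothetical. The `stop` and `stemmer` filter types, the `html_strip` character filter, the `standard` tokenizer and the `lowercase` filter are built into Elasticsearch. The index `reviews` and field `text` are borrowed from the fuzzy query example later in the talk. Sending the body (for example with `PUT /reviews`, or through cbElasticsearch) is not shown.

```python
import json

# Hypothetical names: "tuned_text", "english_stop", "english_stemmer".
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # remove common English words such as "the" and "and"
                "english_stop": {"type": "stop", "stopwords": "_english_"},
                # reduce words to a common stem using English rules
                "english_stemmer": {"type": "stemmer", "language": "english"},
            },
            "analyzer": {
                "tuned_text": {
                    "type": "custom",
                    "char_filter": ["html_strip"],
                    "tokenizer": "standard",
                    "filter": ["lowercase", "english_stop", "english_stemmer"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # use the custom analyzer for the review text (at index and search time)
            "text": {"type": "text", "analyzer": "tuned_text"}
        }
    },
}

print(json.dumps(index_body, indent=2))
```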

Demo: Index Tuning

  1. "vanilla" index
  2. tuned index

Fuzzy Theory

Search: hptdog
Result: no results found for "hptdog".

Search: hotdog
Result: A hot dog (less commonly spelled hotdog) is a food consisting of a grilled or steamed sausage served in the slit of a partially sliced bun.

Levenshtein Distance

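A metric for the similarity of two strings: the minimum number of single-character edits (substitutions, deletions, insertions) needed to turn one string into the other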
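
The definition translates directly into the classic dynamic-programming algorithm. A minimal Python sketch (the function name `levenshtein` is ours). It keeps only the previous row of the distance table.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion of char_a
                current[j - 1] + 1,                    # insertion of char_b
                previous[j - 1] + (char_a != char_b),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]

print(levenshtein("hptdog", "hotdog"))  # 1
```

One difference to keep in mind: Elasticsearch's fuzzy query by default also counts swapping two adjacent characters as a single edit (its `transpositions` parameter defaults to `true`), whereas plain Levenshtein distance scores "ab" to "ba" as two edits.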

coat

  • substitution
    • boat
    • cost
  • deletion
    • cat
    • cot
  • insertion
    • coats
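
To make the three edit types concrete, this sketch generates every string exactly one edit away from a word, assuming a lowercase a to z alphabet (real text has more characters). `single_edits` is a hypothetical helper. Elasticsearch (through Lucene) does not enumerate candidates this way; it matches against the index's term dictionary with an automaton instead.

```python
import string
from typing import Dict, Set

def single_edits(word: str, alphabet: str = string.ascii_lowercase) -> Dict[str, Set[str]]:
    """Every string exactly one substitution, deletion or insertion away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    substitutions = {left + c + right[1:]
                     for left, right in splits if right
                     for c in alphabet if c != right[0]}
    deletions = {left + right[1:] for left, right in splits if right}
    insertions = {left + c + right for left, right in splits for c in alphabet}
    return {"substitution": substitutions, "deletion": deletions, "insertion": insertions}

edits = single_edits("coat")
print("boat" in edits["substitution"], "cost" in edits["substitution"])  # True True
print("cat" in edits["deletion"], "cot" in edits["deletion"])            # True True
print("coats" in edits["insertion"])                                     # True
print(sum(len(s) for s in edits.values()))                               # 230
```

Even a 4-letter word has a few hundred neighbours at distance 1, and the count grows with word length and with each extra allowed edit.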

Fuzzy Query

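// Fuzzy query with a fixed maximum edit distance of 1.
// "fuzzy" is a term-level query: the value is NOT analyzed, so it is
// lowercased here to line up with the lowercased terms in the index.
getInstance( "SearchBuilder@CbElasticSearch" )
    .new( "reviews" )
    .setQuery({
        "fuzzy": {
            "text": {
                "value": lCase( event.getValue( "query", "" ) ),
                "fuzziness": "1"
            }
        }
    })
    .execute();

// Same query with "AUTO" fuzziness: the allowed edit distance
// depends on the length of the search term.
getInstance( "SearchBuilder@CbElasticSearch" )
    .new( "reviews" )
    .setQuery({
        "fuzzy": {
            "text": {
                "value": lCase( event.getValue( "query", "" ) ),
                "fuzziness": "AUTO"
            }
        }
    })
    .execute();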
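
According to the Elasticsearch documentation, `"AUTO"` (shorthand for `AUTO:3,6`) picks the maximum edit distance from the length of the term: terms of 0 to 2 characters must match exactly, 3 to 5 characters allow one edit, and longer terms allow two. A small Python sketch of that rule (the helper name `auto_fuzziness` is ours):

```python
def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Maximum edits allowed by fuzziness "AUTO:low,high" (Elasticsearch defaults to AUTO:3,6)."""
    length = len(term)
    if length < low:
        return 0  # short terms must match exactly
    if length < high:
        return 1  # medium-length terms allow one edit
    return 2      # long terms allow two edits

for term in ("hp", "coat", "hptdog"):
    print(term, auto_fuzziness(term))
# hp 0
# coat 1
# hptdog 2
```

So with `"AUTO"`, the 6-letter "hptdog" tolerates up to two edits, while `"fuzziness": "1"` fixes the limit at one edit regardless of term length.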

Demo: Fuzzy search

  1. searching with a "match" query
  2. searching with a "fuzzy" query
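
The demo code itself is not shown, so the following is only a sketch of the two query clauses in Elasticsearch's query DSL, built as Python dicts. They have the same shape as the struct passed to `setQuery()` in the fuzzy query example; in a raw `_search` request each would sit under a top-level `"query"` key. The helper names are hypothetical, and the field `text` is borrowed from that example.

```python
import json

def match_clause(field, text, fuzziness=None):
    """Full-text "match" query: `text` is run through the field's analyzer first;
    if fuzziness is given, it applies to each resulting term."""
    options = {"query": text}
    if fuzziness is not None:
        options["fuzziness"] = fuzziness
    return {"match": {field: options}}

def fuzzy_clause(field, value, fuzziness="AUTO"):
    """Term-level "fuzzy" query: `value` is NOT analyzed, so it is compared with the
    indexed terms as-is (no lowercasing, no splitting into words)."""
    return {"fuzzy": {field: {"value": value, "fuzziness": fuzziness}}}

print(json.dumps(match_clause("text", "Hptdog stand"), indent=2))
print(json.dumps(match_clause("text", "Hptdog stand", fuzziness="AUTO"), indent=2))
print(json.dumps(fuzzy_clause("text", "hptdog"), indent=2))
```

One consequence: a mixed-case or multi-word value sent to a fuzzy query can miss terms that a match query with `fuzziness` would find, because only the match query runs the analyzer.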

Conclusion

  1. Elasticsearch offers powerful fuzzy search capabilities...
  2. but very little is enabled by default.
  3. Build custom analyzers to fine-tune search results
  4. Use fuzzy queries carefully: each allowed edit widens the net (at one edit, "coat" can also match "cost" or "cat")

Thank You!

Resources

  • https://typesense.org/learn/fuzzy-search/
  • https://solr.apache.org/guide/solr/latest/indexing-guide/document-analysis.html