How to find food for your cat with MongoDB Atlas Search

Clara Jiménez Recio

Hi! 👋🏻

Free-time Alexa Skills Developer 🤓

Full Stack Developer & Lover 👩🏻‍💻❤️

Alexa Champion 🏆

1. MongoDB Atlas Search
2. Search Index
3. Analyzers
4. Search Query
5. Autocomplete
6. Hacks
7. Vector Search + LLMs

1

MongoDB Atlas Search

  • Built on Apache Lucene
  • Fuzzy Searching
  • Synonyms
  • Custom Scoring
  • Rich Queries
  • Autocomplete

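A sketch of how two of these features show up in practice, written as Python dicts you could pass to a driver (for example pymongo's create_search_index and aggregate). It assumes a synonym mapping named "product_synonyms" backed by a collection called "synonyms", plus the sample terms "pienso"/"croquetas"; all of these names are hypothetical. The synonym mapping uses the same analyzer as the name field, and custom scoring is shown as a simple score boost.

```python
# Hypothetical names: mapping "product_synonyms", source collection "synonyms".
def build_index_definition(mapping_name="product_synonyms", source_collection="synonyms"):
    """Atlas Search index with a static 'name' field and a synonym mapping."""
    return {
        "analyzer": "lucene.standard",
        "mappings": {"dynamic": False, "fields": {"name": {"type": "string"}}},
        "synonyms": [{
            "name": mapping_name,
            # synonyms apply to fields analyzed with this same analyzer
            "analyzer": "lucene.standard",
            "source": {"collection": source_collection},
        }],
    }


# A document in the (hypothetical) synonyms source collection.
SYNONYM_DOC = {"mappingType": "equivalent", "synonyms": ["pienso", "croquetas"]}


def build_search_pipeline(query, mapping_name="product_synonyms", boost=2.0, limit=10):
    """$search with synonyms and a boosted score, then project the relevance score."""
    return [
        {"$search": {"text": {
            "query": query,
            "path": "name",
            "synonyms": mapping_name,
            "score": {"boost": {"value": boost}},  # custom scoring: multiply the score
        }}},
        {"$limit": limit},
        {"$project": {"_id": 0, "name": 1, "score": {"$meta": "searchScore"}}},
    ]


if __name__ == "__main__":
    print(build_index_definition())
    print(build_search_pipeline("croquetas"))
```

With a searchable collection at hand, the pipeline would be run as collection.aggregate(build_search_pipeline("croquetas")).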
2

Search Index

string, array of strings

Field schema:
{
  "name": {
    "type": "String"
  }
}
or
{
  "name": {
    "type": ["String"]
  }
}

Index:
{
  "analyzer": "lucene.standard",
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": {
        "type": "string"
      }
    }
  }
}

2

Search Index

array of objects

Field schema:
{
  "translations": [
    {
      "lang": {
        "type": "String"
      },
      "name": {
        "type": "String"
      }
    }
  ]
}

Index:
{
  "analyzer": "lucene.standard",
  "mappings": {
    "dynamic": false,
    "fields": {
      "translations": {
        "dynamic": false,
        "type": "embeddedDocuments",
        "fields": {
          "name": {
            "type": "string"
          }
        }
      }
    }
  }
}

Index, also indexing lang:
{
  "analyzer": "lucene.standard",
  "mappings": {
    "dynamic": false,
    "fields": {
      "translations": {
        "dynamic": false,
        "type": "embeddedDocuments",
        "fields": {
          "name": {
            "type": "string"
          },
          "lang": {
            "type": "string"
          }
        }
      }
    }
  }
}

2

Search Index

dictionary

Field schema:
{
  "translations": {
    "es": {
      "name": {
        "type": "String"
      }
    },
    "en": {
      "name": {
        "type": "String"
      }
    }
    ...
  }
}

Index:
{
  "analyzer": "lucene.standard",
  "mappings": {
    "dynamic": false,
    "fields": {
      "translations": {
        "type": "document",
        "fields": {
          "es": {
            "type": "document",
            "fields": {
              "name": {
                "type": "string"
              }
            }
          },
          "en": {
            "type": "document",
            "fields": {
              "name": {
                "type": "string"
              }
            }
          }
          ...
        }
      }
    }
  }
}
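To make the difference between the two layouts concrete, here is a small Python sketch that converts the array-of-objects form into the dictionary form and builds the per-language dotted paths the dictionary layout requires. The helper names and the English sample name "Cat food" are hypothetical. In the dictionary layout every language becomes its own path, which is why the index above needs one mapping per language (the "...").

```python
from typing import Any


def to_dictionary(translations: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Turn [{"lang": "es", "name": ...}, ...] into {"es": {"name": ...}, ...}."""
    by_lang: dict[str, dict[str, Any]] = {}
    for entry in translations:
        # every key except "lang" moves under the language key
        by_lang[entry["lang"]] = {k: v for k, v in entry.items() if k != "lang"}
    return by_lang


def name_paths(languages: list[str], field: str = "name") -> list[str]:
    """Dotted paths for the dictionary layout, one per language."""
    return [f"translations.{lang}.{field}" for lang in languages]


if __name__ == "__main__":
    array_form = [
        {"lang": "es", "name": "Pienso para gatos"},
        {"lang": "en", "name": "Cat food"},  # hypothetical sample value
    ]
    print(to_dictionary(array_form))
    print(name_paths(["es", "en"]))
```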

3

Analyzers

🔎 Pienso gatos esterilizados
→ Analyzer 🕵️‍♀️ → Tokens: pienso, gatos, esterilizados

{
  "name": "Pienso especial para gatos ESTERILIZADOS"
}
→ Analyzer 🕵️‍♀️ → Tokens: pienso, especial, para, gatos, esterilizados

3

Analyzers

Analyzer     Separator                            Transformation   Case sensitive   Only exact matches
Standard     word boundaries (language-neutral)   lowercase        No               No
Simple       non-letter characters                lowercase        No               No
Whitespace   whitespace                           none             Yes              No
Keyword      none (whole value is one token)      none             Yes              Yes

  • Language Analyzers
  • Custom Analyzers (stemming, stopwords, ...)
3

Analyzers

Tokens 🇦🇧🇨

Name               Standard            Simple               Whitespace          Keyword
Applaws            applaws             applaws              Applaws             Applaws
True Origins       true, origins       true, origins        True, Origins       True Origins
Forza 10           forza, 10           forza                Forza, 10           Forza 10
Nature's Variety   nature's, variety   nature, s, variety   Nature's, Variety   Nature's Variety

3

Analyzers

At query time the search text goes through an analyzer too (by default the same one used to build the index), and a document matches when the query's tokens match the indexed tokens. Queries compared against the token table above:

  • 🔎 Applaws
  • 🔎 applaws
  • 🔎 True Origins
  • 🔎 True Origins Wild
  • 🔎 Nature's Variety
  • 🔎 Nature's
  • 🔎 Forza 10
  • 🔎 10
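A simplified Python model of this comparison. The four lambdas are rough approximations of the built-in analyzers (regular expressions, not Lucene's actual tokenizers), and matches() assumes the text operator's default behaviour of matching when any query token equals an indexed token. Brand names and queries are the ones from the slides; the printed results come from this model, not from Atlas.

```python
import re

# Rough, hypothetical approximations of the four built-in analyzers.
ANALYZERS = {
    # word-like runs (inner apostrophes kept), lowercased
    "Standard": lambda s: [t.lower() for t in re.findall(r"\w+(?:'\w+)*", s)],
    # runs of letters only (digits and punctuation separate), lowercased
    "Simple": lambda s: [t.lower() for t in re.findall(r"[^\W\d_]+", s)],
    # split on whitespace, case preserved
    "Whitespace": lambda s: s.split(),
    # the whole value is a single token
    "Keyword": lambda s: [s],
}


def matches(indexed_value, query, analyzer):
    """Simplified text operator: match if ANY query token equals an indexed token."""
    indexed_tokens = set(analyzer(indexed_value))
    return any(token in indexed_tokens for token in analyzer(query))


BRANDS = ["Applaws", "True Origins", "Forza 10", "Nature's Variety"]
QUERIES = ["Applaws", "applaws", "True Origins", "True Origins Wild",
           "Nature's Variety", "Nature's", "Forza 10", "10"]

if __name__ == "__main__":
    for query in QUERIES:
        for name, analyzer in ANALYZERS.items():
            hits = [b for b in BRANDS if matches(b, query, analyzer)]
            print(f"{query!r:20} {name:10} tokens={analyzer(query)} matches={hits}")
```

Two cases the model makes visible: with Simple, the query "10" produces no tokens at all, and with Whitespace and Keyword, "applaws" does not match "Applaws" because case is preserved.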

4

Search Query

Operators:

  • text
  • phrase
  • compound
  • embeddedDocument
  • autocomplete
{
  "$search": {
    "text": {
      "query": "Pienso",
      "path": ["name", "description"],
      "fuzzy": {
        "maxEdits": 1,
        "maxExpansions": 10,
        "prefixLength": 0
      }
    }
  }
}
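The fuzzy options above can be read through a small model: maxEdits is the maximum number of single-character edits allowed, prefixLength is how many leading characters must match exactly, and maxExpansions caps how many variations are generated per token (not modelled here). This sketch uses plain Levenshtein distance as an assumption; Lucene's fuzzy matching can also count a transposition as a single edit, so results may differ in those cases.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_term_match(query_term: str, indexed_term: str,
                     max_edits: int = 1, prefix_length: int = 0) -> bool:
    """Simplified fuzzy check for one query token against one indexed token."""
    # prefixLength: the first N characters must be identical
    if query_term[:prefix_length] != indexed_term[:prefix_length]:
        return False
    return levenshtein(query_term, indexed_term) <= max_edits


if __name__ == "__main__":
    print(fuzzy_term_match("pienzo", "pienso"))                   # one substitution
    print(fuzzy_term_match("oienso", "pienso", prefix_length=2))  # prefix differs
```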
{
  "$search": {
    "phrase": {
      "query": "Pienso gatos esterilizados",
      "path": ["name", "description"],
      "slop": 2
    }
  }
}
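slop is the allowable distance between the words of the phrase. A sketch of why "Pienso gatos esterilizados" with "slop": 2 matches "Pienso especial para gatos esterilizados": the sketch takes the first in-order position of each phrase word and measures how far the words are displaced from their expected consecutive positions. This is a simplification; Lucene's sloppy phrase matching also allows reordered words at a higher cost, which is not modelled here.

```python
def required_slop(doc_tokens, phrase_tokens):
    """Slop needed to match phrase_tokens in order inside doc_tokens, or None.

    Simplified: uses the first in-order occurrence of each phrase word.
    """
    positions = []
    start = 0
    for term in phrase_tokens:
        try:
            pos = doc_tokens.index(term, start)
        except ValueError:
            return None  # word missing (or only found out of order)
        positions.append(pos)
        start = pos + 1
    # offset = actual position minus position inside the phrase
    offsets = [p - i for i, p in enumerate(positions)]
    return max(offsets) - min(offsets)


if __name__ == "__main__":
    doc = "Pienso especial para gatos esterilizados".lower().split()
    phrase = "Pienso gatos esterilizados".lower().split()
    needed = required_slop(doc, phrase)
    print(needed, needed is not None and needed <= 2)
```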
{
  "$search": {
    "compound": {
      "must": [
        {
          "text": {
            "query": "Pienso",
            "path": "name",
            "fuzzy": {
              "maxEdits": 1,
              "maxExpansions": 10,
              "prefixLength": 0
            }
          }
        }
      ],
      "should": [
        {
          "phrase": {
            "query": "Pienso gatos esterilizados",
            "path": "description",
            "slop": 2
          }
        }
      ]
    }
  }
}
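A simplified Python model of how compound combines clauses: must and filter are required, mustNot excludes, and should is optional but raises the ranking. The "score" here just counts satisfied clauses, which is an assumption for illustration; Atlas computes real relevance scores. has_word() is a hypothetical stand-in for a text clause, and the sample products are made up.

```python
from typing import Callable

Doc = dict
Clause = Callable[[Doc], bool]


def has_word(field: str, word: str) -> Clause:
    """Hypothetical stand-in for a text clause: word appears in the field."""
    return lambda doc: word in doc[field].lower().split()


def compound(docs, must=(), must_not=(), should=(), filter_=()):
    ranked = []
    for doc in docs:
        if not all(c(doc) for c in must):      # every must clause is required
            continue
        if not all(c(doc) for c in filter_):   # required, but adds no score
            continue
        if any(c(doc) for c in must_not):      # any mustNot clause excludes
            continue
        # crude score: matched must clauses plus matched should clauses
        score = len(must) + sum(1 for c in should if c(doc))
        ranked.append((score, doc))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked]


PRODUCTS = [  # hypothetical sample data
    {"name": "Pienso Applaws", "description": "Para gatos adultos"},
    {"name": "Pienso True Origins", "description": "Pienso para gatos esterilizados"},
    {"name": "Arena Applaws", "description": "Arena para gatos"},
]

if __name__ == "__main__":
    results = compound(
        PRODUCTS,
        must=[has_word("name", "pienso")],
        should=[has_word("description", "esterilizados")],
    )
    print([d["name"] for d in results])
```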
{
  "$search": {
    "autocomplete": {
      "query": "Pien",
      "path": "name",
      "fuzzy": {
        "maxEdits": 1,
        "maxExpansions": 10,
        "prefixLength": 2
      }
    }
  }
}
{
  "$search": {
    "embeddedDocument": {
      "path": "translations",
      "operator": {
        "text": {
          "path": "translations.name",
          "query": "Pienso"
        }
      }
    }
  }
}
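Why embeddedDocument (and the embeddedDocuments index type) matters, as I understand it: when an array of objects is indexed as plain fields, its values are flattened, so two conditions can be satisfied by different array elements. This pure-Python sketch contrasts the two behaviours on a hypothetical product; flattened_match() and embedded_match() are illustrative helpers, not Atlas APIs.

```python
PRODUCT = {"translations": [  # hypothetical document
    {"lang": "es", "name": "Pienso para gatos"},
    {"lang": "en", "name": "Cat food"},
]}


def flattened_match(doc, lang, word):
    """Flattened view: conditions are checked against ALL array values at once."""
    langs = {t["lang"] for t in doc["translations"]}
    words = {w for t in doc["translations"] for w in t["name"].lower().split()}
    return lang in langs and word in words


def embedded_match(doc, lang, word):
    """embeddedDocument-style: both conditions must hold inside the SAME element."""
    return any(
        t["lang"] == lang and word in t["name"].lower().split()
        for t in doc["translations"]
    )


if __name__ == "__main__":
    # "pienso" only appears in the Spanish entry
    print(flattened_match(PRODUCT, "en", "pienso"))  # false positive
    print(embedded_match(PRODUCT, "en", "pienso"))
```

This is also why indexing lang inside the embeddedDocuments mapping is useful: a language condition and a name condition can then be evaluated on the same translation.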

5

Autocomplete

  • Check if a word or phrase contains a sequence of characters from an
    incomplete input string
  • Search-as-you-type applications

5

Autocomplete

{
  "analyzer": "lucene.standard",
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": [
        {
          "type": "autocomplete",
          "tokenization": "edgeGram",
          "minGrams": 2,
          "maxGrams": 8,
          "foldDiacritics": true
        }
      ]
    }
  }
}

Autocomplete Index (string, array of strings)

Tokens for "pienso para gatos" (minGrams 2, maxGrams 8):

standard
pienso, para, gatos

edgeGram
pi, pie, pien, piens, pienso, pienso[space], pienso p, pa, par, para, para[space], para g, para ga, para gat, ga, gat, gato, gatos

rightEdgeGram
os, tos, atos, gatos, [space]gatos, a gatos, ra gatos, ra, ara, para, [space]para, o para, so para, nso para, so, nso, enso, ienso, pienso

nGram
pi, pie, pien, piens, pienso, pienso[space], pienso p, ie, ien, iens, ienso, ienso[space], ienso p, ienso pa, en, ens, enso, enso[space], enso p, enso pa, enso par, ns, nso, nso[space], nso p, nso pa, nso par, nso para,... 😵
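The three tokenization strategies can be reproduced with a short Python sketch. Assumptions: normalize() approximates lowercasing plus "foldDiacritics": true with Unicode decomposition, grams are cut from the raw normalized string (so they may span spaces, as in the lists above), and n_grams() also emits grams that start with a space, which Atlas may handle differently. The edgeGram and rightEdgeGram outputs reproduce the lists above for "Pienso para gatos".

```python
import unicodedata


def normalize(text: str) -> str:
    """Lowercase and strip diacritics (approximates foldDiacritics: true)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def _word_starts(text):
    return [i for i, c in enumerate(text) if c != " " and (i == 0 or text[i - 1] == " ")]


def _word_ends(text):
    last = len(text) - 1
    return [i + 1 for i, c in enumerate(text) if c != " " and (i == last or text[i + 1] == " ")]


def edge_grams(text, min_grams=2, max_grams=8):
    """Grams anchored at the start of each word, read left to right."""
    grams = []
    for start in _word_starts(text):
        for n in range(min_grams, max_grams + 1):
            if start + n > len(text):
                break
            grams.append(text[start:start + n])
    return grams


def right_edge_grams(text, min_grams=2, max_grams=8):
    """Grams anchored at the end of each word, read right to left."""
    grams = []
    for end in reversed(_word_ends(text)):
        for n in range(min_grams, max_grams + 1):
            if end - n < 0:
                break
            grams.append(text[end - n:end])
    return grams


def n_grams(text, min_grams=2, max_grams=8):
    """Every substring of the allowed lengths, from every position."""
    return [text[i:i + n]
            for i in range(len(text))
            for n in range(min_grams, max_grams + 1)
            if i + n <= len(text)]


if __name__ == "__main__":
    text = normalize("Pienso para gatos")
    print(edge_grams(text))
    print(right_edge_grams(text))
    print(len(n_grams(text)), "nGram tokens")
```

The nGram count grows quickly with field length, which is the 😵 on the slide: edgeGram is usually enough for search-as-you-type on word prefixes.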

5

Autocomplete

{
  "analyzer": "lucene.standard",
  "mappings": {
    "dynamic": false,
    "fields": {
      "translations": {
        "dynamic": false,
        "type": "embeddedDocuments",
        "fields": {
          "name": {
            "type": "autocomplete",
            "tokenization": "edgeGram",
            "minGrams": 2,
            "maxGrams": 8,
            "foldDiacritics": true
          }
        }
      }
    }
  }
}

Autocomplete Index (array of objects)

5

Autocomplete

{
  "analyzer": "lucene.standard",
  "mappings": {
    "dynamic": false,
    "fields": {
      "translations": {
        "type": "document",
        "fields": {
          "es": {
            "type": "document",
            "fields": {
              "name": {
                "type": "autocomplete",
                "tokenization": "edgeGram",
                "minGrams": 2,
                "maxGrams": 8,
                "foldDiacritics": true
              }
            }
          },
          ...
        }
      }
    }
  }
}

Autocomplete Index (dictionary)

🇪🇸🇫🇷🇩🇪🇺🇲🏴󠁧󠁢󠁥󠁮󠁧󠁿󠁧󠁢󠁥󠁮󠁧󠁿🇵🇹🇳🇱🇨🇳

😶‍🌫️

{
  "$search": {
    "autocomplete": {
      "query": "Pien",
      "path": ["translations.es.name", "translations.en.name", ...]
    }
  }
}

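compound 😵‍💫 (autocomplete takes a single path, so searching every language means a compound query with one autocomplete clause per language)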
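A sketch of that compound query, generated in Python so the list of languages does not have to be written by hand. It assumes the dictionary layout above, with each translations.<lang>.name mapped as an autocomplete field in the index; the function name and the language list are hypothetical. minimumShouldMatch: 1 requires at least one language clause to match.

```python
def multilang_autocomplete(query, languages, field="name"):
    """One autocomplete clause per language path, combined with compound.should."""
    return {
        "$search": {
            "compound": {
                "should": [
                    {"autocomplete": {"query": query, "path": f"translations.{lang}.{field}"}}
                    for lang in languages
                ],
                # at least one language has to match
                "minimumShouldMatch": 1,
            }
        }
    }


if __name__ == "__main__":
    print(multilang_autocomplete("Pien", ["es", "en", "fr"]))
```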

5

Autocomplete

Tokens 🇦🇧🇨 (using edgeGram)

Applaws
  Standard:   ap, app, appl, appla, applaw, applaws
  Simple:     ap, app, appl, appla, applaw, applaws
  Whitespace: Ap, App, Appl, Appla, Applaw, Applaws
  Keyword:    Ap, App, Appl, Appla, Applaw, Applaws

True Origins
  Standard:   tr, tru, true, true[space], true o, true or, true ori, or, ori, orig, origi, origin, origins
  Simple:     tr, tru, true, true[space], true o, true or, true ori, or, ori, orig, origi, origin, origins
  Whitespace: Tr, Tru, True, True[space], True O, True Or, True Ori, Or, Ori, Orig, Origi, Origin, Origins
  Keyword:    Tr, Tru, True, True[space], True O, True Or, True Ori

Forza 10
  Standard:   fo, for, forz, forza, forza[space], forza 1, forza 10, 10
  Simple:     fo, for, forz, forza
  Whitespace: Fo, For, Forz, Forza, Forza[space], Forza 1, Forza 10, 10
  Keyword:    Fo, For, Forz, Forza, Forza[space], Forza 1, Forza 10

Nature's Variety
  Standard:   na, nat, natu, natur, nature, nature', nature's, va, var, vari, varie, variet, variety
  Simple:     na, nat, natu, natur, nature, nature', nature's, va, var, vari, varie, variet, variety
  Whitespace: Na, Nat, Natu, Natur, Nature, Nature', Nature's, Va, Var, Vari, Varie, Variet, Variety
  Keyword:    Na, Nat, Natu, Natur, Nature, Nature', Nature's

6

Hacks

{
  "$search": {
    "autocomplete": {
      "query": "Pienso gatos",
      "path": "name"
    }
  }
}
{
  "$search": {
    "autocomplete": {
      "query": "Pienso",
      "path": "name"
    }
  }
}

6

Hacks

{
  "$search": {
    "compound": {
      "must": [
        {
          "autocomplete": {
            "query": "Pienso",
            "path": "name"
          }
        },
        {
          "autocomplete": {
            "query": "gatos",
            "path": "name"
          }
        }
      ]
    }
  }
}
{
  "$search": {
    "autocomplete": {
      "query": "Pienso gatos",
      "path": "name",
      "tokenOrder": "sequential"
    }
  }
}
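Both hacks can be generated from the raw user input. A Python sketch under these assumptions: input is split on whitespace, each word becomes a required autocomplete clause (compound.must) so every word has to match, and the alternative keeps a single clause with "tokenOrder": "sequential", which as I read the Atlas docs asks for the query tokens to appear in the order typed. Function names are hypothetical.

```python
def autocomplete_all_words(user_input, path="name"):
    """Hack 1: one autocomplete clause per word, all required (compound.must)."""
    return {"$search": {"compound": {"must": [
        {"autocomplete": {"query": word, "path": path}}
        for word in user_input.split()
    ]}}}


def autocomplete_sequential(user_input, path="name"):
    """Hack 2: a single autocomplete clause with tokenOrder 'sequential'."""
    return {"$search": {"autocomplete": {
        "query": user_input,
        "path": path,
        "tokenOrder": "sequential",
    }}}


if __name__ == "__main__":
    print(autocomplete_all_words("Pienso gatos"))
    print(autocomplete_sequential("Pienso gatos"))
```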

7

MongoDB Atlas Vector Search

Vector Search + LLMs: Retrieval-augmented generation (RAG)

[Diagram: numbered flow through MongoDB Atlas: Embedding → Store → 🔎 Question → Embedding → $vectorSearch → Context Documents → Prompt → LLM → Answer. Example embeddings: [0.9, 0.02, 0.1,...] and [0.9, 0.08, 0.1,...].]
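The retrieval step boils down to "find the stored vectors closest to the question's vector". A pure-Python sketch using cosine similarity with exact, brute-force search over hypothetical 3-dimensional vectors; real embeddings have many more dimensions, and $vectorSearch uses approximate nearest-neighbour search by default, so this only illustrates the idea. Vectors are assumed to be non-zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """Exact nearest-neighbour search over (doc_id, vector) pairs."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in stored]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


STORED = [  # hypothetical document embeddings
    ("pienso-a", [0.9, 0.1, 0.0]),
    ("pienso-b", [0.1, 0.9, 0.2]),
    ("arena", [0.0, 0.2, 0.9]),
]

if __name__ == "__main__":
    question_vec = [0.8, 0.2, 0.1]  # hypothetical question embedding
    print(top_k(question_vec, STORED))
```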

🔎 Which cat food has the most animal protein? I also want it to contain L-carnitine and taurine.

Fields: composition (text) and composition_emb (its embedding)
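A sketch of the RAG pieces around this example in Python: a $vectorSearch pipeline over composition_emb and a prompt built from the returned documents. Assumptions: the vector index is named "vector_index", products have a name field, the question has already been embedded with the same model used for composition_emb, and the prompt wording is made up. $vectorSearch has to be the first stage of the pipeline; with pymongo it would run as collection.aggregate(pipeline).

```python
def build_vector_search_pipeline(query_vector, index="vector_index",
                                 path="composition_emb", num_candidates=100, limit=5):
    """$vectorSearch (must be the first stage) plus a projection with the score."""
    return [
        {"$vectorSearch": {
            "index": index,                    # hypothetical index name
            "path": path,
            "queryVector": list(query_vector),
            "numCandidates": num_candidates,   # candidates considered by ANN search
            "limit": limit,                    # documents returned
        }},
        {"$project": {"_id": 0, "name": 1, "composition": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]


def build_prompt(question, context_docs):
    """Put the retrieved documents (context) and the question into one prompt."""
    context = "\n".join(f"- {d['name']}: {d['composition']}" for d in context_docs)
    return f"Answer using only these products:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    pipeline = build_vector_search_pipeline([0.9, 0.08, 0.1])
    docs = [{"name": "Pienso A", "composition": "pollo 70%, taurina, l-carnitina"}]
    print(pipeline)
    print(build_prompt("Which cat food has the most animal protein?", docs))
```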

Clara Jiménez Recio

@clear_is_me

(aka         )

clara-jr

clara-jr.github.io

Nita
