Building a full-text search engine in TypeScript
Michele Riva
Senior Software Architect @NearForm
Google Developer Expert
Microsoft MVP
MicheleRivaCode
Why?
What I cannot create, I do not understand
Richard Feynman
A journey through algorithms and data structures
There's no slow programming language, just bad DSA design
What is "full-text" search?
sybase.com
Full-text search is a more advanced way to search a database.
Full-text search quickly finds all instances of a term (word) in a table without having to scan rows and without having to know which column a term is stored in.
Full-text search works by using text indexes.
A text index stores positional information for all terms found in the columns you create the text index on.
Popular full-text search engines
"New generation" full-text search engines
Sonic
Meilisearch
JavaScript-based full-text search engines
Lunr.js
MiniSearch
Fuse.js
Where to start?
Understand what kind of data we want to store and retrieve
[
{
"id": 1,
"quote": "It's alive! It's alive!",
"movie": "Frankenstein",
"year": 1931
},
{
"id": 2,
"quote": "You've got to ask yourself one question: 'Do I feel lucky?' Well, do ya, punk?",
"movie": "Dirty Harry",
"year": 1971
},
{
"id": 3,
"quote": "Mama always said life was like a box of chocolates. You never know what you're gonna get.",
"movie": "Forrest Gump",
"year": 1994
}
]
Example documents
// "It's alive! It's alive!"
["Its", "alive", "Its", "alive"]
// "You've got to ask yourself one question: 'Do I feel lucky?' Well, do ya, punk?"
[
"Youve", "got", "to", "ask", "yourself", "one", "question",
"Do", "I", "feel", "lucky", "Well", "do", "ya", "punk"
]
// "Mama always said life was like a box of chocolates. You never know what you're gonna get."
[
"Mama", "always", "said", "life", "was", "like", "a", "box", "of",
"chocolates", "You", "never", "know", "what", "youre", "gonna", "get"
]
Tokenizer
Break the sentences into individual tokens
// "It's alive! It's alive!"
["its", "alive", "its", "alive"]
// "You've got to ask yourself one question: 'Do I feel lucky?' Well, do ya, punk?"
[
"youve", "got", "to", "ask", "yourself", "one", "question",
"do", "i", "feel", "lucky", "well", "do", "ya", "punk"
]
// "Mama always said life was like a box of chocolates. You never know what you're gonna get."
[
"mama", "always", "said", "life", "was", "like", "a", "box", "of",
"chocolates", "you", "never", "know", "what", "youre", "gonna", "get"
]
Tokenizer
Lowercase all tokens
// "It's alive! It's alive!"
["its", "alive"]
// "You've got to ask yourself one question: 'Do I feel lucky?' Well, do ya, punk?"
[
"youve", "got", "to", "ask", "yourself", "one", "question",
"do", "i", "feel", "lucky", "well", "ya", "punk"
]
// "Mama always said life was like a box of chocolates. You never know what you're gonna get."
[
"mama", "always", "said", "life", "was", "like", "a", "box", "of",
"chocolates", "you", "never", "know", "what", "youre", "gonna", "get"
]
Tokenizer
Remove duplicates
// "It's alive! It's alive!"
["alive"]
// "You've got to ask yourself one question: 'Do I feel lucky?' Well, do ya, punk?"
[
"youve", /* "got", */ /* "to", */ "ask", "yourself", "one", "question",
/* "do", */ /* "i", */ "feel", "lucky", "well", "ya", "punk"
]
// "Mama always said life was like a box of chocolates. You never know what you're gonna get."
[
"mama", "always", "said", "life", /* "was", */ "like", /* "a", */ "box", /* "of", */
"chocolates", "you", "never", "know", /* "what", */ "youre", /* "gonna", */ "get"
]
Tokenizer
Remove stop-words*
What is a stop word?
Stop words are a set of commonly used words in a language. Examples of stop words in English are “a”, “the”, “is”, “are” and etc. Stop words are commonly used in Text Mining and Natural Language Processing (NLP) to eliminate words that are so commonly used that they carry very little useful information.
https://www.opinosis-analytics.com/knowledge-base/stop-words-explained/
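The tokenizer steps so far (split into tokens, lowercase, remove duplicates, remove stop words) can be sketched in a few lines. This is a minimal illustration, not the engine's actual tokenizer: the `tokenize` name and the tiny `STOP_WORDS` list are assumptions chosen so that the output matches the slides, and punctuation is simply stripped.

```typescript
// Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
// real lists contain many more entries per language.
const STOP_WORDS = new Set<string>([
  "a", "i", "of", "to", "do", "got", "was", "what", "gonna", "its",
]);

function tokenize(text: string): string[] {
  const tokens = text
    .toLowerCase()                         // lowercase all tokens
    .replace(/[^a-z0-9\s]/g, "")           // strip punctuation: "it's" becomes "its"
    .split(/\s+/)                          // break the sentence into tokens
    .filter((token) => token.length > 0);  // drop empty strings left by the split

  const unique = Array.from(new Set(tokens)); // remove duplicates, keep first-seen order
  return unique.filter((token) => !STOP_WORDS.has(token)); // remove stop words
}

console.log(tokenize("It's alive! It's alive!")); // [ 'alive' ]
```

Stemming is left out on purpose, since it depends on language-specific rules.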
// "It's alive! It's alive!"
["alive"]
// "You've got to ask yourself one question: 'Do I feel lucky?' Well, do ya, punk?"
[
"you" /* was "youve" */, "ask", "yourself", "one", "question",
"feel", "luck" /* was "lucky" */, "well", /* "ya" becomes "you", duplicate */ "punk"
]
// "Mama always said life was like a box of chocolates. You never know what you're gonna get."
[
"mom" /* was "mama" */, "always", "say" /* was "said" */, "life", "like", "box",
"chocolate" /* was "chocolates" */, "you", "never", "know", /* "youre" becomes "you", duplicate */ "get"
]
Tokenizer
Stemming*
Snowball
https://snowballstem.org
English 🇺🇸🇬🇧🇦🇺
http://snowball.tartarus.org/algorithms/english/stemmer.html
German 🇩🇪
http://snowball.tartarus.org/algorithms/german/stemmer.html
Italian 🇮🇹
http://snowball.tartarus.org/algorithms/italian/stemmer.html
Finnish 🇫🇮
http://snowball.tartarus.org/algorithms/finnish/stemmer.html
[
{
"id": 1,
"quote": ["alive"],
...
},
{
"id": 2,
"quote": ["you", "ask", "yourself", "one", "question", "feel", "luck", "well", "punk"],
...
},
{
"id": 3,
"quote": ["mom", "always", "say", "life", "like", "box", "chocolate", "you", "never", "know", "get"],
...
}
]
Final Result
Remaining tokens
How do we want to store this data?
Find document containing the word "chocolate" in linear time
Time complexity is O(n)
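A linear scan over the tokenized documents could look like the sketch below. The `TokenizedDoc` shape and the `findLinear` name are assumptions made for illustration; the data is the "Final Result" from the tokenizer slides.

```typescript
interface TokenizedDoc {
  id: number;
  tokens: string[];
}

const docs: TokenizedDoc[] = [
  { id: 1, tokens: ["alive"] },
  { id: 2, tokens: ["you", "ask", "yourself", "one", "question", "feel", "luck", "well", "punk"] },
  { id: 3, tokens: ["mom", "always", "say", "life", "like", "box", "chocolate", "you", "never", "know", "get"] },
];

// Visits every document (and every token in it) until the term is found,
// so the work grows linearly with the amount of stored data.
function findLinear(documents: TokenizedDoc[], term: string): number[] {
  const ids: number[] = [];
  for (const doc of documents) {
    if (doc.tokens.indexOf(term) !== -1) {
      ids.push(doc.id);
    }
  }
  return ids;
}

console.log(findLinear(docs, "chocolate")); // [ 3 ]
```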
animal = dog
book = algorithms to live by
color = green
language = javascript
city = florence
food = chocolate
HashMaps are used to store data in key-value pairs
function hash(key: string, size: number): number {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    const char = key[i];
    // Shift the running value left by 5 bits (multiply by 32), then add the character code
    hash = (hash << 5) + char.charCodeAt(0);
    // `hash & hash` coerces to a 32-bit integer; the modulo keeps the result inside the array bounds
    hash = (hash & hash) % size;
  }
  return hash;
}
const size = 10;
hash("food", size); // => 2
hash("book", size); // => 7
hash("hello, Berlin!", size); // => 9
Example of a hashing algorithm
When asking for a key, we know the exact position of its value inside of the array.
Hence, lookup time complexity is O(1) on average, as long as collisions stay rare
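To make the "exact position inside the array" idea concrete, here is a minimal sketch of a hash map built on the hash function from the slide. `TinyHashMap` is a hypothetical name, and the use of per-slot buckets to handle collisions is an assumption of this sketch, not something the slides specify.

```typescript
// Same algorithm as the hash function on the slide.
function hash(key: string, size: number): number {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h << 5) + key.charCodeAt(i);
    h = (h & h) % size;
  }
  return h;
}

class TinyHashMap<V> {
  private readonly size: number;
  private readonly buckets: Array<Array<[string, V]>>;

  constructor(size: number = 10) {
    this.size = size;
    // One (initially empty) bucket per array slot.
    this.buckets = Array.from({ length: size }, () => [] as Array<[string, V]>);
  }

  set(key: string, value: V): void {
    const bucket = this.buckets[hash(key, this.size)];
    for (const entry of bucket) {
      if (entry[0] === key) {
        entry[1] = value; // key already present: overwrite
        return;
      }
    }
    bucket.push([key, value]);
  }

  get(key: string): V | undefined {
    // The hash gives the slot directly; only that one bucket is inspected.
    const bucket = this.buckets[hash(key, this.size)];
    for (const entry of bucket) {
      if (entry[0] === key) return entry[1];
    }
    return undefined;
  }
}

const map = new TinyHashMap<string>();
map.set("food", "chocolate");
map.set("city", "florence");
console.log(map.get("food")); // chocolate
```

With few collisions each bucket holds about one entry, which is where the constant-time lookup comes from.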
But that's not enough to find "chocolate" inside of our array of documents in O(1)
We need an inverted index
{
1 => ["alive"],
2 => ["you", "ask", "yourself", "one", "question", "feel", "luck", "well", "punk"],
3 => ["mom", "always", "say", "life", "like", "box", "chocolate", "you", "never", "know", "get"],
}
Regular HashMap
{
"alive" => [1],
"you" => [2, 3],
"ask" => [2],
"yourself" => [2],
"chocolate" => [3],
"punk" => [2],
"one" => [2],
"question" => [2],
"feel" => [2],
"mom" => [3],
"always" => [3],
"say" => [3],
"know" => [3],
"luck" => [2],
"life" => [3],
"like" => [3],
"well" => [2],
"box" => [3],
"never" => [3],
"get" => [3]
}
Inverted Index
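Going from the regular map (document ID to tokens) to the inverted index (token to document IDs) is a single pass over the data. The following is a sketch under the assumption that tokens are already deduplicated per document; `buildInvertedIndex` is a hypothetical helper name.

```typescript
type InvertedIndex = Map<string, number[]>;

function buildInvertedIndex(documents: Map<number, string[]>): InvertedIndex {
  const index: InvertedIndex = new Map();
  documents.forEach((tokens, docId) => {
    for (const token of tokens) {
      const ids = index.get(token);
      if (ids) {
        ids.push(docId);            // token seen before: add this document
      } else {
        index.set(token, [docId]);  // first occurrence of the token
      }
    }
  });
  return index;
}

const documents = new Map<number, string[]>([
  [1, ["alive"]],
  [2, ["you", "ask", "yourself", "one", "question", "feel", "luck", "well", "punk"]],
  [3, ["mom", "always", "say", "life", "like", "box", "chocolate", "you", "never", "know", "get"]],
]);

const index = buildInvertedIndex(documents);
console.log(index.get("chocolate")); // [ 3 ]
console.log(index.get("you"));       // [ 2, 3 ]
```

Looking up "chocolate" is now one map access instead of a scan over every document.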
Optimizing space
{
"intersect" => [10,32,12,2,3],
"interstellar" => [2,6,20,23,42],
"intergalactic" => [12,3,54,29,32],
"international" => [32,12,34,64,2],
"intervene" => [92,12,42,54,6],
"internal" => [102,32,543,6,1],
"telecommunication" => [91,2,4,23],
"television" => [10,8,6,15,3,2],
"telephone" => [1,85,14,54,76]
}
Many tokens share a common prefix
Trees to the rescue!
Prefix tree
Private
Primark
Prime
Primate
We can use a prefix tree as an inverted index that maps each token to the documents containing it
{
"primark" => [1, 3],
"primate" => [2, 4],
"prime" => [1, 5],
"private" => [2, 6],
"art" => [4, 5],
"artist" => [4, 7]
}
"Talk is cheap! Show me the code!"
type Nullable<T> = T | null;
type Children = Map<string, TrieNode>;
type Docs = Set<string>;
type NodeContent = [string, Docs];
interface ITrieNode {
key: string;
parent: Nullable<TrieNode>;
children: Nullable<Children>;
docs: Docs;
end: boolean;
getWord: () => NodeContent;
removeDoc: (id: string) => boolean;
}
trieNode.ts
type FindResult = {
[key: string]: Set<string>;
}
interface ITrie {
root: TrieNode;
insert: (word: string, docId: string) => void;
contains: (word: string) => boolean;
find: (prefix: string) => FindResult;
removeDocByWord: (word: string, docId: string) => boolean;
remove: (word: string) => boolean;
}
trie.ts
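class TrieNode implements ITrieNode {
  public key: string;
  public parent: Nullable<TrieNode> = null;
  // A Map, to match the Children type and the has/get/set calls used by the trie
  public children: Nullable<Children> = new Map();
  public docs: Docs = new Set();
  public end = false;

  constructor(key: string) {
    this.key = key;
  }

  // Walk up to the root, prepending each key, to rebuild the full word
  getWord(): NodeContent {
    let output = "";
    let node: Nullable<TrieNode> = this;
    while (node !== null) {
      output = node.key + output;
      node = node.parent;
    }
    return [output, this.docs];
  }

  removeDoc(docID: string): boolean {
    return this.docs.delete(docID);
  }
}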
trieNode.ts
TC39 has standardized TCE (tail-call elimination) with ES6,
but among the major engines only JavaScriptCore (Safari) implements it
class Trie implements ITrie {
  // Public, because ITrie declares `root` as a public member
  public root = new TrieNode("");
}
trie.ts
insert(word: string, docId: string): void {
  const wordLength = word.length;
  let node = this.root;
  for (let i = 0; i < wordLength; i++) {
    const char = word[i];
    // Create the child node for this character if it does not exist yet
    if (!node.children?.has(char)) {
      const newTrieNode = new TrieNode(char);
      newTrieNode.parent = node;
      node.children!.set(char, newTrieNode);
    }
    node = node.children!.get(char)!;
    // Last character: mark the end of a word and link the document
    if (i === wordLength - 1) {
      node.end = true;
      node.docs.add(docId);
    }
  }
}
trie.ts
find(prefix: string): FindResult {
  let node = this.root;
  const output: FindResult = {};
  for (const char of prefix) {
    if (node?.children?.has(char)) {
      node = node.children.get(char)!;
    } else {
      return output;
    }
  }
  findAllWords(node, output);
  function findAllWords(_node: TrieNode, _output: FindResult) {
    if (_node.end) {
      const [word, docIDs] = _node.getWord();
      if (!(word in _output)) {
        _output[word] = new Set();
      }
      if (docIDs?.size) {
        for (const doc of docIDs) {
          _output[word].add(doc);
        }
      }
    }
    for (const childNode of _node.children?.values() ?? []) {
      findAllWords(childNode, _output);
    }
  }
  return output;
}
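To see insert and find working together, here is a compact, self-contained sketch of the same idea. It is not the code from the slides: `MiniTrie` and `MiniNode` are hypothetical names, and the word is carried along during traversal instead of being rebuilt through parent pointers. The data is the primark/primate/prime example shown earlier.

```typescript
interface MiniNode {
  children: Map<string, MiniNode>;
  docs: Set<string>;
  end: boolean;
}

function createNode(): MiniNode {
  return { children: new Map(), docs: new Set(), end: false };
}

class MiniTrie {
  private root: MiniNode = createNode();

  insert(word: string, docId: string): void {
    let node = this.root;
    for (let i = 0; i < word.length; i++) {
      let next = node.children.get(word[i]);
      if (!next) {
        next = createNode();
        node.children.set(word[i], next);
      }
      node = next;
    }
    node.end = true;
    node.docs.add(docId);
  }

  // Returns every stored word starting with `prefix`, with its document IDs.
  find(prefix: string): Record<string, string[]> {
    const output: Record<string, string[]> = {};
    let node = this.root;
    for (let i = 0; i < prefix.length; i++) {
      const next = node.children.get(prefix[i]);
      if (!next) return output; // no word has this prefix
      node = next;
    }
    const walk = (current: MiniNode, word: string): void => {
      if (current.end) output[word] = Array.from(current.docs);
      current.children.forEach((child, char) => walk(child, word + char));
    };
    walk(node, prefix);
    return output;
  }
}

const trie = new MiniTrie();
const entries: Array<[string, string[]]> = [
  ["primark", ["1", "3"]], ["primate", ["2", "4"]], ["prime", ["1", "5"]],
  ["private", ["2", "6"]], ["art", ["4", "5"]], ["artist", ["4", "7"]],
];
for (const [word, ids] of entries) {
  for (const id of ids) trie.insert(word, id);
}

console.log(Object.keys(trie.find("prim")).sort()); // [ 'primark', 'primate', 'prime' ]
```

All four "pri…" words share the nodes for "p", "r" and "i", which is the space saving the prefix tree is meant to provide.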
✅ Tokenizer
✅ Prefix-tree
❌ Typo-tolerance
trie.find("wrld");
// Results:
[
{
id: 1,
quote: "Hello, World!"
},
{
id: 2,
quote: "What a wonderful world"
}
]
Dynamic programming
An algorithmic technique for solving an optimization problem by breaking it down into simpler subproblems and utilizing the fact that the optimal solution to the overall problem depends upon the optimal solution to its subproblems.
https://educative.io
Levenshtein distance
The Levenshtein algorithm calculates the least number of edit operations that are necessary to modify one string to obtain another string.
const word1 = "moon";
const word2 = "lions";
levenshtein(word1, word2); // => 3
Allowed operations
Insert
Delete
Replace
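With these three operations, the distance can be written as the usual recurrence. The notation is an assumption of this note, not taken from the slides: D(i, j) is the distance between the first i characters of the source string a and the first j characters of the target string b.

```latex
D(i, 0) = i, \qquad D(0, j) = j

D(i, j) =
\begin{cases}
  D(i-1,\, j-1) & \text{if } a_i = b_j \\
  1 + \min\bigl\{\, D(i-1,\, j),\; D(i,\, j-1),\; D(i-1,\, j-1) \,\bigr\} & \text{otherwise}
\end{cases}
```

Here D(i-1, j) corresponds to a delete, D(i, j-1) to an insert, and D(i-1, j-1) to a replace.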
Edit distance of "Moon" and "Lions"
1) MOON -> LOON: replace "M" with "L"
2) LOON -> LION: replace the first "O" with "I"
3) LION -> LIONS: insert "S"
Rows are the prefixes of "MOON", columns are the prefixes of "LIONS", and Λ is the empty string. Each cell holds the edit distance between the row prefix and the column prefix.
Base cases: turning "" into "", "L", "LI", "LIO", "LION", "LIONS" takes 0 to 5 inserts (first row), and turning "M", "MO", "MOO", "MOON" into "" takes 1 to 4 deletes (first column).
  | Λ | L | I | O | N | S |
---|---|---|---|---|---|---|
Λ | 0 | 1 | 2 | 3 | 4 | 5 |
M | 1 |   |   |   |   |   |
O | 2 |   |   |   |   |   |
O | 3 |   |   |   |   |   |
N | 4 |   |   |   |   |   |
Every other cell is the cheapest of three moves:
Insert: the cell to the left, +1
Delete: the cell above, +1
Replace: the diagonal cell, +1 (or +0 when the two characters match)
"M" -> "L": insert costs 1 + 1 = 2, delete costs 1 + 1 = 2, replace costs 0 + 1 = 1, so the cell is 1.
"MO" -> "LIO": the last characters match, so the cell copies its diagonal neighbor, "M" -> "LI", which is 2.
Completed table:
  | Λ | L | I | O | N | S |
---|---|---|---|---|---|---|
Λ | 0 | 1 | 2 | 3 | 4 | 5 |
M | 1 | 1 | 2 | 3 | 4 | 5 |
O | 2 | 2 | 2 | 2 | 3 | 4 |
O | 3 | 3 | 3 | 2 | 3 | 4 |
N | 4 | 4 | 4 | 3 | 2 | 3 |
The bottom-right cell is the result: 3. The cheapest path from the top-left to the bottom-right corner reads 0, 1, 2, 2, 2, 3 (replace, replace, match, match, insert).
Levenshtein distance of Horse - Poser
  | Λ | P | O | S | E | R |
---|---|---|---|---|---|---|
Λ | 0 | 1 | 2 | 3 | 4 | 5 |
H | 1 | 1 | 2 | 3 | 4 | 5 |
O | 2 | 2 | 1 | 2 | 3 | 4 |
R | 3 | 3 | 2 | 2 | 3 | 3 |
S | 4 | 4 | 3 | 2 | 3 | 4 |
E | 5 | 5 | 4 | 3 | 2 | 3 |
Levenshtein distance of Race - Raise
  | Λ | R | A | I | S | E |
---|---|---|---|---|---|---|
Λ | 0 | 1 | 2 | 3 | 4 | 5 |
R | 1 | 0 | 1 | 2 | 3 | 4 |
A | 2 | 1 | 0 | 1 | 2 | 3 |
C | 3 | 2 | 1 | 1 | 2 | 3 |
E | 4 | 3 | 2 | 2 | 2 | 2 |
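export function levenshtein(a: string, b: string): number {
  if (!a.length) return b.length;
  if (!b.length) return a.length;
  // Make `a` the shorter string so the working row stays as small as possible
  if (a.length > b.length) {
    const tmp = a;
    a = b;
    b = tmp;
  }
  // Only one row of the table is kept in memory; it starts as the first row (0, 1, 2, ...)
  const row = Array.from({ length: a.length + 1 }, (_, i) => i);
  let val = 0;
  for (let i = 1; i <= b.length; i++) {
    // `prev` is the cell to the left in the row currently being computed
    let prev = i;
    for (let j = 1; j <= a.length; j++) {
      if (b[i - 1] === a[j - 1]) {
        // Characters match: copy the diagonal cell
        val = row[j - 1];
      } else {
        // Cheapest of the diagonal, left, and upper cells, plus one edit
        val = Math.min(row[j - 1] + 1, Math.min(prev + 1, row[j] + 1));
      }
      row[j - 1] = prev;
      prev = val;
    }
    row[a.length] = prev;
  }
  return row[a.length];
}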
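The function above keeps a single row in memory. For comparison, a full-matrix version is easier to check against the tables from the earlier slides. This is a sketch, with `levenshteinMatrix` as a hypothetical name; rows follow the source string and columns follow the target string.

```typescript
function levenshteinMatrix(source: string, target: string): number[][] {
  const rows = source.length + 1;
  const cols = target.length + 1;
  const d: number[][] = [];

  // Base cases: first column (deletes) and first row (inserts).
  for (let i = 0; i < rows; i++) {
    d.push(new Array<number>(cols).fill(0));
    d[i][0] = i;
  }
  for (let j = 0; j < cols; j++) {
    d[0][j] = j;
  }

  for (let i = 1; i < rows; i++) {
    for (let j = 1; j < cols; j++) {
      if (source[i - 1] === target[j - 1]) {
        d[i][j] = d[i - 1][j - 1]; // match: copy the diagonal
      } else {
        d[i][j] = 1 + Math.min(
          d[i - 1][j],     // delete
          d[i][j - 1],     // insert
          d[i - 1][j - 1], // replace
        );
      }
    }
  }
  return d;
}

const table = levenshteinMatrix("moon", "lions");
console.log(table[4][5]); // 3
```

The trade-off is memory: the matrix stores every cell, while the single-row version only needs one row at a time.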
We can perform these operations on both strings and trees
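On plain strings, typo tolerance can be sketched by comparing the query against every indexed term and keeping those within a given distance. This is only an illustration: `fuzzyFind`, `editDistance`, and the `tolerance` parameter are assumptions of this sketch, and a linear scan over the vocabulary is the simplest option, not necessarily what a real engine does on top of its prefix tree.

```typescript
// Two-row Levenshtein distance, kept local so the sketch is self-contained.
function editDistance(a: string, b: string): number {
  let previous: number[] = [];
  for (let j = 0; j <= b.length; j++) previous.push(j);

  for (let i = 1; i <= a.length; i++) {
    const current: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current.push(Math.min(
        previous[j] + 1,        // delete
        current[j - 1] + 1,     // insert
        previous[j - 1] + cost, // replace (free when the characters match)
      ));
    }
    previous = current;
  }
  return previous[b.length];
}

// Keep every term whose distance from the query is within the tolerance.
function fuzzyFind(terms: string[], query: string, tolerance: number): string[] {
  return terms.filter((term) => editDistance(term, query) <= tolerance);
}

const vocabulary = ["hello", "world", "what", "wonderful"];
console.log(fuzzyFind(vocabulary, "wrld", 1)); // [ 'world' ]
```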
Tree Edit Distance (and Levenshtein Distance)
Simple fast algorithms for the editing distance between trees and related problems
Kaizhong Zhang and Dennis Shasha
https://shorturl.at/otBMY
import { Lyra } from '@nearform/lyra';
const db = new Lyra({
schema: {
author: 'string',
quote: 'string'
}
});
await db.insert({
quote: 'It is during our darkest moments that we must focus to see the light.',
author: 'Aristotle'
});
await db.insert({
quote: 'If you really look closely, most overnight successes took a long time.',
author: 'Steve Jobs'
});
await db.insert({
quote: 'If you are not willing to risk the usual, you will have to settle for the ordinary.',
author: 'Jim Rohn'
});
await db.insert({
quote: 'You miss 100% of the shots you don\'t take',
author: 'Wayne Gretzky - Michael Scott'
});
const searchResult = await db.search({
term: 'if',
properties: ['quote']
});
// Result
{
elapsed: '99μs',
hits: [
{
id: 'ckAOPGTA5qLXx0MgNr1Zy',
quote: 'If you really look closely, most overnight successes took a long time.',
author: 'Steve Jobs'
},
{
id: 'fyl-_1veP78IO-wszP86Z',
quote: 'If you are not willing to risk the usual, you will have to settle for the ordinary.',
author: 'Jim Rohn'
}
],
count: 2
}
const searchResult = await db.search({
term: 'Michael',
properties: '*'
});
// Result
{
elapsed: '111μs',
hits: [
{
id: 'L1tpqQxc0c2djrSN2a6TJ',
quote: "You miss 100% of the shots you don't take",
author: 'Wayne Gretzky - Michael Scott'
}
],
count: 1
}
npm i @nearform/lyra
Real-World Next.js
Build scalable, high-performance, and modern web applications using Next.js, the React framework for production
@MicheleRiva
@MicheleRivaCode
/in/MicheleRiva95
www.micheleriva.dev