The Strings - How fuzzy are they?
Your boss comes by your desk and says: so product tells me our users can't find each other by name, so we gotta add a search box or something. Get to work!
(Your users, probably)
Fine, you get Googlin'
The anxiety when you realize the complexity
But then you remember that you're just trying to filter a table by one column, probably first or last_name.
(You, probably)
(Totally made up example, definitely didn't happen in real life)
Meet your new friends
Soundex
Metaphone
Double Metaphone
Daitch Mokotoff
(They're probably from Sweden)
CREATE EXTENSION fuzzystrmatch;
Some Test Datazzz
CREATE TABLE users (
id SERIAL PRIMARY KEY
fname TEXT NOT NULL,
lname TEXT NOT NULL);
INSERT INTO users (fname, lname) VALUES
('James', 'Smith'), ('David', 'Anderson'),
('Christopher', 'Clark'), ('George', 'Wright'),
('Ronald', 'Mitchell'), ('John', 'Johnson'),
('Richard', 'Thomas'), ('Daniel', 'Rodriguez'),
('Kenneth', 'Lopez'), ('Anthony', 'Perez'),
('Robert', 'Williams'), ('Charles', 'Jackson'),
('Paul', 'Lewis'), ('Steven', 'Hill'),
('Kevin', 'Roberts'), ('Michael', 'Jones'),
('Joseph', 'White'), ('Mark', 'Lee'),
('Edward', 'Scott'), ('Jason', 'Turner'),
('William', 'Brown'), ('Thomas', 'Harris'),
('Donald', 'Walker'), ('Brian', 'Green'),
('Jeff', 'Phillips'), ('Mary', 'Davis'),
('Jennifer', 'Martin'), ('Lisa', 'Hall'),
('Sandra', 'Adams'), ('Michelle', 'Campbell'),
('Patricia', 'Miller'), ('Maria', 'Thompson'),
('Nancy', 'Allen'), ('Donna', 'Baker'),
('Laura', 'Parker'), ('Linda', 'Wilson'),
('Susan', 'Garcia'), ('Karen', 'Young'),
('Carol', 'Gonzalez'), ('Sarah', 'Evans'),
('Barbara', 'Moore'), ('Margaret', 'Martinez'),
('Betty', 'Hernandez'), ('Ruth', 'Nelson'),
('Kimberly', 'Edwards'), ('Elizabeth', 'Taylor'),
('Dorothy', 'Robinson'), ('Helen', 'King'),
('Sharon', 'Carter'), ('Deborah', 'Collins'),
('Jürgen', 'Schnitzel'), ('Dāwiḏ', 'Kabbabi'),
('Heiðar', 'Helguson'), ('Boglárka', 'Ladke');
Soundex
SELECT
*
FROM
users
WHERE
difference(fname, 'Lunda') > 2;
| id | fname | lname |
|----|---------|--------|
| 9 | Kenneth | Lopez |
| 36 | Linda | Wilson |
SELECT
*
FROM
users
WHERE
difference (fname, 'Jam') > 2;
| id | fname | lname |
|----|-------|----------|
| 1 | James | Smith |
| 6 | John | Johnson |
| 25 | Jeff | Phillips |
| 34 | Donna | Baker |
Good-ish for English, bad for everyone else
Metaphone
SELECT metaphone('Jim Thebob', 4);
SELECT metaphone('Jorgen McJorgensen', 10);
-- metaphone
-- -----------
-- JM0BB
-- JRJNMKJRJN
Double Metaphone
SELECT
*
FROM
users
WHERE
levenshtein (dmetaphone_alt (fname), dmetaphone_alt ('David')) < 2
| id | fname | lname |
|----|---------|----------|
| 2 | David | Anderson |
| 47 | Dorothy | Robinson |
Daitch-Mokotoff Soundex
SELECT daitch_mokotoff('George');
-- daitch_mokotoff
-----------------
-- {595000}
SELECT daitch_mokotoff('John');
-- daitch_mokotoff
-- -----------------
-- {160000,460000}
SELECT daitch_mokotoff('Bierschbach');
-- daitch_mokotoff
-- -----------------------------------------------------------
-- {794575,794574,794750,794740,745750,745740,747500,747400}
SELECT daitch_mokotoff('Schwartzenegger');
-- daitch_mokotoff
-- -----------------
-- {479465}
(Postgresql 16+ only)
Strings - How fuzzy are they?
By signupskm
Strings - How fuzzy are they?
- 57