The Strings - How fuzzy are they?

Your boss comes by your desk and says: so product tells me our users can't find each other by name, so we gotta add a search box or something. Get to work!

(Your users, probably)

Fine, you get Googlin'

The anxiety when you realize the complexity

But then you remember that you're just trying to filter a table by one column, probably first or last_name.

(You, probably)

(Totally made up example, definitely didn't happen in real life)

Meet your new friends

Soundex

 Metaphone

 Double Metaphone

Daitch Mokotoff

(They're probably from Sweden)

CREATE EXTENSION fuzzystrmatch;

Some Test Datazzz

CREATE TABLE users (
id SERIAL PRIMARY KEY
fname TEXT NOT NULL,
lname TEXT NOT NULL);

INSERT INTO users (fname, lname) VALUES
('James', 'Smith'), ('David', 'Anderson'),
('Christopher', 'Clark'), ('George', 'Wright'),
('Ronald', 'Mitchell'), ('John', 'Johnson'),
('Richard', 'Thomas'), ('Daniel', 'Rodriguez'),
('Kenneth', 'Lopez'), ('Anthony', 'Perez'),
('Robert', 'Williams'), ('Charles', 'Jackson'),
('Paul', 'Lewis'), ('Steven', 'Hill'),
('Kevin', 'Roberts'), ('Michael', 'Jones'),
('Joseph', 'White'), ('Mark', 'Lee'),
('Edward', 'Scott'), ('Jason', 'Turner'),
('William', 'Brown'), ('Thomas', 'Harris'),
('Donald', 'Walker'), ('Brian', 'Green'),
('Jeff', 'Phillips'), ('Mary', 'Davis'),
('Jennifer', 'Martin'), ('Lisa', 'Hall'),
('Sandra', 'Adams'), ('Michelle', 'Campbell'),
('Patricia', 'Miller'), ('Maria', 'Thompson'),
('Nancy', 'Allen'), ('Donna', 'Baker'),
('Laura', 'Parker'), ('Linda', 'Wilson'),
('Susan', 'Garcia'), ('Karen', 'Young'),
('Carol', 'Gonzalez'), ('Sarah', 'Evans'),
('Barbara', 'Moore'), ('Margaret', 'Martinez'),
('Betty', 'Hernandez'), ('Ruth', 'Nelson'),
('Kimberly', 'Edwards'), ('Elizabeth', 'Taylor'),
('Dorothy', 'Robinson'), ('Helen', 'King'),
('Sharon', 'Carter'), ('Deborah', 'Collins'),
('Jürgen', 'Schnitzel'), ('Dāwiḏ', 'Kabbabi'),
('Heiðar', 'Helguson'), ('Boglárka', 'Ladke');

Soundex

SELECT
	*
FROM
	users
WHERE
	difference(fname, 'Lunda') > 2;
    
| id | fname   | lname  |
|----|---------|--------|
|  9 | Kenneth | Lopez  |
| 36 | Linda   | Wilson |


SELECT
	*
FROM
	users
WHERE
	difference (fname, 'Jam') > 2;
    
    
| id | fname | lname    |
|----|-------|----------|
|  1 | James | Smith    |
|  6 | John  | Johnson  |
| 25 | Jeff  | Phillips |
| 34 | Donna | Baker    |

Good-ish for English, bad for everyone else

Metaphone

SELECT metaphone('Jim Thebob', 4);
SELECT metaphone('Jorgen McJorgensen', 10);


-- metaphone
-- -----------
-- JM0BB
-- JRJNMKJRJN

Double Metaphone

SELECT
	*
FROM
	users
WHERE
	levenshtein (dmetaphone_alt (fname), dmetaphone_alt ('David')) < 2

| id | fname   | lname    |
|----|---------|----------|
|  2 | David   | Anderson |
| 47 | Dorothy | Robinson |

Daitch-Mokotoff Soundex

SELECT daitch_mokotoff('George');
-- daitch_mokotoff
-----------------
-- {595000}

SELECT daitch_mokotoff('John');
-- daitch_mokotoff
-- -----------------
-- {160000,460000}

SELECT daitch_mokotoff('Bierschbach');
--                      daitch_mokotoff
-- -----------------------------------------------------------
-- {794575,794574,794750,794740,745750,745740,747500,747400}

SELECT daitch_mokotoff('Schwartzenegger');
-- daitch_mokotoff
-- -----------------
-- {479465}

(Postgresql 16+ only)

Copy of JS 2023 - Retrospective

By signupskm

Copy of JS 2023 - Retrospective

  • 16