PostgreSQL fuzzystrmatch — Phonetic & Fuzzy Matching

Complete reference for PostgreSQL fuzzystrmatch extension functions covering Soundex, Metaphone, Double Metaphone, and Levenshtein edit distance for phonetic and fuzzy string matching. Every function includes syntax, real-world deduplication examples, and performance notes. Updated for PostgreSQL 16.

6 functions

What are PostgreSQL fuzzystrmatch — Phonetic & Fuzzy Matching?

fuzzystrmatch is a PostgreSQL extension providing algorithms for measuring string similarity and phonetic equivalence. soundex() and metaphone() return phonetic codes for matching words that sound alike, while levenshtein() computes the edit distance (minimum number of single-character insertions, deletions, or substitutions) between two strings. These functions are widely used for deduplication, spell-checking suggestions, and name matching in search applications.

soundex

PG 8.0+text

Converts a name to its Soundex code — a 4-character code representing the phonetic pronunciation. Requires the fuzzystrmatch extension.

DeveloperData Eng

Signature

soundex ( text ) → text

Parameters

ParameterTypeDescription
stringtextName or word to encode

Examples

sql
SELECT soundex('hello');
H400
sql
SELECT soundex('Anne'), soundex('Ann');
A500 | A500 (same code)
sql
SELECT * FROM people WHERE soundex(last_name) = soundex('Smith');
People with names sounding like Smith
sql
SELECT last_name, soundex(last_name) AS code FROM customers GROUP BY last_name, soundex(last_name) HAVING COUNT(*) > 1 ORDER BY code;
Clusters of last names sharing a Soundex code
Anti-Pattern

Soundex groups names by sound: 'Smith', 'Smyth', 'Smythe' all map to S530. Use `soundex(name1) = soundex(name2)` to find likely duplicates in customer databases.

example
SELECT a.id, b.id, a.last_name, b.last_name FROM customers a JOIN customers b ON a.id < b.id AND soundex(a.last_name) = soundex(b.last_name) AND a.first_name ILIKE b.first_name;
Potential duplicate customer records

difference

PG 8.0+integer

Returns the number of matching Soundex codes between two strings (0–4). Higher is more similar.

DeveloperData Eng

Signature

difference ( text, text ) → integer

Parameters

ParameterTypeDescription
string1textFirst string
string2textSecond string

Examples

sql
SELECT difference('hello', 'hello');
4 (exact match)
sql
SELECT difference('Anne', 'Ann');
4
sql
SELECT difference('hello', 'world');
2 or less
sql
SELECT name, difference(name, 'Johnson') AS score FROM contacts WHERE difference(name, 'Johnson') >= 3 ORDER BY score DESC;
Contacts with high phonetic similarity to Johnson
Anti-Pattern

A difference of 4 means identical Soundex codes. Use `difference(a, b) >= 3` for fuzzy phonetic matching. Combined with Levenshtein for typo tolerance.

example
SELECT name FROM users WHERE difference(name, $1) >= 3;
Names with high phonetic similarity

metaphone

PG 8.0+text

Returns the Metaphone code for a string — a phonetic encoding more accurate than Soundex for English words.

DeveloperData Eng

Signature

metaphone ( text, max_output_length integer ) → text

Parameters

ParameterTypeDescription
stringtextWord or name to encode
max_lengthintegerMaximum length of the output code

Examples

sql
SELECT metaphone('Smith', 8);
SM0
sql
SELECT metaphone('Schmidt', 8);
SXMT
sql
SELECT * FROM names WHERE metaphone(name, 10) = metaphone('Catherine', 10);
Names sounding like Catherine
Anti-Pattern

Metaphone produces more discriminating codes than Soundex — it better handles consonant clusters, silent letters, and spelling variations. Use metaphone for English name matching; Soundex for broader matching.

example
SELECT * FROM contacts WHERE metaphone(first_name, 6) = metaphone($1, 6) AND metaphone(last_name, 6) = metaphone($2, 6);
Contacts with phonetically matching names

dmetaphone

PG 8.0+text

Returns the Double Metaphone code for a string. Double Metaphone provides two alternative phonetic codes to handle different pronunciation conventions.

DeveloperData Eng

Signatures

dmetaphone ( text ) → text
dmetaphone_alt ( text ) → text

Parameters

ParameterTypeDescription
stringtextWord or name to encode

Examples

sql
SELECT dmetaphone('Schmidt'), dmetaphone_alt('Schmidt');
XMT | SMT
sql
SELECT * FROM names WHERE dmetaphone(name) = dmetaphone('Garcia') OR dmetaphone_alt(name) = dmetaphone('Garcia');
Names phonetically similar to Garcia
sql
SELECT dmetaphone('Catherine'), dmetaphone_alt('Catherine');
K0RN | KTRN
sql
SELECT name FROM voters WHERE dmetaphone(last_name) = dmetaphone($1) OR dmetaphone_alt(last_name) = dmetaphone_alt($1) ORDER BY name;
Voter records with phonetically similar last names (both encodings checked)
Anti-Pattern

Double Metaphone generates a primary code for English pronunciation and an alternate for Germanic/European conventions. Match against both: `dmetaphone(name) = dmetaphone($1) OR dmetaphone_alt(name) = dmetaphone($1)` for broader coverage.

example
SELECT name FROM customers WHERE dmetaphone(last_name) IN (dmetaphone($1), dmetaphone_alt($1));
All phonetically similar last names

levenshtein

PG 8.0+integer

Computes the Levenshtein edit distance — the minimum number of insertions, deletions, and substitutions needed to transform source into target.

DeveloperData Eng

Signatures

levenshtein ( source text, target text ) → integer
levenshtein ( source text, target text, ins_cost integer, del_cost integer, sub_cost integer ) → integer

Parameters

ParameterTypeDescription
sourcetextSource string
targettextTarget string
ins_costintegerCost of insertion (default 1)
del_costintegerCost of deletion (default 1)
sub_costintegerCost of substitution (default 1)

Examples

sql
SELECT levenshtein('kitten', 'sitting');
3
sql
SELECT levenshtein('hello', 'hello');
0
sql
SELECT word FROM dictionary WHERE levenshtein(word, 'receieve') <= 2 ORDER BY levenshtein(word, 'receieve') LIMIT 5;
Spell-check suggestions
sql
SELECT levenshtein('color', 'colour', 1, 1, 2);
2 (substitution costs 2, making deletion+insertion preferred)
Anti-Pattern

Levenshtein requires a full table scan. Pre-filter with a trigram GIN index (`col % query` or `col <-> query`) to get a small candidate set, then rank by Levenshtein for precision.

example
SELECT word, levenshtein(word, $1) AS dist FROM dictionary WHERE word % $1 ORDER BY dist LIMIT 10;
Fast spell-check: trigram pre-filter + Levenshtein ranking

levenshtein_less_equal

PG 8.0+integer

Calculates Levenshtein distance but returns max_d + 1 early if the distance would exceed max_d. More efficient when you only need to check if distance ≤ threshold.

DeveloperData Eng

Signature

levenshtein_less_equal ( source text, target text, max_d integer ) → integer

Parameters

ParameterTypeDescription
sourcetextSource string
targettextTarget string
max_dintegerMaximum allowed distance. Returns max_d+1 if actual distance exceeds this.

Examples

sql
SELECT levenshtein_less_equal('hello', 'world', 3);
4 (exceeds threshold)
sql
SELECT * FROM words WHERE levenshtein_less_equal(word, $1, 2) <= 2;
Words within edit distance 2
sql
SELECT product_name FROM products WHERE levenshtein_less_equal(lower(product_name), lower($1), 1) <= 1;
Product names with at most 1 edit from the search term (case-insensitive)
sql
SELECT username, levenshtein_less_equal(username, $1, 2) AS dist FROM accounts WHERE levenshtein_less_equal(username, $1, 2) <= 2 ORDER BY dist;
Accounts with usernames close to the given input — useful for detecting squatted or typo usernames
Anti-Pattern

When you only care whether distance is ≤ N (not the exact value), `levenshtein_less_equal(a, b, N) <= N` is faster than `levenshtein(a, b) <= N` because it stops computing once it knows the threshold is exceeded.

example
SELECT * FROM usernames WHERE levenshtein_less_equal(username, $1, 1) <= 1;
Usernames with at most 1 typo from input