
PostgreSQL fuzzystrmatch — Phonetic & Fuzzy Matching

Complete reference for PostgreSQL fuzzystrmatch extension functions covering Soundex, Metaphone, Double Metaphone, and Levenshtein edit distance for phonetic and fuzzy string matching. Every function includes syntax, real-world deduplication examples, and performance notes. Updated for PostgreSQL 16.


What is PostgreSQL fuzzystrmatch?

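fuzzystrmatch is a PostgreSQL extension providing algorithms for measuring string similarity and phonetic equivalence. Enable it in each database with CREATE EXTENSION fuzzystrmatch; before calling its functions. soundex(), metaphone(), and dmetaphone() return phonetic codes for matching words that sound alike. levenshtein() computes the edit distance between two strings: the minimum number of single-character insertions, deletions, or substitutions needed to turn one into the other. These functions are widely used for deduplication, spell-checking suggestions, and name matching in search applications. Note that soundex(), metaphone(), dmetaphone(), and dmetaphone_alt() do not work well with multibyte encodings such as UTF-8.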

difference

PG 8.0+→ integer

Converts both strings to Soundex codes and returns how many of the four code positions match (0–4). Higher means more similar.


Signature

difference ( text, text ) → integer

Parameters

Parameter	Type	Description
string1	text	First string
string2	text	Second string
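The following minimal Python sketch of Soundex and difference() shows the position-by-position comparison concretely. It is an illustration, not PostgreSQL's C implementation. It handles ASCII letters only, and Soundex variants differ on details such as how H and W are treated, so its results may not match soundex() on every input. The names soundex_code, soundex, and difference are local to the sketch.

```python
# Illustrative Soundex + difference() sketch (not PostgreSQL's C code).
# Assumption: ASCII letters only; other characters are passed through untouched.

# Soundex digit for each letter A..Z; '0' (vowels, H, W, Y) is never emitted.
SOUNDEX_TABLE = "01230120022455012623010202"


def soundex_code(ch: str) -> str:
    """Return the Soundex digit for an ASCII letter, or the character itself."""
    if ch.isascii() and ch.isalpha():
        return SOUNDEX_TABLE[ord(ch.upper()) - ord("A")]
    return ch


def soundex(word: str) -> str:
    # Skip leading characters that are not ASCII letters.
    i = 0
    while i < len(word) and not (word[i].isascii() and word[i].isalpha()):
        i += 1
    if i == len(word):
        return ""
    code = [word[i].upper()]  # the first letter is kept as a letter
    prev = word[i]
    for ch in word[i + 1:]:
        if len(code) == 4:
            break
        # Emit a digit only when it differs from the previous character's code.
        if ch.isascii() and ch.isalpha() and soundex_code(ch) != soundex_code(prev):
            digit = soundex_code(ch)
            if digit != "0":
                code.append(digit)
        prev = ch
    return "".join(code).ljust(4, "0")  # pad short codes with zeros


def difference(a: str, b: str) -> int:
    """Count the positions (0 to 4) at which the two Soundex codes agree."""
    return sum(x == y for x, y in zip(soundex(a), soundex(b)))


print(soundex("hello"), soundex("world"))  # H400 W643
print(difference("Anne", "Ann"), difference("hello", "world"))  # 4 0
```

In this sketch, difference("hello", "world") is 0 because H400 and W643 agree in no position. Identical strings always score 4.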

Examples

sql

SELECT difference('hello', 'hello');

→4 (exact match)

sql

SELECT difference('Anne', 'Ann');

→4

sql

SELECT difference('hello', 'world');

→0 (H400 and W643 share no position)

sql

SELECT name, difference(name, 'Johnson') AS score FROM contacts WHERE difference(name, 'Johnson') >= 3 ORDER BY score DESC;

→Contacts with high phonetic similarity to Johnson

⚠Anti-Pattern— Using DIFFERENCE as a binary match/no-match gate

DIFFERENCE() returns 0-4 but the meaningful threshold varies by name length and origin. Using a fixed cutoff (e.g. >= 3) will miss valid matches for longer names and produce false positives for short names.

✓ Instead: Use difference() as one signal among several rather than as a standalone match decision. Combine it with levenshtein() and pg_trgm's similarity().

A difference of 4 means identical Soundex codes. `difference(a, b) >= 3` is a common starting point for fuzzy phonetic matching. Tune the cutoff against your own data, and combine it with Levenshtein distance for typo tolerance.

example

SELECT name FROM users WHERE difference(name, $1) >= 3;

→Names with high phonetic similarity

dmetaphone

PG 8.0+→ text

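Returns the primary Double Metaphone code for a string; dmetaphone_alt() returns the alternate code. The two codes capture different pronunciation conventions, and they are identical when a word has only one plausible pronunciation. Each code is at most four characters long.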


Signatures

dmetaphone ( text ) → text

dmetaphone_alt ( text ) → text

Parameters

Parameter	Type	Description
string	text	Word or name to encode

Examples

sql

SELECT dmetaphone('Schmidt'), dmetaphone_alt('Schmidt');

→XMT | SMT

sql

SELECT * FROM names WHERE dmetaphone(name) = dmetaphone('Garcia') OR dmetaphone_alt(name) = dmetaphone('Garcia');

→Names phonetically similar to Garcia

sql

SELECT dmetaphone('Catherine'), dmetaphone_alt('Catherine');

→K0RN | KTRN

sql

SELECT name FROM voters WHERE dmetaphone(last_name) = dmetaphone($1) OR dmetaphone_alt(last_name) = dmetaphone_alt($1) ORDER BY name;

→Voter records with phonetically similar last names (both encodings checked)

⚠Anti-Pattern— Only checking dmetaphone() without dmetaphone_alt()

Double Metaphone generates a primary and an alternate pronunciation code. Checking only the primary code misses valid matches where the alternate code matches.

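✓ Instead: Treat two strings as a match when any code of one equals any code of the other, which means checking all four combinations. Comparing only primary to primary and alternate to alternate still misses pairs such as Smith and Schmidt, where one word's alternate equals the other's primary.

Double Metaphone generates a primary code for the most common (usually English) pronunciation and an alternate code for other conventions, such as Germanic or Slavic spellings. For the broadest coverage, compare every code of one string with every code of the other: `dmetaphone(name) IN (dmetaphone($1), dmetaphone_alt($1)) OR dmetaphone_alt(name) IN (dmetaphone($1), dmetaphone_alt($1))`.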

example

SELECT name FROM customers WHERE dmetaphone(last_name) IN (dmetaphone($1), dmetaphone_alt($1)) OR dmetaphone_alt(last_name) IN (dmetaphone($1), dmetaphone_alt($1));

→All phonetically similar last names
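This Python sketch contrasts the two matching rules on hard-coded codes. Schmidt's codes are the ones shown in the examples above. The Smith codes (primary SM0, alternate XMT) are assumed from the published Double Metaphone algorithm rather than captured from PostgreSQL. DM_CODES is a hypothetical lookup table standing in for calls to dmetaphone() and dmetaphone_alt().

```python
# Compare two ways of matching Double Metaphone codes.
# Assumption: these (primary, alternate) pairs are what dmetaphone() and
# dmetaphone_alt() return; they are hard-coded, not computed here.
DM_CODES = {
    "Schmidt": ("XMT", "SMT"),
    "Smith": ("SM0", "XMT"),  # assumed from the published algorithm
}


def same_position_match(a: str, b: str) -> bool:
    """primary = primary OR alternate = alternate."""
    (prim_a, alt_a), (prim_b, alt_b) = DM_CODES[a], DM_CODES[b]
    return prim_a == prim_b or alt_a == alt_b


def any_code_match(a: str, b: str) -> bool:
    """Match when any code of one word equals any code of the other."""
    return bool(set(DM_CODES[a]) & set(DM_CODES[b]))


print(same_position_match("Smith", "Schmidt"))  # False
print(any_code_match("Smith", "Schmidt"))       # True: Smith's alternate is Schmidt's primary
```

The set-intersection form is the same idea as the two IN lists in the SQL example: every code of the stored name is checked against every code of the input.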

levenshtein

PG 8.0+→ integer

Computes the Levenshtein edit distance — the minimum number of insertions, deletions, and substitutions needed to transform source into target.


Signatures

levenshtein ( source text, target text ) → integer

levenshtein ( source text, target text, ins_cost integer, del_cost integer, sub_cost integer ) → integer

Parameters

Parameter	Type	Description
source	text	Source string
target	text	Target string
ins_cost	integer	Cost of each insertion (the two-argument form uses 1)
del_cost	integer	Cost of each deletion (the two-argument form uses 1)
sub_cost	integer	Cost of each substitution (the two-argument form uses 1)
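The following minimal Python sketch implements the textbook dynamic-programming algorithm behind these parameters. The defaults of 1 model the two-argument form. It is not PostgreSQL's C code (for example, it does not enforce the 255-character limit). It does show where each cost enters the recurrence and why only two rows of the distance matrix need to be kept.

```python
def levenshtein(source: str, target: str,
                ins_cost: int = 1, del_cost: int = 1, sub_cost: int = 1) -> int:
    """Weighted edit distance from source to target.

    Time is O(len(source) * len(target)); memory is O(len(target)),
    because only the previous and current rows are kept.
    """
    # Row 0: turning "" into target[:j] takes j insertions.
    prev = [j * ins_cost for j in range(len(target) + 1)]
    for i in range(1, len(source) + 1):
        # Column 0: turning source[:i] into "" takes i deletions.
        curr = [i * del_cost] + [0] * len(target)
        for j in range(1, len(target) + 1):
            same = source[i - 1] == target[j - 1]
            curr[j] = min(
                prev[j] + del_cost,                       # delete source[i-1]
                curr[j - 1] + ins_cost,                   # insert target[j-1]
                prev[j - 1] + (0 if same else sub_cost),  # keep or substitute
            )
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))           # 3
print(levenshtein("kitten", "sitting", 1, 1, 2))  # 5
print(levenshtein("color", "colour", 1, 1, 2))    # 1
```

Raising sub_cost to 2 changes kitten/sitting from 3 to 5. Each of its two substitutions now costs as much as a deletion plus an insertion. color/colour stays at 1 because a single insertion suffices.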

Examples

sql

SELECT levenshtein('kitten', 'sitting');

→3

sql

SELECT levenshtein('hello', 'hello');

→0

sql

SELECT word FROM dictionary WHERE levenshtein(word, 'receieve') <= 2 ORDER BY levenshtein(word, 'receieve') LIMIT 5;

→Spell-check suggestions

sql

SELECT levenshtein('color', 'colour', 1, 1, 2);

→1 (a single insertion of 'u' suffices, so the substitution cost never comes into play)

⚠Anti-Pattern— Using levenshtein() on very long strings

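levenshtein() takes O(m×n) time for strings of lengths m and n. Memory use is linear, because only two rows of the distance matrix are needed. Arguments longer than 255 characters raise an error. Even below that limit, the quadratic cost adds up quickly when the function runs against every row of a large table.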

✓ Instead: For long text, use trigram similarity (pg_trgm) which uses GIN/GiST indexes and scales much better. Reserve levenshtein for short strings like names, codes, and identifiers.

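levenshtein() cannot use an index, so filtering on it alone evaluates every row. First narrow the rows to a small candidate set with a pg_trgm index. `col % query` can use a GIN or GiST trigram index, while `ORDER BY col <-> query` needs GiST. Then rank the candidates by Levenshtein distance for precision.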

example

SELECT word, levenshtein(word, $1) AS dist FROM dictionary WHERE word % $1 ORDER BY dist LIMIT 10;

→Fast spell-check: trigram pre-filter + Levenshtein ranking
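Below is a simplified, in-memory Python model of that two-phase plan. The trigrams() function only approximates pg_trgm: it treats the whole string as one lowercase word and ignores pg_trgm's handling of non-alphanumeric characters. The 0.3 threshold mirrors pg_trgm's default similarity threshold. The helper names (trigrams, trigram_similarity, suggest) are hypothetical. In the database, the first phase is the indexed `word % $1` test, and only the rows that pass it reach levenshtein().

```python
# Two-phase fuzzy search: cheap trigram filter, then exact Levenshtein ranking.
# Assumption: trigrams() is a simplified model of pg_trgm, not pg_trgm itself.


def trigrams(s: str) -> set:
    # Pad with two leading spaces and one trailing space, as pg_trgm does per word.
    padded = "  " + s.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    # Shared trigrams divided by all distinct trigrams of both strings.
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def levenshtein(a: str, b: str) -> int:
    # Unit-cost edit distance, keeping two rows of the DP matrix.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def suggest(words, query, threshold=0.3, limit=10):
    # Phase 1: keep only words whose trigram similarity passes the threshold.
    candidates = [w for w in words if trigram_similarity(w, query) >= threshold]
    # Phase 2: compute Levenshtein only for the survivors and rank by it.
    scored = sorted((levenshtein(w, query), w) for w in candidates)
    return [(w, d) for d, w in scored[:limit]]


words = ["receive", "deceive", "relieve", "recipe", "apple"]
print(suggest(words, "receieve"))  # [('receive', 1), ('relieve', 2)]
```

Here "deceive" is one edit further from the query than "receive". However, it shares too few trigrams with "receieve" to pass the filter, so Levenshtein is never computed for it. This is the trade-off of pre-filtering: less work, at the cost of occasionally dropping a close match.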

levenshtein_less_equal

PG 9.1+→ integer

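Computes the Levenshtein distance but stops early once the result is known to exceed max_d. If the true distance is at most max_d, the exact value is returned; otherwise the result is some value greater than max_d. This is faster than levenshtein() when you only need to know whether the distance falls within a small threshold.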


Signature

levenshtein_less_equal ( source text, target text, max_d integer ) → integer

Parameters

Parameter	Type	Description
source	text	Source string
target	text	Target string
max_d	integer	Maximum distance of interest. If the actual distance exceeds it, the result is some value greater than max_d. A negative max_d behaves like plain levenshtein().
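The following Python sketch illustrates one simple early-exit technique, assuming unit costs. PostgreSQL's C implementation is more elaborate, so treat this as an illustration of the idea, not of its code. The cutoff works because, with non-negative costs, the smallest value in each row of the distance matrix never decreases from one row to the next. Once a whole row exceeds max_d, the final distance must too.

```python
def levenshtein_less_equal(source: str, target: str, max_d: int) -> int:
    """Unit-cost edit distance that may give up once it must exceed max_d.

    Returns the exact distance when it is <= max_d; otherwise some value
    greater than max_d (max_d + 1 on an early exit). Negative max_d disables
    the cutoff.
    """
    # The length difference alone is a lower bound on the distance.
    if max_d >= 0 and abs(len(source) - len(target)) > max_d:
        return max_d + 1
    prev = list(range(len(target) + 1))
    for i in range(1, len(source) + 1):
        curr = [i] + [0] * len(target)
        for j in range(1, len(target) + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        # Every cell in this row already exceeds max_d: stop early.
        if max_d >= 0 and min(curr) > max_d:
            return max_d + 1
        prev = curr
    return prev[-1]


print(levenshtein_less_equal("kitten", "sitting", 5))  # 3 (exact, within threshold)
print(levenshtein_less_equal("hello", "world", 3))     # a value greater than 3
```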

Examples

sql

SELECT levenshtein_less_equal('hello', 'world', 3);

→a value greater than 3 (the true distance, 4, exceeds the threshold)

sql

SELECT * FROM words WHERE levenshtein_less_equal(word, $1, 2) <= 2;

→Words within edit distance 2

sql

SELECT product_name FROM products WHERE levenshtein_less_equal(lower(product_name), lower($1), 1) <= 1;

→Product names with at most 1 edit from the search term (case-insensitive)

sql

SELECT username, levenshtein_less_equal(username, $1, 2) AS dist FROM accounts WHERE levenshtein_less_equal(username, $1, 2) <= 2 ORDER BY dist;

→Accounts with usernames close to the given input — useful for detecting squatted or typo usernames

⚠Anti-Pattern— Using levenshtein() when a distance threshold is all you need

levenshtein() always computes the full edit distance even when you only care whether distance <= k. For large datasets this wastes computation on pairs that are clearly too different.

✓ Instead: Use levenshtein_less_equal(s1, s2, k) which short-circuits computation as soon as the distance exceeds k, making it significantly faster for threshold checks.

When you only care whether distance is ≤ N (not the exact value), `levenshtein_less_equal(a, b, N) <= N` is faster than `levenshtein(a, b) <= N` because it stops computing once it knows the threshold is exceeded.

example

SELECT * FROM usernames WHERE levenshtein_less_equal(username, $1, 1) <= 1;

→Usernames with at most 1 typo from input

Common Gotchas

⚠

LIKE is case-sensitive; ILIKE is not — and LIKE is faster

LIKE 'hello%' will not match 'Hello'. Use ILIKE for case-insensitive pattern matching, but expect a performance cost.

⚠

COALESCE stops at the first non-NULL argument

COALESCE evaluates its arguments left to right and returns as soon as it finds a non-NULL value; arguments after that are not evaluated. Put the cheapest, most-often-non-NULL expression first, and never rely on side effects of later arguments.

⚠

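Functions and casts on an indexed column prevent plain index use

A plain index on last_name cannot serve WHERE soundex(last_name) = soundex($1), and a plain index on a text column cannot serve WHERE col::integer = 5. In both cases the query compares a computed value rather than the stored column. Index the expression instead, e.g. CREATE INDEX ON customers (soundex(last_name)). Comparing a text column directly to an integer, as in WHERE col = 5, does not silently cast; it fails with an "operator does not exist" error.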

metaphone

PG 8.0+→ text

Returns the Metaphone code for a string — a phonetic encoding more accurate than Soundex for English words.


Signature

metaphone ( text, max_output_length integer ) → text

Parameters

Parameter	Type	Description
string	text	Word or name to encode
max_output_length	integer	Maximum length of the output code; longer codes are truncated

Examples

sql

SELECT metaphone('Smith', 8);

→SM0

sql

SELECT metaphone('Schmidt', 8);

→SXMT

sql

SELECT * FROM names WHERE metaphone(name, 10) = metaphone('Catherine', 10);

→Names sounding like Catherine

Compare Metaphone and Double Metaphone on a common name

sql

SELECT dmetaphone('Thompson'), dmetaphone_alt('Thompson'), metaphone('Thompson', 10);

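→TMPS | TMPS | 0MPSN (Double Metaphone caps codes at four characters and keeps T for a leading "Thom"; metaphone() encodes TH as 0)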

⚠Anti-Pattern— Setting metaphone()'s max_output_length too low

metaphone() truncates its code to max_output_length characters. With a small limit, long or compound names that differ only in their later consonants receive identical codes and collide.

✓ Instead: Choose a limit long enough for the names in your data; the examples on this page use 6 to 10. Switching to dmetaphone() does not solve this, because its codes are capped at four characters. Its advantage is the alternate code for names with more than one plausible pronunciation.

Metaphone produces more discriminating codes than Soundex — it better handles consonant clusters, silent letters, and spelling variations. Use metaphone for English name matching; Soundex for broader matching.

example

SELECT * FROM contacts WHERE metaphone(first_name, 6) = metaphone($1, 6) AND metaphone(last_name, 6) = metaphone($2, 6);

→Contacts with phonetically matching names

soundex

PG 8.0+→ text

Converts a name to its Soundex code: a 4-character code (the first letter followed by three digits) that approximates how the name sounds. Requires the fuzzystrmatch extension.


Signature

soundex ( text ) → text

Parameters

Parameter	Type	Description
string	text	Name or word to encode

Examples

sql

SELECT soundex('hello');

→H400

sql

SELECT soundex('Anne'), soundex('Ann');

→A500 | A500 (same code)

sql

SELECT * FROM people WHERE soundex(last_name) = soundex('Smith');

→People with names sounding like Smith

sql

SELECT soundex(last_name) AS code, array_agg(DISTINCT last_name ORDER BY last_name) AS names FROM customers GROUP BY soundex(last_name) HAVING COUNT(DISTINCT last_name) > 1 ORDER BY code;

→Clusters of last names sharing a Soundex code

⚠Anti-Pattern— Using SOUNDEX for multilingual name matching

SOUNDEX was designed for English names and performs poorly on non-English names, compound surnames, and names with non-ASCII characters. It reduces names to a 4-character code that collapses too many distinct names together.

✓ Instead: Use pg_trgm similarity() for flexible fuzzy matching, or metaphone() for better phonetic matching. For multilingual data, combine with unaccent extension.

Soundex groups names by sound: 'Smith', 'Smyth', 'Smythe' all map to S530. Use `soundex(name1) = soundex(name2)` to find likely duplicates in customer databases.

example

SELECT a.id, b.id, a.last_name, b.last_name FROM customers a JOIN customers b ON a.id < b.id AND soundex(a.last_name) = soundex(b.last_name) AND lower(a.first_name) = lower(b.first_name);

→Potential duplicate customer records
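This Python sketch shows the grouping idea behind Soundex-based deduplication. It buckets names by code and keeps only buckets holding more than one distinct spelling. The soundex() here is a simplified reimplementation for ASCII letters, not PostgreSQL's function, and may differ from it on edge cases. soundex_clusters is a hypothetical helper name.

```python
from collections import defaultdict

# Soundex digit for each letter A..Z; '0' (vowels, H, W, Y) is never emitted.
SOUNDEX_TABLE = "01230120022455012623010202"


def soundex_code(ch: str) -> str:
    if ch.isascii() and ch.isalpha():
        return SOUNDEX_TABLE[ord(ch.upper()) - ord("A")]
    return ch


def soundex(word: str) -> str:
    # Simplified Soundex: first letter, then up to three digits, zero-padded.
    letters = [c for c in word if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    code, prev = [letters[0].upper()], letters[0]
    for ch in letters[1:]:
        if len(code) == 4:
            break
        digit = soundex_code(ch)
        if digit != soundex_code(prev) and digit != "0":
            code.append(digit)
        prev = ch
    return "".join(code).ljust(4, "0")


def soundex_clusters(names):
    """Group names by Soundex code; keep codes shared by 2+ distinct names."""
    groups = defaultdict(set)
    for name in names:
        groups[soundex(name)].add(name)
    return {code: sorted(members) for code, members in groups.items() if len(members) > 1}


print(soundex_clusters(["Smith", "Smyth", "Smythe", "Robert", "Rupert", "Jones"]))
# {'S530': ['Smith', 'Smyth', 'Smythe'], 'R163': ['Robert', 'Rupert']}
```

Each cluster is a set of candidates for review, not a confirmed duplicate. As with the SQL self-join, a second signal such as a matching first name narrows the list.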
