.*

PostgreSQL Regular Expression Functions

Complete reference for PostgreSQL regular expression functions covering pattern matching, text extraction, replacement, and string splitting. Every function includes syntax, regex pattern examples, POSIX vs SIMILAR TO differences, and performance notes. Updated for PostgreSQL 16.

10 functions

What are PostgreSQL Regular Expression Functions?

PostgreSQL regular expression functions are built-in functions for matching, extracting, and transforming text using POSIX regular expression patterns. PostgreSQL supports character classes, backreferences, non-greedy quantifiers, and named capture groups. Key PostgreSQL regex functions include regexp_match(), regexp_matches(), regexp_replace(), regexp_split_to_table(), and the ~ and ~* pattern match operators.

regexp_match

PG 10+text[]

Returns the first matching substring(s) for a POSIX regex pattern. Returns NULL if no match, or a text array of capture groups.

DeveloperData EngDBASecurity

Signature

regexp_match ( string text, pattern text [, flags text] ) → text[]

Parameters

ParameterTypeDescription
stringtextInput string to search
patterntextPOSIX regular expression
flagstextOptional flags: i=case-insensitive, g=global (use regexp_matches for multiple), n=newline-sensitive, s=dot matches newline

Examples

sql
SELECT regexp_match('abc123', '[0-9]+');
{123}
sql
SELECT regexp_match('user@example.com', '([^@]+)@(.+)');
{user,example.com}
sql
SELECT (regexp_match(log, 'duration: ([0-9.]+) ms'))[1]::numeric FROM logs;
Extracted duration value
sql
SELECT COALESCE((regexp_match(event_text, 'user_id=(\d+)'))[1], '0')::int AS user_id FROM events;
Parsed user_id integer, 0 if missing
Anti-PatternUsing regexp_matches when only one match is expected

regexp_matches returns SETOF text[], which means it acts as a set-returning function and can multiply rows in a SELECT list in unexpected ways. When you only want the first match, use regexp_match (PG 10+) instead — it returns a single text[] row or NULL.

✓ Instead: -- Avoid: regexp_matches without 'g' flag in a SELECT list SELECT id, (regexp_matches(col, 'pattern'))[1] FROM t; -- Prefer: regexp_match for single-match extraction SELECT id, (regexp_match(col, 'pattern'))[1] FROM t;

Use `(regexp_match(str, pattern))[n]` to extract the nth capture group. Index 1 is the first group. Wrap in COALESCE to handle non-matching rows.

example
SELECT COALESCE((regexp_match(url, 'https?://([^/]+)'))[1], 'unknown') AS domain FROM page_views;
Domain extracted from URL, 'unknown' if no match

regexp_matches

PG 8.3+setof text[]

Returns all matches of a regex pattern as a set of text arrays — one row per match. Use 'g' flag to find all non-overlapping matches.

DeveloperData Eng

Signature

regexp_matches ( string text, pattern text [, flags text] ) → setof text[]

Parameters

ParameterTypeDescription
stringtextInput string
patterntextPOSIX regular expression
flagstext'g' to return all matches (not just first)

Examples

sql
SELECT regexp_matches('one two three', '\w+', 'g');
{one}, {two}, {three} (3 rows)
sql
SELECT m[1] FROM regexp_matches(body, '<a href="([^"]+)"', 'g') AS t(m);
All href values in HTML
sql
SELECT id, m[1] AS tag FROM posts, regexp_matches(content, '#(\w+)', 'g') AS t(m);
One row per hashtag per post
sql
SELECT array_agg(m[1]) AS emails FROM regexp_matches(raw_text, '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', 'g') AS t(m);
Array of all email addresses found in text
Anti-PatternUsing regexp_matches without the 'g' flag for single-match extraction

Without the 'g' flag, regexp_matches still returns SETOF text[] but only yields the first match. This set-returning behavior inside a SELECT list can cause unexpected row multiplication. On PostgreSQL 10+, use regexp_match (no 's') for cleaner single-match semantics.

✓ Instead: -- Avoid on PG 10+: set-returning function in SELECT list SELECT (regexp_matches(col, 'pattern'))[1] FROM t; -- Prefer: SELECT (regexp_match(col, 'pattern'))[1] FROM t;

Wrap in a lateral join to get all matches per row: `FROM texts t, regexp_matches(t.content, pattern, 'g') AS m(match)`. This gives one result row per match per original row.

example
SELECT id, m[1] AS tag FROM posts, regexp_matches(content, '#(\w+)', 'g') AS t(m);
One row per hashtag per post

regexp_replace

PG 7.4+text

Replaces the first (or all, with 'g' flag) occurrences of a regex pattern with the replacement string.

DeveloperData EngSecurity

Signature

regexp_replace ( string text, pattern text, replacement text [, flags text] ) → text

Parameters

ParameterTypeDescription
stringtextInput string
patterntextPOSIX regex pattern to match
replacementtextReplacement string; \1, \2, etc. refer to capture groups
flagstextOptional: g=replace all, i=case-insensitive

Examples

sql
SELECT regexp_replace('Hello World', 'World', 'PostgreSQL');
Hello PostgreSQL
sql
SELECT regexp_replace('  hello  world  ', '\s+', ' ', 'g');
hello world
sql
SELECT regexp_replace(phone, '[^0-9]', '', 'g') FROM contacts;
Digits-only phone number
sql
SELECT regexp_replace(content, '<[^>]+>', '', 'g') FROM articles;
HTML tags stripped from content
Anti-PatternForgetting the 'g' flag when replacing all occurrences

Without the 'g' flag, regexp_replace only replaces the FIRST match. This is a common source of bugs when sanitizing strings (e.g., removing all special characters). Always add 'g' when you intend a global replace.

✓ Instead: -- Only removes the first non-digit character (bug): SELECT regexp_replace(phone, '[^0-9]', '', ''); -- Removes all non-digit characters (correct): SELECT regexp_replace(phone, '[^0-9]', '', 'g');

Reference capture groups in the replacement using `\1`, `\2`, etc. For example, to reformat dates from 'YYYY-MM-DD' to 'DD/MM/YYYY': `regexp_replace(d, '(\d{4})-(\d{2})-(\d{2})', '\3/\2/\1')`.

example
SELECT regexp_replace(full_name, '(\w+) (\w+)', '\2, \1') AS last_first FROM users;
"Smith, John" from "John Smith"

regexp_split_to_array

PG 8.3+text[]

Splits a string on matches of a regex pattern and returns the parts as an array.

DeveloperData Eng

Signature

regexp_split_to_array ( string text, pattern text [, flags text] ) → text[]

Parameters

ParameterTypeDescription
stringtextString to split
patterntextRegex pattern to split on
flagstextOptional regex flags

Examples

sql
SELECT regexp_split_to_array('one two  three', '\s+');
{one,two,three}

Treats multiple commas as one delimiter

sql
SELECT regexp_split_to_array('a,b,,c', ',+');
{a,b,c}
sql
SELECT regexp_split_to_array(csv_line, ',(?=(?:[^"]*"[^"]*")*[^"]*$)') FROM data;
Splits respecting quoted commas
sql
SELECT array_length(regexp_split_to_array(sentence, '\s+'), 1) AS word_count FROM documents;
Word count per document row
Anti-PatternUsing regex split when a fixed delimiter is all you need

regexp_split_to_array compiles a regex engine for every call. When the delimiter is a plain character (comma, pipe, tab), `string_to_array` is simpler and faster — there is no regex overhead.

✓ Instead: -- Unnecessary regex overhead for a fixed delimiter: SELECT regexp_split_to_array(col, ','); -- Prefer for fixed delimiters: SELECT string_to_array(col, ',');

Use `regexp_split_to_array(str, '\s+')` to split on one or more whitespace characters. This handles tabs, multiple spaces, and newlines — unlike `string_to_array(str, ' ')` which only splits on single space.

example
SELECT regexp_split_to_array(search_query, '\s+') AS tokens FROM searches;
Array of whitespace-separated search terms

regexp_split_to_table

PG 8.3+setof text

Splits a string on regex matches and returns each part as a separate row (set-returning function).

DeveloperData Eng

Signature

regexp_split_to_table ( string text, pattern text [, flags text] ) → setof text

Parameters

ParameterTypeDescription
stringtextString to split
patterntextRegex delimiter pattern
flagstextOptional regex flags

Examples

sql
SELECT regexp_split_to_table('one two three', '\s+');
one (row 1), two (row 2), three (row 3)
sql
SELECT id, regexp_split_to_table(tags_csv, ',') AS tag FROM posts;
One row per tag per post
sql
SELECT p.id, trim(t.tag) AS tag FROM posts p, regexp_split_to_table(p.tags_csv, '\s*,\s*') AS t(tag) WHERE trim(t.tag) <> '';
Trimmed, non-empty tag per post row
sql
SELECT count(*) AS token_count FROM regexp_split_to_table('the quick brown fox', '\s+') AS w;
4
Anti-PatternCalling regexp_split_to_table in a SELECT list without a lateral join

Placing a set-returning function directly in the SELECT list (not in FROM/LATERAL) was deprecated in PostgreSQL 10 and produces confusing results. Always put regexp_split_to_table in the FROM clause.

✓ Instead: -- Deprecated / confusing: SELECT id, regexp_split_to_table(tags_csv, ',') FROM posts; -- Correct lateral usage: SELECT p.id, t.tag FROM posts p, regexp_split_to_table(p.tags_csv, ',') AS t(tag);

Use with a lateral join to unnest delimited values per row while keeping the parent row's columns. Alternative to `unnest(string_to_array(...))`.

example
SELECT p.id, trim(t.tag) AS tag FROM posts p, regexp_split_to_table(p.tags_csv, ',') AS t(tag) WHERE trim(t.tag) != '';
One row per trimmed tag per post

regexp_like

PG 15+boolean

Returns true if the string matches the POSIX regular expression pattern.

DeveloperDBASecurity

Signature

regexp_like ( string text, pattern text [, flags text] ) → boolean

Parameters

ParameterTypeDescription
stringtextString to test
patterntextPOSIX regex pattern
flagstextOptional: i=case-insensitive

Examples

sql
SELECT regexp_like('Hello World', 'world', 'i');
true
sql
SELECT * FROM users WHERE regexp_like(email, '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
    
  

);
Users with valid email format
sql
SELECT * FROM orders WHERE NOT regexp_like(postal_code, '^[0-9]{5}(-[0-9]{4})?
    
  

);
Orders with invalid US postal codes
sql
SELECT regexp_like(input_value, '^[a-zA-Z0-9_-]{3,32}
    
  

) AS is_valid_username;
true if username contains only safe characters
Anti-PatternUsing regex for simple prefix/suffix/substring tests

regexp_like (and the ~ operator) are powerful but have regex compilation overhead. For simple starts-with, ends-with, or contains checks, LIKE or ILIKE is up to 10x faster and far more readable. Use regex only when the pattern genuinely requires it.

✓ Instead: -- Slow: regex for a simple prefix check SELECT * FROM t WHERE regexp_like(col, '^prefix'); -- Fast: LIKE SELECT * FROM t WHERE col LIKE 'prefix%';

`regexp_like(str, pat)` is equivalent to `str ~ pat` (or `str ~* pat` with 'i' flag). Use `regexp_like` in code for readability; use `~` for concise inline conditions.

example
SELECT * FROM products WHERE regexp_like(sku, '^[A-Z]{2}[0-9]{4}
    
  

);
Products with SKU matching the expected format

regexp_count

PG 15+integer

Counts the number of non-overlapping occurrences of a regex pattern in the string.

DeveloperDBASecurity

Signature

regexp_count ( string text, pattern text [, start integer [, flags text]] ) → integer

Parameters

ParameterTypeDescription
stringtextString to search
patterntextPOSIX regex pattern
startintegerStarting position in the string (1-based, default 1)
flagstextOptional regex flags

Examples

sql
SELECT regexp_count('the cat sat on the mat', 'at');
3
sql
SELECT regexp_count(content, '<[^>]+>') FROM html_docs;
Number of HTML tags
sql
SELECT regexp_count(password, '[A-Z]') >= 1 AS has_uppercase FROM users;
Password complexity check
sql
SELECT id, regexp_count(body, '\b\w+\b') AS word_count FROM documents ORDER BY word_count DESC LIMIT 10;
Top 10 longest documents by word count
Anti-PatternUsing regexp_count as a substitute for a boolean existence test

regexp_count > 0 is semantically correct but wastes cycles counting every occurrence when you only care whether any match exists. Use regexp_like or the ~ operator for a boolean test — they short-circuit on the first match.

✓ Instead: -- Inefficient: counts all matches just to check existence SELECT * FROM t WHERE regexp_count(col, 'pattern') > 0; -- Efficient: short-circuits on first match SELECT * FROM t WHERE col ~ 'pattern';

Use `regexp_count` to count character classes in a password: check digits, uppercase, lowercase, and special characters in one pass.

example
SELECT regexp_count(pw, '[A-Z]') >= 1 AND regexp_count(pw, '[0-9]') >= 2 AND regexp_count(pw, '[^A-Za-z0-9]') >= 1 AS is_complex FROM user_passwords;
true if password meets complexity rules

regexp_substr

PG 15+text

Returns the substring matching the nth occurrence of a regex pattern, optionally returning a specific capture group.

DeveloperData EngDBA

Signature

regexp_substr ( string text, pattern text [, start integer [, n integer [, flags text [, subexpr integer]]]] ) → text

Parameters

ParameterTypeDescription
stringtextInput string
patterntextPOSIX regex pattern
startintegerStart position (default 1)
nintegerWhich occurrence to return (default 1)
flagstextRegex flags
subexprintegerWhich capture group to return (0=whole match, 1=first group)

Examples

sql
SELECT regexp_substr('the cat sat on the mat', 'at');
at

Second numeric occurrence

sql
SELECT regexp_substr('abc123def456', '[0-9]+', 1, 2);
456
sql
SELECT regexp_substr(url, 'https?://([^/]+)', 1, 1, 'i', 1);
domain name from URL
sql
SELECT regexp_substr(log_line, 'ERROR [A-Z0-9]+', 1, 1, 'i') AS error_code FROM app_logs WHERE log_line ~ 'ERROR';
First error code from each log line
Anti-PatternUsing regexp_substr on pre-PG 15 databases

regexp_substr was introduced in PostgreSQL 15. On older versions this function does not exist. Use (regexp_match(str, pattern))[1] as a portable alternative for single-match extraction.

✓ Instead: -- PG 15+ only: SELECT regexp_substr(col, 'pattern'); -- Portable alternative (PG 10+): SELECT (regexp_match(col, 'pattern'))[1];

Use the n parameter to extract the 2nd, 3rd, etc. occurrence of a pattern — something that required a complex CTE or PL/pgSQL loop before PostgreSQL 15.

example
SELECT regexp_substr(error_log, 'ERROR: [^\n]+', 1, 2) AS second_error FROM logs;
Second ERROR line from log

~ / ~* / !~ / !~*

PG 7.4+boolean

POSIX regex match operators. ~ (case-sensitive match), ~* (case-insensitive), !~ (not match), !~* (not match, case-insensitive).

DeveloperDBAData EngSecurity

Signatures

string ~ pattern → boolean
string ~* pattern → boolean
string !~ pattern → boolean
string !~* pattern → boolean

Parameters

ParameterTypeDescription
stringtextString to test
patterntextPOSIX regular expression

Examples

sql
SELECT 'foobar' ~ 'foo';
true
sql
SELECT 'FooBar' ~* 'foo';
true (case-insensitive)
sql
SELECT * FROM users WHERE email ~ '@gmail\.com
    
  

;
Gmail users
sql
SELECT * FROM logs WHERE message !~ 'DEBUG|INFO';
Non-debug/info log lines
Anti-PatternUsing POSIX regex for large-table full-text searches instead of GIN/trgm indexes

A bare col ~ 'pattern' with no index causes a full sequential scan. For large tables with flexible pattern searches, a pg_trgm GIN index accelerates ~ queries by orders of magnitude. For structured full-text search, consider tsvector/tsquery instead.

✓ Instead: -- Sequential scan on large table (slow): SELECT * FROM articles WHERE body ~ 'postgres(ql)?'; -- Add trigram index first: CREATE INDEX ON articles USING GIN (body gin_trgm_ops); -- Now the same query uses the index.

Regex patterns without a leading literal prefix can't use a B-tree index. Create a GIN trigram index with `CREATE INDEX ON t USING GIN (col gin_trgm_ops)` to accelerate `col ~ pattern` queries.

example
CREATE INDEX ON products USING GIN (description gin_trgm_ops);
SELECT * FROM products WHERE description ~ 'wire(less)?';
Fast regex search with trigram index

SIMILAR TO

PG 7.4+boolean

SQL-standard pattern matching. Like LIKE but with additional regex metacharacters: | (or), * (0+), + (1+), ? (0 or 1), {m,n} (repetition), () (grouping), [] (character class).

DeveloperDBA

Signature

string SIMILAR TO pattern → boolean

Parameters

ParameterTypeDescription
stringtextString to test
patterntextSQL similar-to pattern (anchored at both ends implicitly)

Examples

sql
SELECT 'abc' SIMILAR TO 'a(b|c)+';
true
sql
SELECT 'abc123' SIMILAR TO '[a-z]+[0-9]+';
true
sql
SELECT phone SIMILAR TO '[0-9]{10}' FROM contacts;
10-digit phone numbers
sql
SELECT * FROM products WHERE sku SIMILAR TO '(AA|BB|CC)[0-9]{4}';
SKUs starting with AA, BB, or CC followed by 4 digits
Anti-PatternPreferring SIMILAR TO over POSIX regex or LIKE

SIMILAR TO is a historical SQL standard compromise — it has confusing implicit anchoring, limited features compared to POSIX regex, and is not faster than the ~ operator. PostgreSQL internally rewrites SIMILAR TO as a regex anyway. Use LIKE/ILIKE for simple patterns and ~ for anything requiring alternation or quantifiers.

✓ Instead: -- SIMILAR TO: confusing and no performance benefit SELECT * FROM t WHERE col SIMILAR TO '(foo|bar)%'; -- Prefer POSIX regex for alternation: SELECT * FROM t WHERE col ~ '^(foo|bar)'; -- Or LIKE for simple prefix: SELECT * FROM t WHERE col LIKE 'foo%' OR col LIKE 'bar%';

Unlike the ~ operator, `SIMILAR TO` implicitly anchors to the whole string (like `^... PostgreSQL Regular Expression Functions — Reference, Examples & Syntax | pgref.dev PostgreSQL Function Reference — Commands, Examples & Documentation in regex). `'abc' SIMILAR TO 'b'` is FALSE. Use POSIX regex (`~`) for partial matches.

example
-- Matches the whole string:
SELECT 'abc' SIMILAR TO 'a.*c';  -- true
SELECT 'xabc' SIMILAR TO 'a.*c'; -- false
Full-string matching semantics