
PostgreSQL String Functions

Complete reference for PostgreSQL string functions covering text manipulation, pattern matching, encoding, and string formatting. Every function includes syntax, parameters, real-world SQL examples, anti-pattern warnings, and pro tips. Updated for PostgreSQL 16.

50 functions

What are PostgreSQL String Functions?

PostgreSQL string functions are built-in functions that operate on text data. They handle tasks including trimming whitespace, extracting substrings, splitting strings, encoding conversions, and regular expression matching. Common PostgreSQL string functions include concat(), substring(), regexp_replace(), trim(), format(), and split_part().

ascii

PG 8.0+ · returns integer

Returns the numeric Unicode code point of the first character of the string.

Developer · DBA

Signature

ascii ( string text ) → integer

Parameters

Parameter | Type | Description
string | text | Input string; only the first character is evaluated

Examples

Uppercase A

sql
SELECT ascii('A');
65

Lowercase a

sql
SELECT ascii('a');
97

Unicode euro sign

sql
SELECT ascii('€');
8364

Only the first character is used

sql
SELECT ascii('Hello');
72

Manual lowercase-to-uppercase via code point math

sql
SELECT chr(ascii('a') - 32);
A
Anti-Pattern: Looping ascii() per character to validate a string

Calling ascii() in a loop or for each character to check whether a string is pure ASCII or contains only certain characters is slow and verbose. A single `regexp_replace` or `translate` call handles the whole string in one pass.

✓ Instead: Use `col ~ '[^\x20-\x7E]'` to detect non-ASCII characters, or `regexp_replace(col, '[^\x20-\x7E]', '', 'g')` to strip them — no loop needed.

Use `WHERE ascii(left(col, 1)) < 32` to find rows that start with invisible control characters — null bytes (0), tabs (9), carriage returns (13), escape codes — that silently break CSV exports, JSON serialization, or downstream string comparisons. Pair with `regexp_replace(col, '[\x00-\x1F]', '', 'g')` to strip them.

sql
SELECT id, col FROM users WHERE ascii(left(col,1)) < 32;
Rows with invisible leading characters

bit_length

PG 8.0+ · returns integer

Returns the number of bits in the string (8 × number of bytes, regardless of encoding).

Developer · DBA

Signatures

bit_length ( string text ) → integer
bit_length ( string bytea ) → integer

Parameters

Parameter | Type | Description
string | text or bytea | Input string or binary data

Examples

5 ASCII chars × 8 bits

sql
SELECT bit_length('hello');
40

Euro sign is 3 bytes in UTF-8

sql
SELECT bit_length('€');
24
sql
SELECT bit_length('');
0
sql
SELECT bit_length('A'::bytea);
8

Compute byte size of text columns

sql
SELECT bit_length(col) / 8 AS byte_count FROM messages LIMIT 3;
byte_count per row
Anti-Pattern: Using bit_length to check byte limits on user-facing fields

Developers sometimes use `bit_length(col) / 8` to compute a byte count and enforce storage limits. This is needlessly indirect. `octet_length(col)` returns bytes directly and is the correct tool for enforcing byte-level constraints such as network protocol limits or column storage checks.

✓ Instead: Use `CHECK (octet_length(col) <= 1024)` instead of `CHECK (bit_length(col) / 8 <= 1024)`.

bit_length is rarely what you want in practice. For checking if a string exceeds a byte limit (e.g. for a VARCHAR(n) or network packet), use `octet_length(col) <= 1024`. bit_length just multiplies by 8 — the only time it's distinctly useful is when working with actual bit-string types (BIT, BIT VARYING).
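Two short queries illustrate the split described above: octet_length for byte budgets on text, bit_length for actual bit strings (the `api_payloads` table is a hypothetical sketch):

```sql
-- Byte budget on text: octet_length, not bit_length / 8
CREATE TABLE api_payloads (
    id   bigserial PRIMARY KEY,
    body text CHECK (octet_length(body) <= 1024)
);

-- Where bit_length is genuinely the right tool: bit-string values
SELECT bit_length(B'101100');   -- 6 (counts bits directly)
SELECT octet_length(B'101100'); -- 1 (the 6 bits pack into one byte)
```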

btrim

PG 8.0+ · returns text

Removes the longest string of characters in the characters set from both ends of the string; defaults to removing spaces.

Developer · DBA · Data Eng

Signatures

btrim ( string text ) → text
btrim ( string text, characters text ) → text

Parameters

Parameter | Type | Description
string | text | Input string to trim
characters | text | Set of characters to remove (default: space)

Examples

Default: trim spaces

sql
SELECT btrim('  hello  ');
hello

Trim specific character

sql
SELECT btrim('xxhelloxx', 'x');
hello
sql
SELECT btrim('###hello###', '#');
hello

Characters argument is a SET, not a string prefix/suffix

sql
SELECT btrim('xyxhelloyx', 'xy');
hello

All-whitespace returns empty string

sql
SELECT btrim('   ');
(empty string)
Anti-Pattern: Trimming in a WHERE clause on an unindexed column

Using `WHERE btrim(col) = 'value'` prevents index use — PostgreSQL cannot use a plain B-tree index on `col` when a function wraps it. Every row is scanned and trimmed at runtime.

✓ Instead: Create a functional index `CREATE INDEX ON t (btrim(col))` and query with `WHERE btrim(col) = 'value'`, or better yet store pre-trimmed values by applying btrim at write time via a CHECK constraint or trigger.

`btrim('abchello', 'abc')` removes any combination of a, b, and c from both ends — not just the literal prefix 'abc'. So `btrim('cabcXcab', 'abc')` returns 'X'. This surprises developers expecting prefix/suffix matching. For exact prefix/suffix stripping, use `regexp_replace(col, '^prefix|suffix$', '', 'g')` instead.

char_length

PG 8.0+ · returns integer

Returns the number of characters (code points) in the string, not the byte count.

Developer

Signatures

char_length ( string text ) → integer
character_length ( string text ) → integer

Parameters

Parameter | Type | Description
string | text | Input string

Examples

sql
SELECT char_length('hello');
5
sql
SELECT char_length('');
0

4 characters, even though 'é' is multi-byte in UTF-8

sql
SELECT char_length('café');
4

NULL input returns NULL — use COALESCE if you need 0

sql
SELECT char_length(NULL);
NULL

3 Unicode characters regardless of byte length

sql
SELECT char_length('日本語');
3
Anti-Pattern: Using char_length to enforce byte-level storage or protocol limits

A VARCHAR(255) limit on a column is a character limit, but external systems such as Kafka, HTTP headers, or SMS gateways have byte limits. `char_length('emoji😀') = 6` but `octet_length('emoji😀') = 9`. Validating against char_length will let oversized payloads through when users include emoji or CJK characters.

✓ Instead: Use `octet_length(col) <= N` for byte-budget checks, and `char_length(col) <= N` only for display/UX character counts.

All three look the same for pure ASCII but diverge with multibyte characters. `char_length('café')` = 4 (character count). `length('café')` = 4 (same for text). `octet_length('café')` = 5 (UTF-8 bytes — 'é' is 2 bytes). Use `char_length` for UI display limits, `octet_length` for storage/network byte limits. `length(bytea_col)` gives byte count of binary data.
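The divergence is easy to see side by side; this uses only literals, so it runs anywhere (byte counts assume a UTF-8 database):

```sql
SELECT char_length('café')  AS chars,  -- 4 characters
       octet_length('café') AS bytes;  -- 5 bytes ('é' is 2 bytes in UTF-8)
```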

chr

PG 8.0+ · returns text

Returns the character with the given Unicode code point.

Developer · DBA

Signature

chr ( code integer ) → text

Parameters

Parameter | Type | Description
code | integer | Unicode code point (0–1114111)

Examples

sql
SELECT chr(65);
A

Line feed — common in dynamic string building

sql
SELECT chr(10);
(newline character)
sql
SELECT chr(9);
(tab character)

Euro sign

sql
SELECT chr(8364);
€

Generate entire alphabet

sql
SELECT string_agg(chr(generate_series + 64), '') FROM generate_series(1,26);
ABCDEFGHIJKLMNOPQRSTUVWXYZ
Anti-Pattern: Embedding literal newlines or tabs directly in SQL string literals

Pasting literal newline or tab characters into a SQL string literal makes the query unreadable and fragile across editors and terminals that may silently strip or convert whitespace.

✓ Instead: Use `chr(10)` for LF, `chr(13)` for CR, and `chr(9)` for tab. For CRLF: `chr(13) || chr(10)`. This keeps SQL scripts readable and encoding-safe.

Uploaded text files often have Windows CRLF endings. Strip them in one shot: `regexp_replace(col, chr(13)||chr(10), chr(10), 'g')` converts CRLF → LF. Then `regexp_replace(result, chr(13), chr(10), 'g')` catches any leftover bare CRs. Far more readable than embedding literal escape sequences in your SQL.
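The two-step cleanup above can collapse into one statement, since `\r\n?` matches both CRLF and a bare CR (the `documents` table and `body` column are placeholder names):

```sql
-- CRLF and stray CR both become LF in a single pass
UPDATE documents
SET body = regexp_replace(body, chr(13) || chr(10) || '?', chr(10), 'g')
WHERE body LIKE '%' || chr(13) || '%';
```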

concat

PG 8.0+ · returns text

Concatenates all arguments into a single string, silently ignoring NULL values.

Developer

Signature

concat ( val1 "any" [, val2 "any" [, ...]] ) → text

Parameters

Parameter | Type | Description
val1, val2, ... | any | Values to concatenate; NULLs are skipped

Examples

sql
SELECT concat('Hello', ', ', 'World');
Hello, World

NULL is silently skipped — unlike ||

sql
SELECT concat('foo', NULL, 'bar');
foobar
sql
SELECT concat(first_name, ' ', last_name) FROM users LIMIT 1;
Jane Doe

Non-text types are cast automatically

sql
SELECT concat(42, ' items at $', 3.99);
42 items at $3.99

All NULLs returns empty string, not NULL

sql
SELECT concat(NULL, NULL);
(empty string)
Anti-Pattern: Using || with nullable columns expecting the same NULL-ignoring behavior as concat()

A very common bug: `first_name || ' ' || last_name` returns NULL whenever either column is NULL. Developers who expect this to 'just work' like concat() get NULL full names for users with no last name on record.

✓ Instead: Use `concat(first_name, ' ', last_name)` or `concat_ws(' ', first_name, last_name)` when any input may be NULL.

The `||` operator propagates NULL: `'foo' || NULL || 'bar'` returns NULL. `concat()` skips NULLs: `concat('foo', NULL, 'bar')` returns 'foobar'. This distinction is critical when building full names, addresses, or dynamic labels from nullable columns. Use `concat` for user-facing string building; use `||` when NULL-propagation is intentional (e.g. to detect missing required values).
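When NULL-propagation is the point, `||` makes a compact completeness check; the `users` table and address columns here are illustrative:

```sql
-- If any of the three columns is NULL, the whole expression is NULL,
-- so this finds rows with an incomplete address
SELECT id
FROM users
WHERE street || city || zip IS NULL;
```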

concat_ws

PG 9.1+ · returns text

Concatenates all non-NULL arguments separated by the given separator string.

Developer · Data Eng

Signature

concat_ws ( sep text, val1 "any" [, val2 "any" [, ...]] ) → text

Parameters

Parameter | Type | Description
sep | text | Separator string inserted between non-NULL values
val1, val2, ... | any | Values to join; NULLs are skipped entirely

Examples

sql
SELECT concat_ws(', ', 'one', 'two', 'three');
one, two, three

No double space when middle name is NULL

sql
SELECT concat_ws(' ', first_name, middle_name, last_name) FROM people;
John Smith (when middle_name is NULL)
sql
SELECT concat_ws('-', '2024', '03', '15');
2024-03-15

NULLs suppressed, no leading/trailing separator

sql
SELECT concat_ws(', ', NULL, 'a', NULL, 'b');
a, b

Empty strings ARE included unlike NULLs

sql
SELECT concat_ws('/', 'https:', '', 'example.com', 'path');
https://example.com/path
Anti-Pattern: Not realizing empty strings are included but NULLs are not

`concat_ws(', ', 'a', '', 'c')` returns `'a, , c'` — the empty string produces a dangling separator, which surprises developers who expect empty strings to be skipped like NULLs.

✓ Instead: Use `NULLIF(col, '')` to coerce empty strings to NULL before passing to concat_ws: `concat_ws(', ', NULLIF(line2,''), NULLIF(city,''))`.

`concat_ws(', ', line1, line2, city, state, zip)` automatically skips any NULL component and avoids double commas. This replaces 5-10 lines of CASE/COALESCE logic when composing optional address fields. Note: empty strings ('') are NOT skipped — only NULLs are. Use NULLIF(col, '') to coerce empty strings to NULL first if needed.
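The address builder with the NULLIF guard spelled out (column and table names are illustrative):

```sql
-- NULLIF turns '' into NULL so empty components are skipped like NULLs
SELECT concat_ws(', ',
         NULLIF(line1, ''),
         NULLIF(line2, ''),
         NULLIF(city,  ''),
         NULLIF(state, ''),
         NULLIF(zip,   ''))
FROM addresses;
```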

convert_from

PG 8.0+ · returns text

Converts a byte string from the specified source encoding to the database's encoding (typically UTF-8).

DBA · Data Eng

Signature

convert_from ( string bytea, src_encoding name ) → text

Parameters

Parameter | Type | Description
string | bytea | Raw bytes in the source encoding
src_encoding | name | Source encoding name (e.g. 'LATIN1', 'WIN1252', 'UTF8')

Examples

sql
SELECT convert_from('\xc3a9'::bytea, 'UTF8');
é
sql
SELECT convert_from(col_bytes, 'WIN1252') FROM imports WHERE encoding = 'windows';
Decoded text string

ASCII is the same in all encodings

sql
SELECT convert_from('hello'::bytea, 'UTF8');
hello
sql
SELECT octet_length(convert_from(raw_col, 'LATIN1')) FROM data;
Byte counts after conversion
sql
SELECT convert_from(decode(base64_col, 'base64'), 'UTF8') FROM uploads;
Decoded and converted text
Anti-Pattern: Casting legacy bytea columns directly to text without specifying encoding

Writing `col::text` on a bytea column that holds Latin-1 or Windows-1252 bytes does not decode the data — with the default `bytea_output = 'hex'` it merely produces the escaped representation (e.g. '\xc3a9'), and decoding with the wrong encoding via `convert_from(col, 'UTF8')` raises `invalid byte sequence for encoding "UTF8"` on non-UTF-8 bytes.

✓ Instead: Always use `convert_from(col, 'WIN1252')` (or the correct source encoding) so PostgreSQL performs a proper encoding conversion to the database's UTF-8 encoding.

When importing legacy data that arrives as raw bytes, convert at query time: `SELECT convert_from(col, 'WIN1252')` on the bytea column. If mojibake is already stored as text, round-tripping through `convert_to` can often recover the original bytes: `SELECT convert_from(convert_to(col, 'LATIN1'), 'WIN1252')` — but first check `SHOW server_encoding` to confirm your DB is UTF8. For bulk fixes, combine with an UPDATE.
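A minimal one-time backfill for the bulk case, assuming the raw bytes live in a bytea staging column (`imports`, `body_raw`, and `body_text` are hypothetical names):

```sql
-- Add a properly decoded text column alongside the raw bytes
ALTER TABLE imports ADD COLUMN body_text text;

-- Decode every row once; reads from then on use body_text directly
UPDATE imports
SET body_text = convert_from(body_raw, 'WIN1252');
```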

decode

PG 8.0+ · returns bytea

Decodes binary data from a text representation using the specified format (base64, hex, or escape).

Developer · DBA · Security

Signature

decode ( string text, format text ) → bytea

Parameters

Parameter | Type | Description
string | text | Text representation of binary data
format | text | 'base64', 'hex', or 'escape'

Examples

Decodes 'Hello' from base64

sql
SELECT decode('SGVsbG8=', 'base64');
\x48656c6c6f
sql
SELECT decode('48656c6c6f', 'hex');
\x48656c6c6f

Decode base64 directly to text

sql
SELECT convert_from(decode('SGVsbG8=', 'base64'), 'UTF8');
Hello
sql
SELECT length(decode(thumbnail_b64, 'base64')) AS bytes FROM images;
Byte size of decoded images

Round-trip encode/decode

sql
SELECT decode(encode('secret'::bytea, 'hex'), 'hex') = 'secret'::bytea;
true
Anti-Pattern: Storing base64-encoded binary in text columns and decoding at query time for every read

Storing images or file content as base64 text and calling decode() on every read adds CPU overhead for decoding and bloats storage by ~33% compared to native bytea. It also makes byte-level operations (like checking file magic bytes) require an extra decode step.

✓ Instead: Store binary data in a `bytea` column directly. Use `decode()` only as a one-time migration step to convert existing base64 text into bytea.

`SELECT encode(col, 'hex') FROM table` gives you a readable hex dump of any bytea column — useful for debugging binary protocols, checking magic bytes, or validating file headers. For images: `left(encode(img_data, 'hex'), 8)` — the first 4 bytes should be `89504e47` for PNG or `ffd8ffe0` for JPEG.

encode

PG 8.0+ · returns text

Encodes binary data into a text representation using the specified format (base64, hex, or escape).

Developer · DBA · Security

Signature

encode ( data bytea, format text ) → text

Parameters

Parameter | Type | Description
data | bytea | Binary data to encode
format | text | 'base64', 'hex', or 'escape'

Examples

Standard base64 encoding

sql
SELECT encode('Hello'::bytea, 'base64');
SGVsbG8=
sql
SELECT encode('Hello'::bytea, 'hex');
48656c6c6f

Generate a random hex token (gen_random_bytes requires pgcrypto)

sql
SELECT encode(gen_random_bytes(16), 'hex');
a3f8...random 32-char hex

pgcrypto: hex-encoded hash

sql
SELECT encode(digest('secret', 'sha256'), 'hex');
2bb80d537...sha256 hex
sql
SELECT encode(logo_bytes, 'base64') FROM products WHERE id = 1;
Base64 image data for embedding in JSON/HTML
Anti-Pattern: Using encode(col, 'base64') output directly in URLs without stripping newlines

PostgreSQL's base64 output inserts newline characters every 76 characters per the MIME standard. Embedding this directly in a URL or JSON field produces broken tokens.

✓ Instead: Strip newlines after encoding: `replace(encode(data, 'base64'), chr(10), '')`. For URL-safe base64, also translate: `translate(replace(encode(data,'base64'), chr(10),''), '+/=', '-_')`.

`SELECT encode(gen_random_bytes(18), 'base64')` generates a 24-character base64 token. For URL-safe tokens, replace the `+` and `/` characters: `translate(encode(gen_random_bytes(18), 'base64'), '+/=', '-_')`. This is a perfectly adequate session token or API key generator that avoids application-layer randomness — useful for bulk-generating invite codes with a single INSERT...SELECT.
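Bulk generation then becomes a single INSERT...SELECT — the `invites` table is a hypothetical example, and gen_random_bytes comes from pgcrypto (18 bytes encode to 24 base64 characters, short enough that no MIME newline is inserted):

```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- 100 URL-safe invite codes: '+' -> '-', '/' -> '_', '=' padding dropped
INSERT INTO invites (code)
SELECT translate(encode(gen_random_bytes(18), 'base64'), '+/=', '-_')
FROM generate_series(1, 100);
```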

format

PG 9.1+ · returns text

Formats a string using printf-style format specifiers with safe SQL identifier and literal quoting.

Developer · DBA · Data Eng

Signature

format ( formatstr text [, formatarg "any" [, ...]] ) → text

Parameters

Parameter | Type | Description
formatstr | text | Format string with %s (value), %I (quoted identifier), %L (quoted literal), %% (literal %)
formatarg, ... | any | Values substituted for format specifiers in order, or by position (%1$s)

Examples

sql
SELECT format('Hello, %s!', 'world');
Hello, world!

%I quotes identifiers, %L quotes literals safely

sql
SELECT format('SELECT * FROM %I WHERE id = %L', 'my_table', 42);
SELECT * FROM my_table WHERE id = '42'

Positional arguments — reuse arg 1 twice

sql
SELECT format('%1$s=%1$s', 'foo');
foo=foo

Width and alignment (like printf)

sql
SELECT format('%-10s|%10s', 'left', 'right');
left      |     right

Use inside plpgsql for safe dynamic queries

sql
EXECUTE format('UPDATE %I SET %I = %L WHERE id = %s', tbl, col, val, id);
Safe dynamic SQL in PL/pgSQL
Anti-Pattern: Building dynamic SQL with string concatenation instead of format()

Writing `'SELECT * FROM ' || table_name || ' WHERE id = ' || user_id` in a PL/pgSQL function is a SQL injection vector when `table_name` or `user_id` come from user input or external data. It also becomes unreadable with many substitutions.

✓ Instead: Use `format('SELECT * FROM %I WHERE id = %L', table_name, user_id)`. The `%I` specifier safely quotes identifiers; `%L` safely quotes literals.

Never concatenate user-supplied table or column names into dynamic SQL with `||`. Instead: `EXECUTE format('SELECT %I FROM %I WHERE %I = %L', col_name, table_name, filter_col, filter_val)`. `%I` double-quotes identifiers and escapes embedded double-quotes. `%L` single-quotes literals and escapes embedded single-quotes. Together they make injection impossible — even with adversarial input like `'; DROP TABLE users; --`.
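A minimal PL/pgSQL sketch putting %I to work (`count_rows` is a made-up helper, not a built-in):

```sql
CREATE OR REPLACE FUNCTION count_rows(tbl text) RETURNS bigint AS $$
DECLARE
    n bigint;
BEGIN
    -- %I double-quotes tbl safely, even if it contains spaces,
    -- reserved words, or embedded double quotes
    EXECUTE format('SELECT count(*) FROM %I', tbl) INTO n;
    RETURN n;
END;
$$ LANGUAGE plpgsql;
```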

initcap

PG 8.0+ · returns text

Converts the first letter of each word to upper case and the rest to lower case; words are delimited by non-alphanumeric characters.

Developer

Signature

initcap ( string text ) → text

Parameters

Parameter | Type | Description
string | text | Input string to title-case

Examples

sql
SELECT initcap('hello world');
Hello World

Lowercases the non-initial letters too

sql
SELECT initcap('HELLO WORLD');
Hello World

Capitalizes after apostrophe — a known quirk

sql
SELECT initcap('it''s a test');
It'S A Test

Correct result for this case

sql
SELECT initcap('o''reilly');
O'Reilly

Hyphen acts as word separator

sql
SELECT initcap('user-defined type');
User-Defined Type
Anti-Pattern: Using initcap on names with Irish/Scottish prefixes expecting correct output

`initcap('mcdonald')` returns `Mcdonald` (not `McDonald`), and `initcap('macleod')` returns `Macleod` (not `MacLeod`). Using initcap as a general name-normalization function in a customer database will silently produce incorrect titles for a meaningful fraction of names.

✓ Instead: Use initcap only for generic title-casing (product names, city names). For personal names, implement a custom normalization function or handle it in application code where you can manage special-case rules.

`initcap('mcdonald')` returns `Mcdonald`, not `McDonald`. `initcap('o''reilly')` returns `O'Reilly` (correct), but `initcap('it''s fine')` returns `It'S Fine` (wrong — S is capitalized after the apostrophe). For handling edge cases like Mc/Mac prefixes or names with apostrophes, you need a custom PL/pgSQL function. For most first/last name standardization, initcap is good enough — just document the exception.
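One way to special-case the Mc prefix on top of initcap; a sketch, not a complete name normalizer (`customers` and `name` are placeholder names):

```sql
-- 'mcdonald' -> 'McDonald'; other names fall through to plain initcap
SELECT CASE
         WHEN lower(name) LIKE 'mc%'
           THEN 'Mc' || initcap(substr(name, 3))
         ELSE initcap(name)
       END AS display_name
FROM customers;
```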

left

PG 8.0+ · returns text

Returns the first n characters of the string; if n is negative, returns all but the last |n| characters.

Developer · Data Eng

Signature

left ( string text, n integer ) → text

Parameters

Parameter | Type | Description
string | text | Input string
n | integer | Number of characters to return from the left; negative means remove from the right

Examples

sql
SELECT left('Hello, World!', 5);
Hello

Negative: removes last 2 characters

sql
SELECT left('abcdef', -2);
abcd
sql
SELECT left(email, strpos(email, '@') - 1) FROM users;
username part of email
sql
SELECT left(description, 100) || '...' FROM articles WHERE length(description) > 100;
Truncated description with ellipsis

n larger than string length — returns the whole string

sql
SELECT left('abc', 100);
abc
Anti-Pattern: Using left() in a WHERE clause to test for a prefix instead of LIKE or starts_with()

`WHERE left(col, 5) = 'ACME-'` wraps the column in a function call, preventing B-tree index use on `col`. The query does a full sequential scan.

✓ Instead: Use `WHERE col LIKE 'ACME-%'` or `WHERE starts_with(col, 'ACME-')` — both can use a standard B-tree index on the column.

To replace a known-length prefix: `right(col, -6)` strips the first 6 chars. To strip a known-length suffix: `left(col, -4)` removes the last 4. Combine them to extract the middle of a fixed-format string without computing positions: `left(right('PREFIX_middle_SUFFIX', -7), -7)`. Faster and more readable than `substring(col, 8, length(col)-14)` for fixed-format codes.
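For the prefix-search case from the anti-pattern above, an index built with text_pattern_ops keeps LIKE 'ACME-%' fast even under non-C collations (`orders` and `order_code` are illustrative names):

```sql
-- text_pattern_ops enables prefix LIKE matching on non-C collations
CREATE INDEX idx_orders_code ON orders (order_code text_pattern_ops);

-- Uses the index; WHERE left(order_code, 5) = 'ACME-' would not
SELECT * FROM orders WHERE order_code LIKE 'ACME-%';
```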

length

PG 8.0+ · returns integer

Returns the number of characters in the string, or the number of bytes in a bytea value.

Developer · DBA

Signatures

length ( string text ) → integer
length ( string bytea ) → integer

Parameters

Parameter | Type | Description
string | text or bytea | Input string or binary data

Examples

sql
SELECT length('hello');
5
sql
SELECT length(NULL);
NULL
sql
SELECT length('');
0

Character count, not byte count

sql
SELECT length('日本語');
3

Data validation query

sql
SELECT id FROM products WHERE length(sku) != 8;
Rows where SKU is not exactly 8 characters
Anti-Pattern: Using length() > 0 to filter out empty and NULL values

`WHERE length(col) > 0` silently excludes NULL rows without error — they just disappear from results. Developers often write this expecting it to mean 'has any value' but NULLs are not matched.

✓ Instead: Write `WHERE col IS NOT NULL AND col <> ''` to explicitly handle both empty strings and NULLs, making the intent clear.

`WHERE length(col) > 0` will NOT match NULL rows — they are silently excluded. If you want 'non-empty, non-null', write `WHERE col IS NOT NULL AND col <> ''`, or `WHERE length(col) > 0 OR col IS NULL` to include NULLs. For a zero-safe version: `COALESCE(length(col), 0) > 0`. This is the most common NULL-related bug in string validation queries.

lower

PG 8.0+ · returns text

Converts all characters in the string to their lower-case equivalents using the current locale.

Developer · DBA

Signature

lower ( string text ) → text

Parameters

Parameter | Type | Description
string | text | Input string to convert to lower case

Examples

sql
SELECT lower('HELLO WORLD');
hello world
sql
SELECT lower('PostgreSQL 16');
postgresql 16
sql
SELECT lower(NULL);
NULL
sql
SELECT * FROM users WHERE lower(email) = lower($1);
Case-insensitive email lookup

Unicode-aware with proper locale

sql
SELECT lower('ÜNÏCÖDÉ');
ünïcödé
Anti-Pattern: Calling lower() in a WHERE clause on a column without a functional index

`WHERE lower(email) = 'user@example.com'` performs a full table scan on large tables — PostgreSQL cannot use a plain B-tree index on `email` when the column is wrapped in lower().

✓ Instead: Create a functional index: `CREATE INDEX ON users (lower(email))` and keep the WHERE clause as `WHERE lower(email) = $1`. Alternatively, use the `citext` extension for transparent case-insensitive comparisons without function calls.

`CREATE INDEX idx_users_email_lower ON users (lower(email));` then query with `WHERE lower(email) = lower($1)`. This uses the index efficiently — no full table scan. Much faster than `ILIKE` for equality checks on large tables, and avoids storing a separate normalized column. Works with unique constraints too: `CREATE UNIQUE INDEX ON users (lower(email))` prevents duplicate emails regardless of case.
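The citext alternative mentioned above, sketched end to end (assumes you can ALTER the column and install the extension):

```sql
CREATE EXTENSION IF NOT EXISTS citext;

ALTER TABLE users ALTER COLUMN email TYPE citext;

-- Plain equality is now case-insensitive; no lower() wrapper or
-- functional index needed
SELECT * FROM users WHERE email = 'User@Example.COM';
```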

lpad

PG 8.0+ · returns text

Left-pads the string to the specified length by prepending copies of the fill string (default: space).

Developer · Data Eng

Signatures

lpad ( string text, length integer ) → text
lpad ( string text, length integer, fill text ) → text

Parameters

Parameter | Type | Description
string | text | Input string to pad
length | integer | Target total length; if string is longer, it is truncated
fill | text | Padding string (default: single space)

Examples

sql
SELECT lpad('42', 8, '0');
00000042

Default fill is space

sql
SELECT lpad('hello', 10);
     hello

Truncates if string exceeds length

sql
SELECT lpad('toolongstring', 5);
toolo
sql
SELECT lpad(invoice_number::text, 10, '0') FROM invoices;
0000000042 — zero-padded display IDs

Empty string padded to fill width

sql
SELECT lpad('', 3, '*');
***
Anti-Pattern: Using lpad to zero-pad then sorting the result lexicographically without consistent width

If different rows are padded to different widths (e.g. some to 6 digits, others to 8), lexicographic ORDER BY still produces wrong results. lpad only fixes sort order when ALL values are padded to the same target length.

✓ Instead: Always use a fixed, consistent length for lpad: `lpad(id::text, 10, '0')` for all rows. If you store the padded value, add a CHECK constraint ensuring the length is always exactly N.

If IDs or codes will ever be sorted lexicographically (file exports, S3 keys, report codes), always zero-pad: `lpad(id::text, 10, '0')`. Without padding, '2' sorts after '10' in string order. Zero-padding makes numeric and lexicographic sort identical. This also matters for binary search in sorted files — a 10-digit zero-padded integer is always exactly 10 bytes, making fixed-width record parsing trivial.
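For example, generating fixed-width export keys (`exports` is a hypothetical table):

```sql
-- '0000000042.json' sorts identically in numeric and lexicographic order
SELECT lpad(id::text, 10, '0') || '.json' AS object_key
FROM exports
ORDER BY object_key;
```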

ltrim

PG 8.0+ · returns text

Removes the longest string of characters in the characters set from the beginning (left side) of the string.

Developer · DBA · Data Eng

Signatures

ltrim ( string text ) → text
ltrim ( string text, characters text ) → text

Parameters

Parameter | Type | Description
string | text | Input string to left-trim
characters | text | Character set to remove (default: space)

Examples

sql
SELECT ltrim('   hello');
hello

Strip leading zeros from string numbers

sql
SELECT ltrim('000042', '0');
42
sql
SELECT ltrim('xxxhello', 'x');
hello

Strip leading slash

sql
SELECT ltrim('/path/to/file', '/');
path/to/file
sql
SELECT ltrim('--comment', '-');
comment
Anti-Pattern: Trimming leading zeros with ltrim then casting without guarding the all-zero case

`ltrim('000', '0')::integer` throws an error because ltrim returns an empty string, and casting '' to integer fails with `invalid input syntax for type integer`.

✓ Instead: Use `NULLIF(ltrim(col, '0'), '')::integer` — the NULLIF converts empty string to NULL, and NULL::integer is NULL rather than an error.

When string numbers arrive zero-padded and you need the integer value, `ltrim(col, '0')::integer` is faster and cleaner than regex. Watch out: `ltrim('0', '0')` returns '' (empty string), so cast safely with `NULLIF(ltrim(col, '0'), '')::integer` to avoid a cast error on all-zero strings.
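The safe-cast pattern in full, using only literals:

```sql
SELECT NULLIF(ltrim('000042', '0'), '')::integer;  -- 42
SELECT NULLIF(ltrim('000',    '0'), '')::integer;  -- NULL, not a cast error
```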

md5

PG 8.0+ · returns text

Computes the MD5 hash of the input and returns it as a 32-character lowercase hexadecimal string.

Developer · DBA

Signatures

md5 ( string text ) → text
md5 ( data bytea ) → text

Parameters

Parameter | Type | Description
string | text or bytea | Input value to hash

Examples

sql
SELECT md5('hello');
5d41402abc4b2a76b9719d911017c592

MD5 of empty string

sql
SELECT md5('');
d41d8cd98f00b204e9800998ecf8427e
sql
SELECT md5(row_to_json(t)::text) FROM users t WHERE id = 1;
Fingerprint of entire row as hex string

Content-change detection

sql
SELECT id FROM cache_entries WHERE md5(content) != stored_hash;
Rows where content has changed since last hash
sql
SELECT md5(random()::text) AS random_hex;
Random 32-char hex string
Anti-Pattern: Using md5() to hash passwords before storing them

MD5 is cryptographically broken. Using `md5(password)` or even `md5(salt || password)` for password storage is dangerously insecure — rainbow tables and GPU cracking render it useless. This is a well-known but still common mistake.

✓ Instead: Use the `pgcrypto` extension: `crypt(password, gen_salt('bf', 12))` for bcrypt hashing, and `crypt(input, stored_hash)` for verification. Never store MD5-hashed passwords.

Never use `md5` for password hashing (broken, unsalted). Use it for: deduplication keys (`SELECT DISTINCT ON (md5(content))`), cache invalidation tokens, row-level change detection (`WHERE md5(row_to_json(t)::text) != last_hash`), and generating deterministic fake IDs for test data. For cryptographic hashing, use pgcrypto's `digest(col, 'sha256')`.
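Deduplication by content fingerprint, sketched against a hypothetical `documents` table:

```sql
-- Keeps the lowest id for each distinct content hash
SELECT DISTINCT ON (md5(content)) id, content
FROM documents
ORDER BY md5(content), id;
```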

octet_length

PG 8.0+ · returns integer

Returns the number of bytes in the string (UTF-8 multibyte characters count as multiple bytes).

Developer · DBA · Data Eng

Signatures

octet_length ( string text ) → integer
octet_length ( string bytea ) → integer

Parameters

Parameter | Type | Description
string | text or bytea | Input string or binary data

Examples

ASCII: byte count = char count

sql
SELECT octet_length('hello');
5

'é' is 2 bytes in UTF-8, so 4 chars = 5 bytes

sql
SELECT octet_length('café');
5

3 chars × 3 bytes each in UTF-8

sql
SELECT octet_length('日本語');
9
sql
SELECT id FROM messages WHERE octet_length(body) > 65535;
Rows exceeding 64KB payload limit
sql
SELECT octet_length('\x0102ff'::bytea);
3
Anti-Pattern: Enforcing VARCHAR(n) byte limits with char_length instead of octet_length

PostgreSQL's `VARCHAR(n)` limit is in characters, not bytes. But external systems (Kafka, S3 metadata, SNS) often have byte limits. A 255-character field containing 255 emoji is 1,020 bytes — far over a 1KB limit. Checking `char_length(col) <= 255` does not catch this.

✓ Instead: Add a separate CHECK constraint: `CHECK (octet_length(col) <= 1024)` when the destination system has a byte budget.

Protocol limits (HTTP headers, Kafka messages, email subjects) are in bytes, not characters. A 255-character VARCHAR that looks fine can exceed limits when it contains emoji or CJK characters. Add a CHECK constraint: `CHECK (octet_length(col) <= 1024)` to enforce byte limits. Pair with a UI warning: show remaining chars but calculate against `maxBytes / avgBytesPerChar` heuristically.

overlay

PG 8.0+ · returns text

Replaces a substring starting at position start for count characters with the replacement string.

Developer · DBA · Security

Signature

overlay ( string text PLACING replacement text FROM start integer [ FOR count integer ] ) → text

Parameters

Parameter | Type | Description
string | text | Original string
replacement | text | String to insert
start | integer | 1-based character position where replacement begins
count | integer | Number of characters to replace (defaults to length of replacement)

Examples

sql
SELECT overlay('Txxxxas' PLACING 'hom' FROM 2 FOR 4);
Thomas
sql
SELECT overlay('Hello World' PLACING '-----' FROM 7 FOR 5);
Hello -----

Mask first 14 chars of credit card

sql
SELECT overlay(card_num PLACING '****-****-****' FROM 1 FOR 14) FROM cards;
****-****-****-1234
sql
SELECT overlay(phone PLACING 'XXX' FROM 5 FOR 3) FROM contacts;
555-XXX-1234

Without FOR: replaces length-of-replacement chars

sql
SELECT overlay('abcdefgh' PLACING 'XYZ' FROM 3);
abXYZfgh
Anti-Pattern: Using overlay without FOR when the replacement length differs from what you want to remove

When FOR is omitted, overlay replaces exactly `length(replacement)` characters. If your replacement is shorter or longer than the section you want to overwrite, you get unexpected results — characters are shifted rather than replaced in-place.

✓ Instead: Always specify FOR explicitly: `overlay(col PLACING new_val FROM start FOR length_to_remove)`. Count the characters you want to overwrite, not the length of the replacement.

Use `overlay` to partially mask sensitive data for display or logging: `overlay(email PLACING repeat('*', strpos(email,'@')-2) FROM 2 FOR strpos(email,'@')-2)` turns 'user@example.com' into 'u***@example.com'. Unlike `regexp_replace`, `overlay` preserves the original string boundaries and positions, making partial redaction exact and predictable.

position

PG 8.0+ · returns integer

Returns the starting position (1-based) of the first occurrence of substring within string, or 0 if not found.

Developer

Signatures

position ( substring text IN string text ) → integer
strpos ( string text, substring text ) → integer

Parameters

ParameterTypeDescription
substringtextString to search for
stringtextString to search within

Examples

sql
SELECT position('om' IN 'Thomas');
3

strpos() is identical — different syntax

sql
SELECT strpos('Thomas', 'om');
3

Returns 0 (not NULL or -1) when not found

sql
SELECT position('xyz' IN 'hello');
0
sql
SELECT position('@' IN email) FROM users;
Position of @ in each email address
sql
SELECT substr(url, position('://' IN url) + 3) FROM links;
URL without the protocol prefix
Anti-PatternUsing strpos() > -1 to test for substring presence (copied from other languages)

In Python and JavaScript, `str.indexOf(sub) > -1` or `str.find(sub) != -1` tests for presence. In PostgreSQL, `strpos` returns 0 (not -1) for no match — so `strpos(col, 'foo') > -1` is always true, matching every row including those without the substring.

✓ Instead: Use `strpos(col, 'foo') > 0` or simply `col LIKE '%foo%'` to test for substring presence.

`WHERE strpos(col, 'foo') > 0` correctly finds matches. `WHERE strpos(col, 'foo') != 0` also works. The trap: developers from Python/JS write `> -1` expecting false for 'not found', but the minimum return value is 0 — so `> -1` matches EVERYTHING. Second trap: `position()` is case-sensitive — for case-insensitive search, use `strpos(lower(col), lower(needle)) > 0` or `ILIKE`.
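A minimal sketch of the presence tests above (the `articles` table and `title` column are illustrative):

```sql
-- Case-sensitive presence test: strpos returns 0 for "not found", never -1
SELECT * FROM articles WHERE strpos(title, 'Postgres') > 0;

-- Case-insensitive variants, equivalent in effect
SELECT * FROM articles WHERE strpos(lower(title), lower('Postgres')) > 0;
SELECT * FROM articles WHERE title ILIKE '%Postgres%';
```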

quote_ident

PG 8.0+text

Quotes the given string as a SQL identifier, adding double quotes if necessary and doubling any embedded double quotes.

DeveloperDBASecurity

Signature

quote_ident ( string text ) → text

Parameters

ParameterTypeDescription
stringtextIdentifier (table name, column name, etc.) to safely quote

Examples

Simple identifiers need no quoting

sql
SELECT quote_ident('users');
users

Spaces require double-quoting

sql
SELECT quote_ident('My Table');
"My Table"

Reserved words are quoted

sql
SELECT quote_ident('select');
"select"

Embedded quotes are doubled

sql
SELECT quote_ident('table"name');
"table""name"
sql
EXECUTE 'SELECT * FROM ' || quote_ident(table_name);
Safe dynamic SQL using the identifier
Anti-PatternConcatenating table or column names directly into dynamic SQL strings

Writing `'SELECT * FROM ' || schema_name || '.' || table_name` in a stored procedure is a SQL injection risk. If either value contains special characters or reserved words, the query will either fail or — with malicious input — execute unintended SQL.

✓ Instead: Use `format('SELECT * FROM %I.%I', schema_name, table_name)` which calls quote_ident internally and handles all edge cases safely.

If a table name comes from a parameter or system catalog, `'SELECT * FROM ' || table_name` is a SQL injection vector. `'SELECT * FROM ' || quote_ident(table_name)` is not. In modern PL/pgSQL, prefer `format('SELECT * FROM %I', table_name)` — it's more readable and handles multiple identifiers cleanly. The `%I` specifier in `format()` calls `quote_ident()` internally.
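The `format('%I')` pattern above, sketched as an anonymous PL/pgSQL block (the table name is illustrative):

```sql
DO $$
DECLARE
    tbl text := 'My Table';  -- imagine this arrives from a parameter
    n   bigint;
BEGIN
    -- %I applies quote_ident, so spaces, reserved words, and embedded
    -- double quotes in tbl are all handled safely
    EXECUTE format('SELECT count(*) FROM %I', tbl) INTO n;
    RAISE NOTICE 'row count: %', n;
END $$;
```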

quote_literal

PG 8.0+text

Quotes a string as a SQL string literal, adding single quotes and doubling any embedded single quotes; returns NULL for NULL input.

DeveloperDBASecurity

Signatures

quote_literal ( string text ) → text
quote_literal ( anyelement ) → text

Parameters

ParameterTypeDescription
stringtext | anyValue to safely quote as a SQL literal

Examples

Embedded single quotes are doubled

sql
SELECT quote_literal('O''Reilly');
'O''Reilly'

Non-text values are cast to text first

sql
SELECT quote_literal(42);
'42'

NULL input returns NULL (use quote_nullable for NULL-safe version)

sql
SELECT quote_literal(NULL);
NULL
sql
SELECT 'WHERE name = ' || quote_literal(user_input);
WHERE name = 'user input safely quoted'
sql
SELECT quote_literal(current_timestamp);
'2024-03-15 14:30:00+00'
Anti-PatternUsing quote_literal when the value might be NULL, breaking dynamic SQL silently

`'SET col = ' || quote_literal(maybe_null)` produces NULL when `maybe_null` is NULL — and `EXECUTE NULL` raises an error or does nothing. This is a silent failure that only appears at runtime when a nullable column is updated.

✓ Instead: Use `quote_nullable(maybe_null)` which returns the unquoted string `NULL` for NULL input, producing valid SQL like `SET col = NULL`.

In dynamic SQL, if a value might be NULL, `quote_literal(NULL)` returns NULL — and concatenating NULL into a string produces NULL, silently breaking your query. Use `quote_nullable(val)` instead: it returns the string 'NULL' for NULL input and a quoted literal otherwise. This is the safer default for all dynamic SQL value quoting in stored procedures.

quote_nullable

PG 8.0+text

Like quote_literal but returns the string 'NULL' (unquoted) when the input is NULL, making it safe for dynamic SQL with nullable values.

DeveloperDBASecurity

Signatures

quote_nullable ( string text ) → text
quote_nullable ( anyelement ) → text

Parameters

ParameterTypeDescription
valuetext | anyValue to quote; NULLs produce unquoted NULL

Examples

Returns unquoted NULL string — safe for dynamic SQL

sql
SELECT quote_nullable(NULL);
NULL
sql
SELECT quote_nullable('hello');
'hello'

Single quotes doubled

sql
SELECT quote_nullable('it''s');
'it''s'
sql
SELECT quote_nullable(42);
'42'
sql
EXECUTE 'UPDATE t SET col = ' || quote_nullable(maybe_null_val);
Safe even when value is NULL
Anti-PatternUsing quote_literal as the default for all value quoting in stored procedures

Many developers default to `quote_literal` for all value substitution in dynamic SQL because it's better known. But any nullable column will break the generated SQL when the value is NULL — `quote_literal(NULL)` propagates NULL into the string concatenation.

✓ Instead: Make `quote_nullable` your default for all value quoting in PL/pgSQL. Reserve `quote_literal` for cases where a NULL value is genuinely an error condition.

The difference matters most in UPDATE/INSERT statements built dynamically where optional columns might be NULL. With `quote_literal(NULL)` the SQL string itself becomes NULL and the EXECUTE fails silently or raises an error. With `quote_nullable(NULL)` you get `SET col = NULL` — the correct SQL. Make `quote_nullable` your default; only use `quote_literal` when a NULL should be an error.
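The difference described above, shown side by side:

```sql
-- quote_literal propagates NULL through concatenation: the whole string is NULL
SELECT 'UPDATE t SET col = ' || quote_literal(NULL);
-- result: NULL

-- quote_nullable emits the unquoted keyword NULL, producing valid SQL
SELECT 'UPDATE t SET col = ' || quote_nullable(NULL);
-- result: UPDATE t SET col = NULL
```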

regexp_count

PG 15+integer

Counts the number of non-overlapping matches of the POSIX regular expression pattern in the string.

DeveloperData Eng

Signature

regexp_count ( string text, pattern text [, start integer [, flags text ]] ) → integer

Parameters

ParameterTypeDescription
stringtextInput string to search
patterntextPOSIX regex pattern
startintegerCharacter position to start search (1-based, default 1)
flagstextRegex flags: 'i' case-insensitive, 'n' newline-sensitive, etc.

Examples

sql
SELECT regexp_count('hello world', 'o');
2
sql
SELECT regexp_count('aababab', 'ab');
3

Count uppercase letters (regexp_count always counts all matches; passing the 'g' flag raises an error)

sql
SELECT regexp_count('Hello World', '[A-Z]');
2
sql
SELECT regexp_count(body, '\m\w+@\w+\.\w+\M') FROM emails;
Number of email addresses in body
sql
SELECT id FROM logs WHERE regexp_count(message, 'ERROR', 1, 'i') > 3;
Log entries with more than 3 error mentions
Anti-PatternUsing regexp_count on PG versions before 15 without a version guard

`regexp_count` was added in PostgreSQL 15. Calling it on PG 14 or earlier raises `ERROR: function regexp_count(text, text) does not exist`. Applications that need to support multiple PG versions will break at runtime.

✓ Instead: Check your minimum supported PostgreSQL version. For pre-PG15, use the workaround: `array_length(regexp_split_to_array(col, pattern), 1) - 1` or a custom PL/pgSQL helper.

Before PG15: `array_length(regexp_split_to_array(text, 'pattern'), 1) - 1` gives the match count by counting splits. On PG15+, `regexp_count` is cleaner and faster. For counting word occurrences case-insensitively: `regexp_count(lower(col), lower(keyword))` — avoid the 'i' flag for simple literal matching as it disables certain optimizations.
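The pre-PG15 workaround and the PG15+ function side by side, counting runs of digits:

```sql
-- PG 15+
SELECT regexp_count('foo1bar22baz333', '[0-9]+');
-- 3

-- Pre-PG15 equivalent: splitting on the pattern yields (matches + 1) parts
SELECT array_length(regexp_split_to_array('foo1bar22baz333', '[0-9]+'), 1) - 1;
-- 3
```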

regexp_like

PG 15+boolean

Returns true if the string matches the POSIX regular expression pattern.

DeveloperData Eng

Signature

regexp_like ( string text, pattern text [, flags text ] ) → boolean

Parameters

ParameterTypeDescription
stringtextInput string to test
patterntextPOSIX regex pattern
flagstextOptional flags: 'i' case-insensitive, 'n' newline-sensitive

Examples

sql
SELECT regexp_like('hello world', 'world');
true

Case-insensitive match

sql
SELECT regexp_like('Hello', 'hello', 'i');
true

Basic email validation

sql
SELECT regexp_like('test@example.com', '^[^@]+@[^@]+\.[^@]+$');
true
sql
SELECT regexp_like('abc123', '^[a-z]+$');
false
sql
SELECT * FROM users WHERE regexp_like(phone, '^[+]?[0-9\s\-().]{7,20}$');
Valid phone numbers
Anti-PatternWriting complex validation logic with regexp_like when a simpler constraint exists

Using `WHERE regexp_like(email, '^[^@]+@[^@]+\.[a-z]{2,}$')` for email validation in application queries is fine for filtering, but putting this in a CHECK constraint on a user-facing column leads to maintenance problems — regex email validation is notoriously hard to keep correct and RFC-compliant.

✓ Instead: For email columns, use a permissive CHECK like `CHECK (email LIKE '%@%.%')` combined with application-layer validation, or use the `pg_trgm` extension for fuzzy matching. Reserve regexp_like for filtering and data auditing queries.

`regexp_like(col, 'pattern')` is equivalent to `col ~ 'pattern'`. The `~` operator is available in all PostgreSQL versions and is often more concise in WHERE clauses. Use `~*` for case-insensitive matching (equivalent to `regexp_like(col, pattern, 'i')`). Migrate to `regexp_like` in new code for readability; keep `~` in hotpaths since it's the same underlying function.

regexp_match

PG 10+text[]

Returns the captured substrings of the first match of a POSIX regex pattern, or NULL if there is no match.

DeveloperData Eng

Signature

regexp_match ( string text, pattern text [, flags text ] ) → text[]

Parameters

ParameterTypeDescription
stringtextInput string to search
patterntextPOSIX regex with optional capturing groups
flagstextOptional: 'i' case-insensitive, 'n' newline-sensitive

Examples

Returns array of capture groups

sql
SELECT regexp_match('foobarbequebaz', '(bar)(beque)');
{bar,beque}

Extract first capture group

sql
SELECT (regexp_match('2024-03-15', '(\d{4})-(\d{2})-(\d{2})'))[1] AS year;
2024

No match returns NULL, not empty array

sql
SELECT regexp_match('no match here', '\d+');
NULL
sql
SELECT (regexp_match(email, '@(.+)$'))[1] AS domain FROM users;
example.com — domain extracted from email
sql
SELECT regexp_match('v14.2.1', 'v(\d+)\.(\d+)');
{14,2}
Anti-PatternSubscripting regexp_match result without checking for NULL first

`(regexp_match(col, 'pattern'))[1]` silently returns NULL when there is no match — which is fine in a SELECT, but causes silent data loss in an INSERT or UPDATE where you expected a value.

✓ Instead: Use `COALESCE((regexp_match(col, pattern))[1], '')` or add a WHERE filter: `WHERE regexp_match(col, pattern) IS NOT NULL` before relying on the result.

`(regexp_match(col, 'pattern'))[1]` returns NULL both when there is no match AND when the first capture group exists but did not participate in the match (e.g. an optional group like `(x)?`). Always wrap: `COALESCE((regexp_match(col, 'pattern'))[1], 'default')` or check `regexp_match(...) IS NOT NULL` before subscripting. Use `regexp_matches` (plural) with the 'g' flag to get all matches as a set of rows.
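A NULL-safe extraction sketch (the `releases` table and `name` column are illustrative):

```sql
-- COALESCE supplies a default when the pattern does not match
SELECT COALESCE((regexp_match(name, 'v(\d+)'))[1], '0') AS major
FROM releases;

-- Or filter to matching rows first, then subscript safely
SELECT (regexp_match(name, 'v(\d+)'))[1] AS major
FROM releases
WHERE name ~ 'v\d+';
```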

regexp_replace

PG 7.4+text

Replaces substrings matching a POSIX regex pattern with a replacement string; backreferences \1, \2, etc. reference capture groups.

DeveloperDBAData Eng

Signatures

regexp_replace ( string text, pattern text, replacement text ) → text
regexp_replace ( string text, pattern text, replacement text, flags text ) → text
regexp_replace ( string text, pattern text, replacement text, start integer, N integer ) → text

Parameters

ParameterTypeDescription
stringtextInput string
patterntextPOSIX regex pattern
replacementtextReplacement string; use \1, \2, \& for backreferences
flagstext'g' global (all matches), 'i' case-insensitive, 'n' newline-sensitive

Examples

sql
SELECT regexp_replace('Hello World', 'World', 'PostgreSQL');
Hello PostgreSQL

'g' flag replaces ALL matches

sql
SELECT regexp_replace('foo123bar456', '[0-9]+', 'N', 'g');
fooNbarN

Strip all non-digits from phone number

sql
SELECT regexp_replace('(800) 555-1234', '[^0-9]', '', 'g');
8005551234

Backreferences to swap words

sql
SELECT regexp_replace('John Smith', '(\w+)\s(\w+)', '\2, \1');
Smith, John
sql
SELECT regexp_replace(html, '<[^>]+>', '', 'g');
Plain text with HTML tags stripped
Anti-PatternForgetting the 'g' flag when intending to replace all occurrences

`regexp_replace(col, '[^a-zA-Z]', '')` without the 'g' flag removes only the first non-letter character. Rows with multiple non-letter characters appear 'cleaned' but still contain unwanted characters — a bug that is easy to miss in testing on simple data.

✓ Instead: Always add the 'g' flag when you want all matches replaced: `regexp_replace(col, '[^a-zA-Z]', '', 'g')`.

`regexp_replace('aaa', 'a', 'b')` returns 'baa', not 'bbb'. Add 'g' for all-matches replacement. Common use: normalize phone numbers `regexp_replace(phone, '[^0-9]', '', 'g')`, strip HTML `regexp_replace(html, '<[^>]+>', '', 'g')`, redact emails `regexp_replace(text, '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', '[REDACTED]', 'g')`. The 'g' flag is the most commonly forgotten argument in production bugs.

regexp_split_to_array

PG 8.3+text[]

Splits the string using a POSIX regex as the delimiter and returns the parts as an array.

DeveloperData Eng

Signature

regexp_split_to_array ( string text, pattern text [, flags text ] ) → text[]

Parameters

ParameterTypeDescription
stringtextString to split
patterntextPOSIX regex delimiter pattern
flagstextOptional regex flags: 'i' case-insensitive

Examples

sql
SELECT regexp_split_to_array('hello world', '\s+');
{hello,world}

Empty strings preserved between consecutive delimiters

sql
SELECT regexp_split_to_array('one,two,,four', ',');
{one,two,"",four}

A trailing delimiter leaves a trailing empty string

sql
SELECT regexp_split_to_array('1a2b3c', '[a-z]');
{1,2,3,""}
sql
SELECT array_length(regexp_split_to_array(body, '\s+'), 1) AS word_count FROM articles;
Word count per article

Leading/trailing delimiter creates empty string elements

sql
SELECT regexp_split_to_array('  spaced  out  ', '\s+');
{"",spaced,out,""}
Anti-PatternUsing regexp_split_to_array for simple fixed-character splits

Using `regexp_split_to_array(col, ',')` when the delimiter is a plain literal character adds regex overhead without benefit. Special regex characters in the delimiter also need escaping, which is easy to forget.

✓ Instead: Use `string_to_array(col, ',')` for literal delimiters — it is faster, cleaner, and the delimiter is treated as a plain string, not a pattern.

Splitting `' hello world '` on `'\s+'` produces `{'', 'hello', 'world', ''}` — note the empty strings at start and end. Use `regexp_split_to_array(trim(col), '\s+')` to avoid them. For simple fixed-delimiter splits, `string_to_array` is faster. Use `regexp_split_to_array` only when you need regex power (e.g. split on any whitespace, split on multiple delimiter types).

regexp_split_to_table

PG 8.3+setof text

Splits the string using a POSIX regex as the delimiter and returns each part as a separate row.

DeveloperData Eng

Signature

regexp_split_to_table ( string text, pattern text [, flags text ] ) → setof text

Parameters

ParameterTypeDescription
stringtextString to split
patterntextPOSIX regex delimiter pattern
flagstextOptional regex flags

Examples

sql
SELECT regexp_split_to_table('hello world foo', '\s+');
hello world foo (one row each)
sql
SELECT * FROM regexp_split_to_table('a,b,,d', ',') AS t(word);
a b (empty) d
sql
SELECT word, count(*) FROM regexp_split_to_table(lower(body), '\W+') AS word GROUP BY word ORDER BY count DESC LIMIT 10;
Top 10 word frequencies in body text
sql
SELECT unnest(regexp_split_to_array(tags_text, ',\s*')) AS tag FROM posts;
Individual tags as rows
sql
SELECT trim(t) AS item FROM regexp_split_to_table('  a , b , c ', ',') AS t WHERE trim(t) != '';
a b c — CSV parsed and cleaned
Anti-PatternUsing regexp_split_to_table in a correlated subquery for every row in a large table

Running `regexp_split_to_table(col, pattern)` per row in a correlated subquery or via a LATERAL join on millions of rows can become a significant bottleneck since the regex is compiled and applied row by row.

✓ Instead: Process text splitting in batch: unnest into a staging table first, or use `string_to_array` with `unnest` when the delimiter is a plain literal. For heavy text analysis, consider a dedicated full-text search pipeline.

```sql SELECT word, count(*) AS freq FROM regexp_split_to_table(lower(document_text), '[^a-z]+') AS word WHERE length(word) > 2 GROUP BY word ORDER BY freq DESC LIMIT 20; ``` This gives you top-20 non-trivial words in a document entirely in SQL — useful for generating keyword tags, detecting spam patterns, or building a poor man's text index.

repeat

PG 8.0+text

Repeats the string the specified number of times.

Developer

Signature

repeat ( string text, number integer ) → text

Parameters

ParameterTypeDescription
stringtextString to repeat
numberintegerNumber of times to repeat; 0 or negative returns empty string

Examples

sql
SELECT repeat('Pg', 4);
PgPgPgPg
sql
SELECT repeat('*', 10);
**********
sql
SELECT repeat(' ', indent_level * 2) || label AS tree FROM nodes;
child_node (indented tree output)

Zero repetitions returns empty string

sql
SELECT repeat('a', 0);
sql
SELECT repeat(md5(random()::text), 5) AS long_token;
160-char pseudo-random string for test data
Anti-PatternUsing repeat() with a large count on user-supplied input without a size limit

`repeat('x', n)` where `n` comes from user input with no validation can allocate gigabytes of memory and crash the server. `repeat('x', 1000000000)` will attempt to allocate ~1GB.

✓ Instead: Always cap the repeat count: `repeat('x', LEAST(n, 10000))` or add a CHECK constraint or application-layer validation before passing user input to repeat().

`INSERT INTO test_table(data) SELECT repeat(md5(random()::text), 100) FROM generate_series(1, 10000)` populates 10,000 rows with ~3.2KB of random-looking data each. Use `repeat('x', n)` for a precisely sized string, or `repeat(md5(i::text), n)` for deterministic data keyed to row number. Eliminates the need for a Python/shell data generator for load testing.

replace

PG 8.0+text

Replaces all non-overlapping occurrences of from with to in the string.

DeveloperData Eng

Signature

replace ( string text, from text, to text ) → text

Parameters

ParameterTypeDescription
stringtextInput string
fromtextSubstring to find and replace (case-sensitive, literal — not a regex)
totextReplacement string (can be empty to delete)

Examples

sql
SELECT replace('Hello World', 'World', 'PostgreSQL');
Hello PostgreSQL

Replaces ALL occurrences

sql
SELECT replace('aabbaabb', 'aa', 'X');
XbbXbb
sql
SELECT replace(url, 'http://', 'https://') FROM links;
URLs upgraded to HTTPS
sql
SELECT replace(col, chr(9), '    ') FROM data;
Tabs replaced with 4 spaces
sql
SELECT replace(phone, '-', '') FROM contacts;
Phone with dashes removed
Anti-PatternChaining multiple replace() calls when translate() or regexp_replace() would work in one pass

`replace(replace(replace(col, 'é', 'e'), 'ñ', 'n'), 'ü', 'u')` applies three full passes over potentially large strings. As the number of substitutions grows, performance degrades linearly.

✓ Instead: Use `translate(col, 'éñü', 'enu')` for single-character substitutions — one pass regardless of how many character mappings. For substring replacements (length > 1), use `regexp_replace` with alternation: `regexp_replace(col, 'foo|bar', '', 'g')`.

`replace(replace(replace(col, 'a', 'x'), 'b', 'y'), 'c', 'z')` is three sequential passes. `translate(col, 'abc', 'xyz')` does the same in one pass and is significantly faster on large strings. Use `replace` when replacing substrings (length > 1). Use `translate` when doing character-for-character substitution. Use `regexp_replace` when you need pattern matching.
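The single-pass alternative from the tip, with a literal input for comparison:

```sql
-- Three sequential passes over the string
SELECT replace(replace(replace('café señor über', 'é', 'e'), 'ñ', 'n'), 'ü', 'u');
-- cafe senor uber

-- One pass, same result
SELECT translate('café señor über', 'éñü', 'enu');
-- cafe senor uber
```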

reverse

PG 8.0+text

Returns the characters of the string in reverse order.

Developer

Signature

reverse ( string text ) → text

Parameters

ParameterTypeDescription
stringtextInput string to reverse

Examples

sql
SELECT reverse('hello');
olleh

Not a palindrome

sql
SELECT reverse('abcde') = 'abcde';
false
sql
SELECT col = reverse(col) AS is_palindrome FROM words;
true/false per row

Unicode-aware — reverses characters not bytes

sql
SELECT reverse('日本語');
語本日
sql
SELECT * FROM domains ORDER BY reverse(domain);
Rows sorted by reversed domain (groups by TLD then SLD)
Anti-PatternUsing reverse() for suffix-pattern matching without a supporting index

`WHERE reverse(col) LIKE 'moc.elpmaxe.%'` can use an index on `reverse(col)`, but developers often write this in ad-hoc queries without creating the functional index, resulting in a full sequential scan.

✓ Instead: Create `CREATE INDEX ON domains (reverse(domain))` before using `WHERE reverse(domain) LIKE reverse('%.' || $1)` in production queries. Without the index, reverse-based suffix matching is no faster than `col LIKE '%suffix'`.

`ORDER BY reverse(domain)` sorts 'mail.example.com', 'api.example.com', and 'www.example.com' together because reversed they share the same prefix 'moc.elpmaxe.'. This enables B-tree index scans on suffix patterns — combine with `CREATE INDEX ON domains (reverse(domain))` and query `WHERE reverse(domain) LIKE reverse('%.example.com')` for fast suffix lookups without full-text search.
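The index-backed suffix lookup described above, as a sketch (table and index names are illustrative; `text_pattern_ops` lets LIKE prefix patterns use the B-tree index under non-C collations):

```sql
-- One-time setup: expression index on the reversed domain
CREATE INDEX idx_domains_rev ON domains (reverse(domain) text_pattern_ops);

-- The suffix lookup becomes a prefix scan on the index:
-- reverse('%.example.com') = 'moc.elpmaxe.%'
SELECT * FROM domains
WHERE reverse(domain) LIKE reverse('%.example.com');
```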

rpad

PG 8.0+text

Right-pads the string to the specified length by appending copies of the fill string (default: space).

DeveloperData Eng

Signatures

rpad ( string text, length integer ) → text
rpad ( string text, length integer, fill text ) → text

Parameters

ParameterTypeDescription
stringtextInput string to pad
lengthintegerTarget total length; truncates if string is longer
filltextPadding string (default: single space)

Examples

sql
SELECT rpad('hello', 10);
hello      (padded to 10 characters with trailing spaces)

Custom fill character

sql
SELECT rpad('hello', 10, '.');
hello.....
sql
SELECT rpad(label, 20, ' ') || value FROM settings;
label value (tabular output)

Truncated when string exceeds length

sql
SELECT rpad('toolongstring', 5);
toolo
sql
SELECT rpad(col, max_len, ' ') FROM fixed_width_export;
Space-padded fixed-width fields for legacy exports
Anti-PatternForgetting that rpad truncates strings longer than the target length

`rpad('toolongstring', 5)` returns `'toolo'` — data is silently truncated, not an error. When generating fixed-width export files, any value longer than the field width will be cut off without warning, corrupting the record layout.

✓ Instead: Add an assertion or validation step before rpad in ETL pipelines: `WHERE length(col) <= 20` or raise an exception in PL/pgSQL when any value exceeds the column width.

Legacy system integrations often require fixed-width text files: `SELECT rpad(last_name, 20) || rpad(first_name, 15) || lpad(amount::text, 10, '0') FROM records`. This creates properly formatted fixed-width output entirely in SQL — no application-layer string building needed. Faster than constructing records in application code when exporting millions of rows.

rtrim

PG 8.0+text

Removes the longest string of characters in the characters set from the end (right side) of the string.

DeveloperDBAData Eng

Signatures

rtrim ( string text ) → text
rtrim ( string text, characters text ) → text

Parameters

ParameterTypeDescription
stringtextInput string to right-trim
characterstextCharacter set to remove (default: space)

Examples

sql
SELECT rtrim('hello   ');
hello
sql
SELECT rtrim('hello...', '.');
hello

Strip trailing zeros

sql
SELECT rtrim('123000', '0');
123
sql
SELECT rtrim('path/to/dir/', '/');
path/to/dir
sql
SELECT rtrim(rpad(col, 100), ' ') FROM char_columns;
Trim CHAR column padding
Anti-PatternUsing rtrim to strip trailing zeros from numeric strings before arithmetic

`rtrim('3.14000', '0')` returns `'3.14'` as expected, but `rtrim('100', '0')` returns `'1'` — stripping significant zeros too. The characters argument is a set, not a suffix, so it removes any trailing character that is in the set.

✓ Instead: For stripping trailing decimal zeros from numeric strings, cast to numeric and back: `trim(trailing '0' from col::numeric::text)` is safer, or use `to_char(val::numeric, 'FM999999990.##########')` for formatting.

A `CHAR(20)` column storing 'hello' actually stores 'hello ' (padded with 15 spaces). Comparing it with `= 'hello'` works in SQL but if you ever extract it as text and compare in application code, the trailing spaces will cause mismatches. Add `rtrim(col)` in your SELECT, or better yet: avoid CHAR and use VARCHAR or TEXT instead.

split_part

PG 8.0+ (negative n from PG 14+)text

Splits the string on the delimiter and returns the nth field (1-based); returns empty string for out-of-range n.

DeveloperData Eng

Signature

split_part ( string text, delimiter text, n integer ) → text

Parameters

ParameterTypeDescription
stringtextInput string to split
delimitertextLiteral delimiter string (not a regex)
ninteger1-based field index; negative values count from the right (PG 14+)

Examples

sql
SELECT split_part('one,two,three', ',', 2);
two

Extract domain from email

sql
SELECT split_part('user@example.com', '@', 2);
example.com

Out-of-bounds returns empty string, not NULL

sql
SELECT split_part('a,b', ',', 5);
sql
SELECT split_part('14.2.1', '.', 1)::int AS major, split_part('14.2.1', '.', 2)::int AS minor;
14 | 2

Negative index from PG 14

sql
SELECT split_part(path, '/', -1) FROM file_paths;
Last path segment (filename) — PG 14+
Anti-PatternUsing split_part to parse CSV fields that may contain quoted delimiters

`split_part('"Smith, John",30', ',', 1)` returns `'"Smith'` — not `'"Smith, John"'`. split_part does not understand CSV quoting rules and splits on every literal delimiter character.

✓ Instead: For proper CSV parsing with quoted fields, use PostgreSQL's COPY command, the `csv_fdw` extension, or parse in application code. split_part is safe only for delimiter-separated values where the delimiter cannot appear in field values.

Unlike `(string_to_array(col, ','))[5]` which returns NULL for out-of-bounds, `split_part(col, ',', 5)` returns '' (empty string). This matters in CHECK constraints and WHERE clauses where you don't want NULL propagation. In PG 14+, `split_part(col, '/', -1)` extracts the last segment of a path — previously you needed `regexp_replace(col, '.*/(.+)$', '\1')` for this.

starts_with

PG 9.6+boolean

Returns true if the string begins with the given prefix.

Developer

Signature

starts_with ( string text, prefix text ) → boolean

Parameters

ParameterTypeDescription
stringtextInput string to check
prefixtextPrefix to test for

Examples

sql
SELECT starts_with('hello world', 'hello');
true
sql
SELECT starts_with('hello world', 'world');
false

Case-sensitive

sql
SELECT starts_with('Hello', 'hello');
false
sql
SELECT * FROM files WHERE starts_with(path, '/home/user/');
Files under /home/user/

Empty prefix always matches

sql
SELECT starts_with(col, '');
true
Anti-PatternUsing starts_with() for case-insensitive prefix checks without lowercasing both sides

`starts_with(col, 'IMG')` does not match rows where col starts with 'img' or 'Img'. This is easy to miss when data comes from mixed-case sources.

✓ Instead: For case-insensitive prefix checks: `starts_with(lower(col), lower('IMG'))` or `col ILIKE 'IMG%'`. Create a functional index on `lower(col)` if performance matters.

`WHERE col LIKE 'prefix%'` can use a standard B-tree index, provided the index uses `text_pattern_ops` or the database uses the C collation. `WHERE starts_with(col, prefix)` maps to the `^@` operator, which B-tree does not support; it needs an SP-GiST index on the column instead. For parameterized queries, `WHERE col LIKE $1 || '%'` only uses an index when the planner can prove the pattern is a left-anchored prefix, so verify both forms with EXPLAIN. Avoid `WHERE left(col, n) = prefix` unless you create an expression index on `left(col, n)` — the bare function call prevents index use.

string_to_array

PG 9.0+text[]

Splits the string at each delimiter occurrence and returns the parts as a text array; an optional null_string specifies which field values should be replaced with NULL.

DeveloperData Eng

Signature

string_to_array ( string text, delimiter text [, null_string text ] ) → text[]

Parameters

ParameterTypeDescription
stringtextInput string to split
delimitertextLiteral delimiter; if NULL, string is split into individual characters
null_stringtextOptional: field values equal to this string are returned as NULL

Examples

sql
SELECT string_to_array('a,b,c,d', ',');
{a,b,c,d}

Empty fields become NULL

sql
SELECT string_to_array('a,,c', ',', '');
{a,NULL,c}

NULL delimiter splits into individual characters

sql
SELECT string_to_array('hello', NULL);
{h,e,l,l,o}
sql
SELECT unnest(string_to_array(tags, ',')) AS tag FROM posts;
Individual tag rows from comma-separated string

Cast to integer array after splitting

sql
SELECT string_to_array('1:2:3', ':')::integer[];
{1,2,3}
Anti-PatternUsing string_to_array on a column that should be stored as a native array

If you routinely split a text column with string_to_array to use its elements — filtering, joining, or unnesting — you've found a schema design problem. Splitting on every query adds overhead and prevents efficient GIN indexing.

✓ Instead: Migrate the column to a native `text[]` array type. Use GIN indexes for containment and overlap queries (`&&`, `@>`). The data becomes a first-class array, queryable without parsing on every access.

`SELECT p.id, t.tag FROM posts p, unnest(string_to_array(p.tags, ',')) AS t(tag)` unnests the CSV tags column into individual rows that can be JOINed against a tags table. This is faster and simpler than a correlated subquery for normalizing denormalized CSV columns. For proper multi-value columns, migrate to actual arrays (`text[]`) and use the `&&` and `@>` operators with GIN indexes.
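A sketch of the migration the tip recommends (table, column, and index names are illustrative):

```sql
-- One-time migration: CSV text column to a native array column
ALTER TABLE posts
    ALTER COLUMN tags TYPE text[]
    USING string_to_array(tags, ',');

-- GIN index accelerates containment and overlap queries
CREATE INDEX idx_posts_tags ON posts USING gin (tags);

-- "posts tagged postgres" with no string parsing at query time
SELECT id FROM posts WHERE tags @> ARRAY['postgres'];
```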

string_to_table

PG 14+setof text

Splits the string at each delimiter and returns each part as a separate row in a set.

DeveloperData Eng

Signature

string_to_table ( string text, delimiter text [, null_string text ] ) → setof text

Parameters

ParameterTypeDescription
stringtextInput string to split
delimitertextLiteral delimiter; NULL splits into individual characters
null_stringtextOptional: field values matching this string are returned as NULL

Examples

sql
SELECT * FROM string_to_table('one,two,three', ',');
one two three (one row each)
sql
SELECT * FROM string_to_table('a,,c', ',', '');
a NULL c
sql
SELECT * FROM string_to_table('hello', NULL);
h e l l o
sql
SELECT trim(t) FROM string_to_table('  a , b , c ', ',') AS t WHERE trim(t) <> '';
a b c
sql
SELECT word, length(word) FROM string_to_table('the quick brown fox', ' ') AS word ORDER BY length DESC;
Words ordered by length
Anti-PatternUsing string_to_table on PG versions before 14 without a fallback

`string_to_table` was added in PostgreSQL 14. Using it without a version check will raise `ERROR: function string_to_table(text, text) does not exist` on older servers, which is a runtime error that only surfaces in production.

✓ Instead: For pre-PG14 compatibility, use `unnest(string_to_array(col, ','))` which works from PG 9.0+. Add a comment noting when you can drop the workaround once your minimum supported version is PG 14.

`string_to_table(col, ',')` is cleaner and slightly more efficient than `unnest(string_to_array(col, ','))` — it skips the intermediate array allocation. The set-returning form also integrates better with LATERAL joins: `FROM posts p, string_to_table(p.tags, ',') AS tag` is perfectly valid. For pre-PG14 compatibility, keep using the unnest pattern.

substr

PG 8.0+text

Extracts a substring starting at position start (1-based) for count characters; equivalent to the substring() function.

Developer

Signatures

substr ( string text, start integer ) → text
substr ( string text, start integer, count integer ) → text

Parameters

ParameterTypeDescription
stringtextInput string
startintegerStarting position (1-based); values <= 0 are treated as 1
countintegerNumber of characters to extract; if omitted, extracts to end of string

Examples

From position 8 to end

sql
SELECT substr('Hello, World!', 8);
World!
sql
SELECT substr('Hello, World!', 8, 5);
World

Start <= 0 treated as 1 (but count adjusted)

sql
SELECT substr('abcdef', 0, 4);
abc
sql
SELECT substr(md5(id::text), 1, 8) AS short_hash FROM items;
8-char hash prefix per row
sql
SELECT substr(isbn, 4, 9) AS publisher_code FROM books;
Middle segment of ISBN
Anti-PatternUsing substr(col, 0, n) expecting n characters but getting n-1

`substr('hello', 0, 4)` returns `'hel'` — 3 characters, not 4. Position 0 is treated as 1, but the count is also adjusted, so you get one fewer character than you might expect. This off-by-one trips up developers who port code from languages where string slicing is 0-indexed.

✓ Instead: Always use 1-based positions: `substr(col, 1, 3)` clearly returns the first 3 characters. Or use `left(col, 3)` which is unambiguous.

`substr('hello', 0, 4)` returns 'hel' (3 chars). PostgreSQL treats position 0 as before position 1 and adjusts count accordingly. This is SQL-standard behavior but confusing. Stick to 1-based indexing: `substr('hello', 1, 3)` unambiguously returns 'hel'. Use `left(col, n)` for extracting the first n characters — it's the clearest of all three options.
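The position clipping behind this off-by-one can be modeled in a few lines of Python — a sketch of the behavior, not PostgreSQL's actual implementation:

```python
def pg_substr(s, start, count=None):
    """Model of PostgreSQL substr position math (1-based). The selected
    window is [start, start+count-1], then clipped to the string's valid
    positions — which is why start=0 'loses' a character."""
    end = len(s) if count is None else start + count - 1  # inclusive 1-based end
    lo = max(start, 1)            # clip window start to position 1
    hi = min(end, len(s))         # clip window end to the string length
    return s[lo - 1:hi]           # convert to Python's 0-based slicing

pg_substr('hello', 0, 4)          # 'hel' — 3 chars, not 4
pg_substr('hello', 1, 3)          # 'hel'
pg_substr('Hello, World!', 8)     # 'World!'
```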

substring

PG 8.0+text

Extracts a substring by position, POSIX regex capture, or SQL regex pattern.

Developer

Signatures

substring ( string text [ FROM start integer ] [ FOR count integer ] ) → text
substring ( string text FROM pattern text ) → text
substring ( string text SIMILAR pattern text ESCAPE escape text ) → text

Parameters

ParameterTypeDescription
stringtextInput string
startintegerStarting position (1-based)
countintegerLength of substring
patterntextPOSIX regex; if the pattern contains parentheses, the portion matching the first capture group is returned, otherwise the whole match

Examples

sql
SELECT substring('Thomas' FROM 2 FOR 3);
hom
sql
SELECT substring('Thomas' FROM 3);
omas

POSIX regex — returns matched portion

sql
SELECT substring('foobar' FROM 'o+b');
oob

Capture group extracts year

sql
SELECT substring('2024-03-15' FROM '^(\d{4})-');
2024
sql
SELECT substring(email FROM '@(.*)$') FROM users;
example.com — domain portion of email
Anti-PatternUsing substring(col FROM pattern) expecting all capture groups to be returned

`substring('2024-03-15' FROM '(\d{4})-(\d{2})-(\d{2})')` returns only `'2024'` — the first capture group. Developers expecting all three groups are returned get only the year and silently lose month and day.

✓ Instead: Use `regexp_match('2024-03-15', '(\d{4})-(\d{2})-(\d{2})')` which returns `{2024,03,15}` — all capture groups as an array.

`substring('foobar' FROM 'o+b')` returns 'oob' — the full match. To return only part of the match, wrap the desired part in parentheses: `substring('foobar' FROM 'o+(b)')` returns 'b'. This is the key difference from `regexp_match()`. If your pattern has multiple capture groups, only the first one is returned — use `regexp_match()` for multi-group extraction.
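The "first capture group if present, else whole match" rule can be modeled with Python's `re` module (Python's engine is not POSIX leftmost-longest, but it agrees on simple patterns like these; the function name is ours):

```python
import re

def pg_substring_regex(s, pattern):
    """Model of substring(string FROM pattern): return the first capture
    group if the pattern has one, otherwise the whole match; None stands
    in for SQL NULL when nothing matches."""
    m = re.search(pattern, s)
    if m is None:
        return None
    return m.group(1) if m.re.groups >= 1 else m.group(0)

pg_substring_regex('foobar', 'o+b')           # 'oob' — whole match
pg_substring_regex('foobar', 'o+(b)')         # 'b'   — first group
pg_substring_regex('2024-03-15', r'^(\d{4})-')  # '2024'
```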

to_hex

PG 8.0+text

Converts an integer to its hexadecimal string representation (lowercase, no '0x' prefix).

Developer

Signatures

to_hex ( number integer ) → text
to_hex ( number bigint ) → text

Parameters

ParameterTypeDescription
numberinteger | bigintInteger value to convert to hex

Examples

sql
SELECT to_hex(255);
ff
sql
SELECT to_hex(256);
100

Add 0x prefix manually if needed

sql
SELECT '0x' || to_hex(255);
0xff
sql
SELECT '#' || lpad(to_hex(r*65536 + g*256 + b), 6, '0') FROM colors;
#ff8040 — CSS hex color from RGB integers
sql
SELECT lpad(to_hex(id), 8, '0') AS hex_id FROM items;
00000042 — zero-padded hex ID
Anti-PatternUsing to_hex without lpad and then comparing hex strings of different lengths

`to_hex(255)` = `'ff'` (2 chars) but `to_hex(256)` = `'100'` (3 chars). Storing or comparing raw to_hex output as identifiers or sort keys produces incorrect lexicographic ordering and mismatched string lengths.

✓ Instead: Always pad to a consistent width: `lpad(to_hex(val), 8, '0')` for 32-bit values, or `lpad(to_hex(val), 16, '0')` for 64-bit. This ensures correct string comparison and fixed-width output.

`'#' || lpad(to_hex(red * 65536 + green * 256 + blue), 6, '0')` converts three 0-255 integer channels to a CSS hex color string. The `lpad` with '0' ensures colors like `#00ff00` don't become `#ff00` when channels are zero. Useful when color data is stored as separate R/G/B columns and you need to export or display as hex.
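The RGB packing arithmetic is easy to verify outside SQL — a Python sketch of the same computation:

```python
def css_hex(r, g, b):
    """Pack three 0-255 channels into a CSS color string, mirroring
    '#' || lpad(to_hex(r*65536 + g*256 + b), 6, '0')."""
    value = r * 65536 + g * 256 + b
    return '#' + format(value, 'x').rjust(6, '0')  # lpad(to_hex(value), 6, '0')

css_hex(255, 128, 64)   # '#ff8040'
css_hex(0, 255, 0)      # '#00ff00' — padding preserves the leading zeros
```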

translate

PG 8.0+text

Replaces each character in string that matches a character in from with the corresponding character in to; characters with no corresponding to character are deleted.

DeveloperDBAData Eng

Signature

translate ( string text, from text, to text ) → text

Parameters

ParameterTypeDescription
stringtextInput string to transform
fromtextCharacters to match (each character individually)
totextReplacement characters (shorter than from means extras are deleted)

Examples

'3' has no replacement → deleted; '1'→'a', '4'→'x'

sql
SELECT translate('12345', '143', 'ax');
a2x5

Replace all vowels

sql
SELECT translate('Hello World', 'aeiouAEIOU', '**********');
H*ll* W*rld
sql
SELECT translate(col, '0123456789', '');
String with all digits removed
sql
SELECT translate(lower(input), 'àáâäåæçèéêëìíîïñòóôöùúûüý', 'aaaaaaceeeeiiiinoooouuuuy') FROM data;
Accent-normalized string
sql
SELECT translate(text,'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz','NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm');
ROT13 encoded text
Anti-PatternConfusing translate() with replace() and expecting substring matching

`translate('hello', 'el', 'ip')` returns `'hippo'` — it maps 'e'→'i' and 'l'→'p' character by character. Developers expecting `'el'` to be treated as a literal substring (like replace) are surprised when individual 'e' and 'l' characters are substituted throughout the string.

✓ Instead: Use `replace(col, 'from_substring', 'to_substring')` when you need to substitute a multi-character literal. Use `translate` only for character-to-character one-for-one mapping.

`translate(lower(col), 'àáâäçèéêëìíîïñòóôöùúûü', 'aaaaceeeeiiiinoooouuuu')` normalizes accented Latin characters for ASCII-safe searching — faster than regex for character-level substitutions because it's a single O(n) pass. Also implements ROT13 for light obfuscation. When `to` is shorter than `from`, extra `from` characters are simply deleted — use `translate(col, '0123456789', '')` to strip all digits.
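Python's `str.maketrans`/`str.translate` pair mirrors these semantics closely — a sketch (assumes no character appears in both the mapped and deleted portions of `frm`):

```python
UPPER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
LOWER = 'abcdefghijklmnopqrstuvwxyz'

def pg_translate(s, frm, to):
    """Model of translate(): one-for-one character mapping; frm characters
    beyond len(to) have no counterpart and are deleted, as in PostgreSQL."""
    table = str.maketrans(frm[:len(to)], to, frm[len(to):])
    return s.translate(table)

pg_translate('12345', '143', 'ax')   # 'a2x5' — '3' deleted, '1'→'a', '4'→'x'
# ROT13 via the same half-alphabet rotation the SQL example uses:
pg_translate('Hello', UPPER + LOWER,
             UPPER[13:] + UPPER[:13] + LOWER[13:] + LOWER[:13])  # 'Uryyb'
```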

trim

PG 8.0+text

Removes the specified characters from the start, end, or both ends of a string; defaults to removing spaces from both ends.

DeveloperDBAData Eng

Signatures

trim ( [ LEADING | TRAILING | BOTH ] [ characters text ] FROM string text ) → text
trim ( [ LEADING | TRAILING | BOTH ] [ FROM ] string text [, characters text ] ) → text

Parameters

ParameterTypeDescription
LEADING|TRAILING|BOTHkeywordWhich end(s) to trim; BOTH is the default
characterstextSet of characters to remove (default: space)
stringtextInput string to trim

Examples

Default: trim spaces from both ends

sql
SELECT trim('  hello  ');
hello
sql
SELECT trim(BOTH 'x' FROM 'xxxhelloxxx');
hello
sql
SELECT trim(LEADING '0' FROM '000123');
123
sql
SELECT trim(TRAILING '.' FROM 'end...');
end
sql
SELECT trim(BOTH FROM '   spaces   ');
spaces
Anti-PatternTrimming user input only at read time in queries instead of at write time

Wrapping every query with `WHERE trim(col) = $1` or `SELECT trim(col)` is wasteful — the trim executes on every read across all rows. If the column always stores untrimmed data, every consumer must remember to trim.

✓ Instead: Enforce trimming at write time: add a CHECK constraint `CHECK (col = trim(col))` or a BEFORE INSERT/UPDATE trigger that normalizes the value. The column then always stores clean data and queries need no wrapper.

`trim(BOTH FROM col)` = `btrim(col)`. `trim(LEADING FROM col)` = `ltrim(col)`. `trim(TRAILING FROM col)` = `rtrim(col)`. Use whichever reads most naturally in context. The SQL standard form `trim(BOTH '.' FROM col)` is portable across databases; the function forms `btrim/ltrim/rtrim` are PostgreSQL-specific but more concise in application code. All behave identically — the characters argument is a SET, not a substring.
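Python's strip family shares trim's character-set semantics, which makes it a handy mental model — the argument is a SET of characters, never a substring:

```python
# Each call mirrors the SQL form noted in the comment.
assert 'xxxhelloxxx'.strip('x') == 'hello'   # trim(BOTH 'x' FROM ...)
assert '000123'.lstrip('0') == '123'         # trim(LEADING '0' FROM ...)
assert 'end...'.rstrip('.') == 'end'         # trim(TRAILING '.' FROM ...)
assert 'xyhelloyx'.strip('xy') == 'hello'    # any mix of x/y is removed — set, not substring
```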

upper

PG 8.0+text

Converts all characters in the string to their upper-case equivalents using the current locale.

DeveloperDBA

Signature

upper ( string text ) → text

Parameters

ParameterTypeDescription
stringtextInput string to convert to upper case

Examples

sql
SELECT upper('hello');
HELLO
sql
SELECT upper('PostgreSQL');
POSTGRESQL
sql
SELECT upper(NULL);
NULL
sql
SELECT upper(status) = 'ACTIVE' FROM users;
Case-insensitive status check
sql
SELECT upper(country_code) FROM addresses WHERE length(country_code) = 2;
Normalized ISO country codes
Anti-PatternCalling upper() in every WHERE clause instead of normalizing at insert time

`WHERE upper(status) = 'ACTIVE'` prevents index use on `status` and runs upper() on every row in the table on every query. If `status` is supposed to be uppercase, callers should not be responsible for normalizing it at read time.

✓ Instead: Store status values in a consistent case enforced at write time (`CHECK (status = upper(status))`). Then `WHERE status = 'ACTIVE'` uses a plain B-tree index with no function call overhead.

Calling `upper()` at read time on every query is wasteful. Instead, enforce case at write time: `CHECK (status = upper(status))` or use a trigger `NEW.country_code := upper(NEW.country_code)`. This ensures the column always contains uppercase values, making WHERE queries faster (no function call needed) and allowing standard B-tree indexes to be used without functional indexes.

unistr

PG 14+text

Evaluates escape sequences in the string to produce a Unicode text value; \XXXX or \+XXXXXX represent Unicode code points in hex.

DeveloperDBA

Signature

unistr ( string text ) → text

Parameters

ParameterTypeDescription
stringtextString with Unicode escape sequences in \XXXX or \+XXXXXX format

Examples

\0061 = 'a', \0074 = 't'

sql
SELECT unistr('d\0061t\0061');
data

Euro sign from hex code point

sql
SELECT unistr('\20AC');
€
\0020 = space

sql
SELECT unistr('Hello\0020World');
Hello World

Emoji via 6-digit supplementary code point

sql
SELECT unistr('\+01F600');
😀
sql
SELECT unistr('caf\00E9');
café
Anti-PatternAssuming U& string literals accept the same escapes as unistr

`U&'\u20AC'` does NOT work — U& literals accept only the `\XXXX` and `\+XXXXXX` forms (plus a custom ESCAPE character), so escape sequences copied from JavaScript (`\u20AC`) or Python (`\u20ac`) raise an error there. `unistr()` is more permissive: it accepts `\XXXX`, `\+XXXXXX`, `\uXXXX`, and `\UXXXXXXXX`, and developers who learn one set of rules often wrongly assume the other context shares it.

✓ Instead: In U& literals, write `U&'\20AC'` for BMP characters (U+0000 to U+FFFF) and `U&'\+01F600'` for supplementary characters. In unistr(), any of the four forms works — `unistr('\20AC')` and `unistr('\u20AC')` both yield '€'.

When writing SQL migration scripts that need to insert emoji, accented characters, or special symbols, `unistr('\+01F680')` (🚀) is more robust than pasting the literal character — some editors/terminals mangle multibyte characters during copy-paste. Use the hex code point form for any character above U+00FF in scripts that cross encoding boundaries.
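A Python sketch of how the two escape forms shown in the examples decode (it covers only `\XXXX` and `\+XXXXXX`; PostgreSQL's unistr additionally accepts `\uXXXX`, `\UXXXXXXXX`, and doubled backslashes):

```python
import re

def pg_unistr(s):
    """Decode \XXXX (4 hex digits) and \+XXXXXX (6 hex digits) escape
    sequences into the Unicode characters they name."""
    def decode(m):
        # Whichever alternative matched carries the hex digits.
        return chr(int(m.group(1) or m.group(2), 16))
    return re.sub(r'\\\+([0-9A-Fa-f]{6})|\\([0-9A-Fa-f]{4})', decode, s)

pg_unistr(r'd\0061t\0061')    # 'data'
pg_unistr(r'\20AC')           # '€'
pg_unistr(r'\+01F600')        # '😀'
```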

normalize

PG 13+text

Normalizes a string to the specified Unicode normalization form (NFC, NFD, NFKC, or NFKD); defaults to NFC.

DeveloperDBA

Signature

normalize ( string text [, form ] ) → text

Parameters

ParameterTypeDescription
stringtextInput string to normalize
formkeywordNFC (default), NFD, NFKC, or NFKD normalization form

Examples

NFC: precomposed and decomposed 'é' compare equal

sql
SELECT normalize(U&'\00E9') = normalize(U&'e\0301');
true
sql
SELECT normalize('café', NFC);
café (precomposed form)

Ligature decomposed to two characters

sql
SELECT normalize(U&'\FB01', NFKC);
fi

Superscript 2 maps to regular digit 2

sql
SELECT normalize('²', NFKC);
2

NFC combines, NFD keeps separate

sql
SELECT length(normalize(U&'e\0301', NFC)), length(normalize(U&'e\0301', NFD));
1 | 2
Anti-PatternComparing Unicode strings for equality without normalizing both sides first

Two strings that look visually identical — such as 'café' composed two different ways — will not compare equal in PostgreSQL if they use different Unicode normalization forms. `'café' = 'café'` can return false when one uses U+00E9 and the other uses e + U+0301.

✓ Instead: Normalize at write time so all stored values use the same form: `CHECK (col = normalize(col, NFC))`. Then comparisons are reliable without normalizing at query time.

'é' can be stored as a single precomposed codepoint (U+00E9) OR as 'e' + combining accent (U+0065 U+0301). These look identical but are byte-different — differently composed duplicates will silently bypass a UNIQUE constraint if one form is inserted after the other. Normalize at write time: `CHECK (username = normalize(username, NFC))` or in a trigger. NFKC also collapses ligatures (ﬁ→fi) and compatibility forms, which is usually what you want for user-facing identifiers.
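The same normalization behavior can be reproduced with Python's standard `unicodedata` module, which is convenient for pre-normalizing data before it ever reaches the database:

```python
import unicodedata

precomposed = '\u00e9'    # 'é' as one code point (U+00E9)
decomposed = 'e\u0301'    # 'e' + combining acute accent (U+0065 U+0301)

# Visually identical, byte-different — equality fails without normalizing:
assert precomposed != decomposed
# NFC composes, so both sides compare equal afterwards:
assert unicodedata.normalize('NFC', decomposed) == precomposed
# NFD decomposes the single code point into two:
assert len(unicodedata.normalize('NFD', precomposed)) == 2
# NFKC additionally folds compatibility forms such as the ﬁ ligature:
assert unicodedata.normalize('NFKC', '\ufb01') == 'fi'
```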

parse_ident

PG 9.6+text[]

Parses a qualified SQL identifier (e.g. schema.table or "My Schema"."My Table") into an array of identifier parts, handling quoting correctly.

DBADeveloper

Signature

parse_ident ( str text [, strict_mode boolean DEFAULT true ] ) → text[]

Parameters

ParameterTypeDescription
strtextQualified identifier string to parse
strict_modebooleanIf true (default), extra trailing characters cause an error; if false, they are silently ignored

Examples

sql
SELECT parse_ident('public.users');
{public,users}

Double quotes stripped, case preserved

sql
SELECT parse_ident('"My Schema"."My Table"');
{My Schema,My Table}
sql
SELECT (parse_ident('schema.table'))[1] AS schema_name;
schema

Unquoted identifiers are lowercased

sql
SELECT parse_ident('SomeTable', false);
{sometable}
sql
SELECT parse_ident('pg_catalog.pg_class');
{pg_catalog,pg_class}
Anti-PatternUsing split_part('.') to extract schema and table name from a qualified identifier

`split_part('"schema.with.dots".table', '.', 1)` returns `'"schema'` — it splits on every dot, including dots inside quoted identifier names. Any schema or table name containing a dot will be split incorrectly.

✓ Instead: Use `parse_ident(qualified_name)` which correctly handles quoting: `(parse_ident('"schema.with.dots".table'))[1]` returns `'schema.with.dots'` as intended. Note the extra parentheses — subscripting a function result directly is a syntax error.

`split_part('"schema.with.dots".table', '.', 1)` returns `'"schema'` — broken. `(parse_ident('"schema.with.dots".table'))[1]` correctly returns `'schema.with.dots'`. Whenever processing fully-qualified identifiers from pg_catalog views (like `table_schema || '.' || table_name`), use `parse_ident` to handle edge cases with quoted names containing special characters.

regexp_substr

PG 15+text

Returns the substring matching the Nth occurrence of a POSIX regex pattern, optionally returning a specific capture group.

DeveloperData Eng

Signature

regexp_substr ( string text, pattern text [, start integer [, N integer [, flags text [, subexpr integer ] ] ] ] ) → text

Parameters

ParameterTypeDescription
stringtextInput string
patterntextPOSIX regex pattern
startintegerStart position (default 1)
NintegerWhich occurrence to return (default 1)
flagstextRegex flags
subexprintegerCapture group to return (0 = whole match)

Examples

First match

sql
SELECT regexp_substr('one 1 two 2 three 3', '\d+');
1

Second match

sql
SELECT regexp_substr('one 1 two 2 three 3', '\d+', 1, 2);
2

Second occurrence of digits

sql
SELECT regexp_substr('2024-03-15', '(\d+)', 1, 2);
03

Second capture group = domain

sql
SELECT regexp_substr('user@example.com', '([^@]+)@(\S+)', 1, 1, '', 2);
example.com
sql
SELECT regexp_substr(col, '[A-Z]{2}\d{6}', 1, 1, 'i') FROM docs;
First document reference code
Anti-PatternUsing regexp_substr on PostgreSQL versions before 15 without a compatibility fallback

`regexp_substr` was added in PostgreSQL 15. Using it on PG 14 or earlier raises `ERROR: function regexp_substr(text, text) does not exist` at runtime — not at deploy time, so it may only surface in production when the code path is hit.

✓ Instead: For pre-PG15 compatibility, use `(regexp_match(col, pattern))[1]` for first-match extraction. For Nth-match extraction, use `regexp_matches` with a row_number() CTE. Document the PG15+ requirement clearly if you choose to use regexp_substr.

`regexp_substr(col, pattern)` is equivalent to `(regexp_match(col, pattern))[1]` for simple first-match extraction. The power of `regexp_substr` is the N parameter (Nth match) — previously requiring `regexp_matches` + row_number() + a subquery. For extracting the 3rd IP address in a log line: `regexp_substr(col, '\d+\.\d+\.\d+\.\d+', 1, 3)` replaces a multi-step CTE.
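The N and subexpr parameters are easy to model — a Python sketch (the function name is ours, and `flags` here takes Python `re` flags rather than PostgreSQL's letter flags, a deliberate simplification):

```python
import re

def pg_regexp_substr(s, pattern, start=1, n=1, flags=0, subexpr=0):
    """Model of regexp_substr: scan from the 1-based start position and
    return the requested capture group of the Nth match; None stands in
    for SQL NULL when there are fewer than N matches."""
    matches = list(re.finditer(pattern, s[start - 1:], flags))
    if len(matches) < n:
        return None
    return matches[n - 1].group(subexpr)

pg_regexp_substr('one 1 two 2 three 3', r'\d+', 1, 2)              # '2'
pg_regexp_substr('user@example.com', r'([^@]+)@(\S+)', subexpr=2)  # 'example.com'
```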