
PostgreSQL String Functions

Complete reference for PostgreSQL string functions covering text manipulation, pattern matching, encoding, and string formatting. Every function includes syntax, parameters, real-world SQL examples, anti-pattern warnings, and pro tips. Updated for PostgreSQL 16.

50 functions

What are PostgreSQL String Functions?

PostgreSQL string functions are built-in functions that operate on text data. They handle tasks including trimming whitespace, extracting substrings, splitting strings, encoding conversions, and regular expression matching. Common PostgreSQL string functions include concat(), substring(), regexp_replace(), trim(), format(), and split_part().

ascii

PG 8.0+ · returns integer

Returns the numeric Unicode code point of the first character of the string.

Developer · DBA

Signature

ascii ( string text ) → integer

Parameters

Parameter | Type | Description
string | text | Input string; only the first character is evaluated

Examples

Uppercase A

sql
SELECT ascii('A');
65

Lowercase a

sql
SELECT ascii('a');
97

Unicode euro sign

sql
SELECT ascii('€');
8364

Only the first character is used

sql
SELECT ascii('Hello');
72

Manual lowercase-to-uppercase via code point math

sql
SELECT chr(ascii('a') - 32);
A
Anti-Pattern: Looping ascii() per character to validate a string

Calling ascii() in a loop or for each character to check whether a string is pure ASCII or contains only certain characters is slow and verbose. A single `regexp_replace` or `translate` call handles the whole string in one pass.

✓ Instead: Use `col ~ '[^\x20-\x7E]'` to detect non-ASCII characters, or `regexp_replace(col, '[^\x20-\x7E]', '', 'g')` to strip them — no loop needed.

Use `WHERE ascii(left(col, 1)) < 32` to find rows that start with invisible control characters — null bytes (0), tabs (9), carriage returns (13), escape codes — that silently break CSV exports, JSON serialization, or downstream string comparisons. Pair with `regexp_replace(col, '[\x00-\x1F]', '', 'g')` to strip them.

sql
SELECT id, col FROM users WHERE ascii(left(col,1)) < 32;
Rows with invisible leading characters

bit_length

PG 8.0+ · returns integer

Returns the number of bits in the string (8 × number of bytes, regardless of encoding).

Developer · DBA

Signatures

bit_length ( string text ) → integer
bit_length ( string bytea ) → integer

Parameters

Parameter | Type | Description
string | text or bytea | Input string or binary data

Examples

5 ASCII chars × 8 bits

sql
SELECT bit_length('hello');
40

Euro sign is 3 bytes in UTF-8

sql
SELECT bit_length('€');
24
sql
SELECT bit_length('');
0
sql
SELECT bit_length('A'::bytea);
8

Compute byte size of text columns

sql
SELECT bit_length(col) / 8 AS byte_count FROM messages LIMIT 3;
byte_count per row
Anti-Pattern: Using bit_length to check byte limits on user-facing fields

Developers sometimes use `bit_length(col) / 8` to compute a byte count and enforce storage limits. This is needlessly indirect. `octet_length(col)` returns bytes directly and is the correct tool for enforcing byte-level constraints such as network protocol limits or column storage checks.

✓ Instead: Use `CHECK (octet_length(col) <= 1024)` instead of `CHECK (bit_length(col) / 8 <= 1024)`.

bit_length is rarely what you want in practice. For checking if a string exceeds a byte limit (e.g. for a VARCHAR(n) or network packet), use `octet_length(col) <= 1024`. bit_length just multiplies by 8 — the only time it's distinctly useful is when working with actual bit-string types (BIT, BIT VARYING).
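Two short queries illustrate the split described above: octet_length for byte budgets on text, bit_length for actual bit strings (the `api_payloads` table is a hypothetical sketch):

```sql
-- Byte budget on text: octet_length, not bit_length / 8
CREATE TABLE api_payloads (
    id   bigserial PRIMARY KEY,
    body text CHECK (octet_length(body) <= 1024)
);

-- Where bit_length is genuinely the right tool: bit-string values
SELECT bit_length(B'101100');   -- 6 (counts bits directly)
SELECT octet_length(B'101100'); -- 1 (the 6 bits pack into one byte)
```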

btrim

PG 8.0+ · returns text

Removes the longest string of characters in the characters set from both ends of the string; defaults to removing spaces.

Developer · DBA · Data Eng

Signatures

btrim ( string text ) → text
btrim ( string text, characters text ) → text

Parameters

Parameter | Type | Description
string | text | Input string to trim
characters | text | Set of characters to remove (default: space)

Examples

Default: trim spaces

sql
SELECT btrim('  hello  ');
hello

Trim specific character

sql
SELECT btrim('xxhelloxx', 'x');
hello
sql
SELECT btrim('###hello###', '#');
hello

Characters argument is a SET, not a string prefix/suffix

sql
SELECT btrim('xyxhelloyx', 'xy');
hello

All-whitespace returns empty string

sql
SELECT btrim('   ');
(empty string)
Anti-Pattern: Trimming in a WHERE clause on an unindexed column

Using `WHERE btrim(col) = 'value'` prevents index use — PostgreSQL cannot use a plain B-tree index on `col` when a function wraps it. Every row is scanned and trimmed at runtime.

✓ Instead: Create a functional index `CREATE INDEX ON t (btrim(col))` and query with `WHERE btrim(col) = 'value'`, or better yet store pre-trimmed values by applying btrim at write time via a CHECK constraint or trigger.

`btrim('abchello', 'abc')` removes any combination of a, b, and c from both ends — not just the literal prefix 'abc'. So `btrim('cabcXcab', 'abc')` returns 'X'. This surprises developers expecting prefix/suffix matching. For exact prefix/suffix stripping, use `regexp_replace(col, '^prefix|suffix$', '', 'g')` instead.

char_length

PG 8.0+ · returns integer

Returns the number of characters (code points) in the string, not the byte count.

Developer

Signatures

char_length ( string text ) → integer
character_length ( string text ) → integer

Parameters

Parameter | Type | Description
string | text | Input string

Examples

sql
SELECT char_length('hello');
5
sql
SELECT char_length('');
0

4 characters, even though 'é' is multi-byte in UTF-8

sql
SELECT char_length('café');
4

NULL input returns NULL — use COALESCE if you need 0

sql
SELECT char_length(NULL);
NULL

3 Unicode characters regardless of byte length

sql
SELECT char_length('日本語');
3
Anti-Pattern: Using char_length to enforce byte-level storage or protocol limits

A VARCHAR(255) limit on a column is a character limit, but external systems such as Kafka, HTTP headers, or SMS gateways have byte limits. `char_length('emoji😀') = 6` but `octet_length('emoji😀') = 9`. Validating against char_length will let oversized payloads through when users include emoji or CJK characters.

✓ Instead: Use `octet_length(col) <= N` for byte-budget checks, and `char_length(col) <= N` only for display/UX character counts.

All three look the same for pure ASCII but diverge with multibyte characters. `char_length('café')` = 4 (character count). `length('café')` = 4 (same for text). `octet_length('café')` = 5 (UTF-8 bytes — 'é' is 2 bytes). Use `char_length` for UI display limits, `octet_length` for storage/network byte limits. `length(bytea_col)` gives byte count of binary data.
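The divergence is easy to see side by side; this uses only literals, so it runs anywhere (byte counts assume a UTF-8 database):

```sql
SELECT char_length('café')  AS chars,  -- 4 characters
       octet_length('café') AS bytes;  -- 5 bytes ('é' is 2 bytes in UTF-8)
```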

chr

PG 8.0+ · returns text

Returns the character with the given Unicode code point.

Developer · DBA

Signature

chr ( code integer ) → text

Parameters

Parameter | Type | Description
code | integer | Unicode code point (0–1114111)

Examples

sql
SELECT chr(65);
A

Line feed — common in dynamic string building

sql
SELECT chr(10);
(newline character)
sql
SELECT chr(9);
(tab character)

Euro sign

sql
SELECT chr(8364);
€

Generate entire alphabet

sql
SELECT string_agg(chr(generate_series + 64), '') FROM generate_series(1,26);
ABCDEFGHIJKLMNOPQRSTUVWXYZ
Anti-Pattern: Embedding literal newlines or tabs directly in SQL string literals

Pasting literal newline or tab characters into a SQL string literal makes the query unreadable and fragile across editors and terminals that may silently strip or convert whitespace.

✓ Instead: Use `chr(10)` for LF, `chr(13)` for CR, and `chr(9)` for tab. For CRLF: `chr(13) || chr(10)`. This keeps SQL scripts readable and encoding-safe.

Uploaded text files often have Windows CRLF endings. Strip them in one shot: `regexp_replace(col, chr(13)||chr(10), chr(10), 'g')` converts CRLF → LF. Then `regexp_replace(result, chr(13), chr(10), 'g')` catches any leftover bare CRs. Far more readable than embedding literal escape sequences in your SQL.
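The two-step cleanup above can collapse into one statement, since `\r\n?` matches both CRLF and a bare CR (the `documents` table and `body` column are placeholder names):

```sql
-- CRLF and stray CR both become LF in a single pass
UPDATE documents
SET body = regexp_replace(body, chr(13) || chr(10) || '?', chr(10), 'g')
WHERE body LIKE '%' || chr(13) || '%';
```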

concat

PG 8.0+ · returns text

Concatenates all arguments into a single string, silently ignoring NULL values.

Developer

Signature

concat ( val1 "any" [, val2 "any" [, ...]] ) → text

Parameters

Parameter | Type | Description
val1, val2, ... | any | Values to concatenate; NULLs are skipped

Examples

sql
SELECT concat('Hello', ', ', 'World');
Hello, World

NULL is silently skipped — unlike ||

sql
SELECT concat('foo', NULL, 'bar');
foobar
sql
SELECT concat(first_name, ' ', last_name) FROM users LIMIT 1;
Jane Doe

Non-text types are cast automatically

sql
SELECT concat(42, ' items at $', 3.99);
42 items at $3.99

All NULLs returns empty string, not NULL

sql
SELECT concat(NULL, NULL);
(empty string)
Anti-Pattern: Using || with nullable columns expecting the same NULL-ignoring behavior as concat()

A very common bug: `first_name || ' ' || last_name` returns NULL whenever either column is NULL. Developers who expect this to 'just work' like concat() get NULL full names for users with no last name on record.

✓ Instead: Use `concat(first_name, ' ', last_name)` or `concat_ws(' ', first_name, last_name)` when any input may be NULL.

The `||` operator propagates NULL: `'foo' || NULL || 'bar'` returns NULL. `concat()` skips NULLs: `concat('foo', NULL, 'bar')` returns 'foobar'. This distinction is critical when building full names, addresses, or dynamic labels from nullable columns. Use `concat` for user-facing string building; use `||` when NULL-propagation is intentional (e.g. to detect missing required values).
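When NULL-propagation is the point, `||` makes a compact completeness check; the `users` table and address columns here are illustrative:

```sql
-- If any of the three columns is NULL, the whole expression is NULL,
-- so this finds rows with an incomplete address
SELECT id
FROM users
WHERE street || city || zip IS NULL;
```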

concat_ws

PG 9.1+ · returns text

Concatenates all non-NULL arguments separated by the given separator string.

Developer · Data Eng

Signature

concat_ws ( sep text, val1 "any" [, val2 "any" [, ...]] ) → text

Parameters

Parameter | Type | Description
sep | text | Separator string inserted between non-NULL values
val1, val2, ... | any | Values to join; NULLs are skipped entirely

Examples

sql
SELECT concat_ws(', ', 'one', 'two', 'three');
one, two, three

No double space when middle name is NULL

sql
SELECT concat_ws(' ', first_name, middle_name, last_name) FROM people;
John Smith (when middle_name is NULL)
sql
SELECT concat_ws('-', '2024', '03', '15');
2024-03-15

NULLs suppressed, no leading/trailing separator

sql
SELECT concat_ws(', ', NULL, 'a', NULL, 'b');
a, b

Empty strings ARE included unlike NULLs

sql
SELECT concat_ws('/', 'https:', '', 'example.com', 'path');
https://example.com/path
Anti-Pattern: Not realizing empty strings are included but NULLs are not

`concat_ws(', ', 'a', '', 'c')` returns `'a, , c'` — the empty string produces a dangling separator, which surprises developers who expect empty strings to be skipped like NULLs.

✓ Instead: Use `NULLIF(col, '')` to coerce empty strings to NULL before passing to concat_ws: `concat_ws(', ', NULLIF(line2,''), NULLIF(city,''))`.

`concat_ws(', ', line1, line2, city, state, zip)` automatically skips any NULL component and avoids double commas. This replaces 5-10 lines of CASE/COALESCE logic when composing optional address fields. Note: empty strings ('') are NOT skipped — only NULLs are. Use NULLIF(col, '') to coerce empty strings to NULL first if needed.
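The address builder with the NULLIF guard spelled out (column and table names are illustrative):

```sql
-- NULLIF turns '' into NULL so empty components are skipped like NULLs
SELECT concat_ws(', ',
         NULLIF(line1, ''),
         NULLIF(line2, ''),
         NULLIF(city,  ''),
         NULLIF(state, ''),
         NULLIF(zip,   ''))
FROM addresses;
```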

convert_from

PG 8.0+ · returns text

Converts a byte string from the specified source encoding to the database's encoding (typically UTF-8).

DBA · Data Eng

Signature

convert_from ( string bytea, src_encoding name ) → text

Parameters

Parameter | Type | Description
string | bytea | Raw bytes in the source encoding
src_encoding | name | Source encoding name (e.g. 'LATIN1', 'WIN1252', 'UTF8')

Examples

sql
SELECT convert_from('\xc3a9'::bytea, 'UTF8');
é
sql
SELECT convert_from(col_bytes, 'WIN1252') FROM imports WHERE encoding = 'windows';
Decoded text string

ASCII is the same in all encodings

sql
SELECT convert_from('hello'::bytea, 'UTF8');
hello
sql
SELECT octet_length(convert_from(raw_col, 'LATIN1')) FROM data;
Byte counts after conversion
sql
SELECT convert_from(decode(base64_col, 'base64'), 'UTF8') FROM uploads;
Decoded and converted text
Anti-Pattern: Casting legacy bytea columns directly to text without specifying encoding

Writing `col::text` on a bytea column that holds Latin-1 or Windows-1252 bytes does not decode the data — with the default `bytea_output = 'hex'` it merely produces the escaped representation (e.g. '\xc3a9'), and decoding with the wrong encoding via `convert_from(col, 'UTF8')` raises `invalid byte sequence for encoding "UTF8"` on non-UTF-8 bytes.

✓ Instead: Always use `convert_from(col, 'WIN1252')` (or the correct source encoding) so PostgreSQL performs a proper encoding conversion to the database's UTF-8 encoding.

When importing legacy data that arrives as raw bytes, convert at query time: `SELECT convert_from(col, 'WIN1252')` on the bytea column. If mojibake is already stored as text, round-tripping through `convert_to` can often recover the original bytes: `SELECT convert_from(convert_to(col, 'LATIN1'), 'WIN1252')` — but first check `SHOW server_encoding` to confirm your DB is UTF8. For bulk fixes, combine with an UPDATE.
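A minimal one-time backfill for the bulk case, assuming the raw bytes live in a bytea staging column (`imports`, `body_raw`, and `body_text` are hypothetical names):

```sql
-- Add a properly decoded text column alongside the raw bytes
ALTER TABLE imports ADD COLUMN body_text text;

-- Decode every row once; reads from then on use body_text directly
UPDATE imports
SET body_text = convert_from(body_raw, 'WIN1252');
```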

decode

PG 8.0+ · returns bytea

Decodes binary data from a text representation using the specified format (base64, hex, or escape).

Developer · DBA · Security

Signature

decode ( string text, format text ) → bytea

Parameters

Parameter | Type | Description
string | text | Text representation of binary data
format | text | 'base64', 'hex', or 'escape'

Examples

Decodes 'Hello' from base64

sql
SELECT decode('SGVsbG8=', 'base64');
\x48656c6c6f
sql
SELECT decode('48656c6c6f', 'hex');
\x48656c6c6f

Decode base64 directly to text

sql
SELECT convert_from(decode('SGVsbG8=', 'base64'), 'UTF8');
Hello
sql
SELECT length(decode(thumbnail_b64, 'base64')) AS bytes FROM images;
Byte size of decoded images

Round-trip encode/decode

sql
SELECT decode(encode('secret'::bytea, 'hex'), 'hex') = 'secret'::bytea;
true
Anti-Pattern: Storing base64-encoded binary in text columns and decoding at query time for every read

Storing images or file content as base64 text and calling decode() on every read adds CPU overhead for decoding and bloats storage by ~33% compared to native bytea. It also makes byte-level operations (like checking file magic bytes) require an extra decode step.

✓ Instead: Store binary data in a `bytea` column directly. Use `decode()` only as a one-time migration step to convert existing base64 text into bytea.

`SELECT encode(col, 'hex') FROM table` gives you a readable hex dump of any bytea column — useful for debugging binary protocols, checking magic bytes, or validating file headers. For images: `left(encode(img_data, 'hex'), 8)` — the first 4 bytes should be `89504e47` for PNG or `ffd8ffe0` for JPEG.

encode

PG 8.0+ · returns text

Encodes binary data into a text representation using the specified format (base64, hex, or escape).

Developer · DBA · Security

Signature

encode ( data bytea, format text ) → text

Parameters

Parameter | Type | Description
data | bytea | Binary data to encode
format | text | 'base64', 'hex', or 'escape'

Examples

Standard base64 encoding

sql
SELECT encode('Hello'::bytea, 'base64');
SGVsbG8=
sql
SELECT encode('Hello'::bytea, 'hex');
48656c6c6f

Generate a random hex token (gen_random_bytes requires pgcrypto)

sql
SELECT encode(gen_random_bytes(16), 'hex');
a3f8...random 32-char hex

pgcrypto: hex-encoded hash

sql
SELECT encode(digest('secret', 'sha256'), 'hex');
2bb80d537...sha256 hex
sql
SELECT encode(logo_bytes, 'base64') FROM products WHERE id = 1;
Base64 image data for embedding in JSON/HTML
Anti-Pattern: Using encode(col, 'base64') output directly in URLs without stripping newlines

PostgreSQL's base64 output inserts newline characters every 76 characters per the MIME standard. Embedding this directly in a URL or JSON field produces broken tokens.

✓ Instead: Strip newlines after encoding: `replace(encode(data, 'base64'), chr(10), '')`. For URL-safe base64, also translate: `translate(replace(encode(data,'base64'), chr(10),''), '+/=', '-_')`.

`SELECT encode(gen_random_bytes(18), 'base64')` generates a 24-character base64 token. For URL-safe tokens, replace the `+` and `/` characters: `translate(encode(gen_random_bytes(18), 'base64'), '+/=', '-_')`. This is a perfectly adequate session token or API key generator that avoids application-layer randomness — useful for bulk-generating invite codes with a single INSERT...SELECT.
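Bulk generation then becomes a single INSERT...SELECT — the `invites` table is a hypothetical example, and gen_random_bytes comes from pgcrypto (18 bytes encode to 24 base64 characters, short enough that no MIME newline is inserted):

```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- 100 URL-safe invite codes: '+' -> '-', '/' -> '_', '=' padding dropped
INSERT INTO invites (code)
SELECT translate(encode(gen_random_bytes(18), 'base64'), '+/=', '-_')
FROM generate_series(1, 100);
```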

format

PG 9.1+ · returns text

Formats a string using printf-style format specifiers with safe SQL identifier and literal quoting.

Developer · DBA · Data Eng

Signature

format ( formatstr text [, formatarg "any" [, ...]] ) → text

Parameters

Parameter | Type | Description
formatstr | text | Format string with %s (value), %I (quoted identifier), %L (quoted literal), %% (literal %)
formatarg, ... | any | Values substituted for format specifiers in order, or by position (%1$s)

Examples

sql
SELECT format('Hello, %s!', 'world');
Hello, world!

%I quotes identifiers, %L quotes literals safely

sql
SELECT format('SELECT * FROM %I WHERE id = %L', 'my_table', 42);
SELECT * FROM my_table WHERE id = '42'

Positional arguments — reuse arg 1 twice

sql
SELECT format('%1$s=%1$s', 'foo');
foo=foo

Width and alignment (like printf)

sql
SELECT format('%-10s|%10s', 'left', 'right');
left      |     right

Use inside plpgsql for safe dynamic queries

sql
EXECUTE format('UPDATE %I SET %I = %L WHERE id = %s', tbl, col, val, id);
Safe dynamic SQL in PL/pgSQL
Anti-Pattern: Building dynamic SQL with string concatenation instead of format()

Writing `'SELECT * FROM ' || table_name || ' WHERE id = ' || user_id` in a PL/pgSQL function is a SQL injection vector when `table_name` or `user_id` come from user input or external data. It also becomes unreadable with many substitutions.

✓ Instead: Use `format('SELECT * FROM %I WHERE id = %L', table_name, user_id)`. The `%I` specifier safely quotes identifiers; `%L` safely quotes literals.

Never concatenate user-supplied table or column names into dynamic SQL with `||`. Instead: `EXECUTE format('SELECT %I FROM %I WHERE %I = %L', col_name, table_name, filter_col, filter_val)`. `%I` double-quotes identifiers and escapes embedded double-quotes. `%L` single-quotes literals and escapes embedded single-quotes. Together they make injection impossible — even with adversarial input like `'; DROP TABLE users; --`.
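A minimal PL/pgSQL sketch putting %I to work (`count_rows` is a made-up helper, not a built-in):

```sql
CREATE OR REPLACE FUNCTION count_rows(tbl text) RETURNS bigint AS $$
DECLARE
    n bigint;
BEGIN
    -- %I double-quotes tbl safely, even if it contains spaces,
    -- reserved words, or embedded double quotes
    EXECUTE format('SELECT count(*) FROM %I', tbl) INTO n;
    RETURN n;
END;
$$ LANGUAGE plpgsql;
```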

initcap

PG 8.0+ · returns text

Converts the first letter of each word to upper case and the rest to lower case; words are delimited by non-alphanumeric characters.

Developer

Signature

initcap ( string text ) → text

Parameters

Parameter | Type | Description
string | text | Input string to title-case

Examples

sql
SELECT initcap('hello world');
Hello World

Lowercases the non-initial letters too

sql
SELECT initcap('HELLO WORLD');
Hello World

Capitalizes after apostrophe — a known quirk

sql
SELECT initcap('it''s a test');
It'S A Test

Correct result for this case

sql
SELECT initcap('o''reilly');
O'Reilly

Hyphen acts as word separator

sql
SELECT initcap('user-defined type');
User-Defined Type
Anti-Pattern: Using initcap on names with Irish/Scottish prefixes expecting correct output

`initcap('mcdonald')` returns `Mcdonald` (not `McDonald`), and `initcap('macleod')` returns `Macleod` (not `MacLeod`). Using initcap as a general name-normalization function in a customer database will silently produce incorrect titles for a meaningful fraction of names.

✓ Instead: Use initcap only for generic title-casing (product names, city names). For personal names, implement a custom normalization function or handle it in application code where you can manage special-case rules.

`initcap('mcdonald')` returns `Mcdonald`, not `McDonald`. `initcap('o''reilly')` returns `O'Reilly` (correct), but `initcap('it''s fine')` returns `It'S Fine` (wrong — S is capitalized after the apostrophe). For handling edge cases like Mc/Mac prefixes or names with apostrophes, you need a custom PL/pgSQL function. For most first/last name standardization, initcap is good enough — just document the exception.
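One way to special-case the Mc prefix on top of initcap; a sketch, not a complete name normalizer (`customers` and `name` are placeholder names):

```sql
-- 'mcdonald' -> 'McDonald'; other names fall through to plain initcap
SELECT CASE
         WHEN lower(name) LIKE 'mc%'
           THEN 'Mc' || initcap(substr(name, 3))
         ELSE initcap(name)
       END AS display_name
FROM customers;
```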

left

PG 8.0+ · returns text

Returns the first n characters of the string; if n is negative, returns all but the last |n| characters.

Developer · Data Eng

Signature

left ( string text, n integer ) → text

Parameters

Parameter | Type | Description
string | text | Input string
n | integer | Number of characters to return from the left; negative means remove from the right

Examples

sql
SELECT left('Hello, World!', 5);
Hello

Negative: removes last 2 characters

sql
SELECT left('abcdef', -2);
abcd
sql
SELECT left(email, strpos(email, '@') - 1) FROM users;
username part of email
sql
SELECT left(description, 100) || '...' FROM articles WHERE length(description) > 100;
Truncated description with ellipsis

n larger than string length — returns the whole string

sql
SELECT left('abc', 100);
abc
Anti-Pattern: Using left() in a WHERE clause to test for a prefix instead of LIKE or starts_with()

`WHERE left(col, 5) = 'ACME-'` wraps the column in a function call, preventing B-tree index use on `col`. The query does a full sequential scan.

✓ Instead: Use `WHERE col LIKE 'ACME-%'` or `WHERE starts_with(col, 'ACME-')` — both can use a standard B-tree index on the column.

To replace a known-length prefix: `right(col, -6)` strips the first 6 chars. To strip a known-length suffix: `left(col, -4)` removes the last 4. Combine them to extract the middle of a fixed-format string without computing positions: `left(right('PREFIX_middle_SUFFIX', -7), -7)`. Faster and more readable than `substring(col, 8, length(col)-14)` for fixed-format codes.
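For the prefix-search case from the anti-pattern above, an index built with text_pattern_ops keeps LIKE 'ACME-%' fast even under non-C collations (`orders` and `order_code` are illustrative names):

```sql
-- text_pattern_ops enables prefix LIKE matching on non-C collations
CREATE INDEX idx_orders_code ON orders (order_code text_pattern_ops);

-- Uses the index; WHERE left(order_code, 5) = 'ACME-' would not
SELECT * FROM orders WHERE order_code LIKE 'ACME-%';
```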

length

PG 8.0+ · returns integer

Returns the number of characters in the string, or the number of bytes in a bytea value.

Developer · DBA

Signatures

length ( string text ) → integer
length ( string bytea ) → integer

Parameters

Parameter | Type | Description
string | text or bytea | Input string or binary data

Examples

sql
SELECT length('hello');
5
sql
SELECT length(NULL);
NULL
sql
SELECT length('');
0

Character count, not byte count

sql
SELECT length('日本語');
3

Data validation query

sql
SELECT id FROM products WHERE length(sku) != 8;
Rows where SKU is not exactly 8 characters
Anti-Pattern: Using length() > 0 to filter out empty and NULL values

`WHERE length(col) > 0` silently excludes NULL rows without error — they just disappear from results. Developers often write this expecting it to mean 'has any value' but NULLs are not matched.

✓ Instead: Write `WHERE col IS NOT NULL AND col <> ''` to explicitly handle both empty strings and NULLs, making the intent clear.

`WHERE length(col) > 0` will NOT match NULL rows — they are silently excluded. If you want 'non-empty, non-null', write `WHERE col IS NOT NULL AND col <> ''`, or `WHERE length(col) > 0 OR col IS NULL` to include NULLs. For a zero-safe version: `COALESCE(length(col), 0) > 0`. This is the most common NULL-related bug in string validation queries.

lower

PG 8.0+ · returns text

Converts all characters in the string to their lower-case equivalents using the current locale.

Developer · DBA

Signature

lower ( string text ) → text

Parameters

Parameter | Type | Description
string | text | Input string to convert to lower case

Examples

sql
SELECT lower('HELLO WORLD');
hello world
sql
SELECT lower('PostgreSQL 16');
postgresql 16
sql
SELECT lower(NULL);
NULL
sql
SELECT * FROM users WHERE lower(email) = lower($1);
Case-insensitive email lookup

Unicode-aware with proper locale

sql
SELECT lower('ÜNÏCÖDÉ');
ünïcödé
Anti-Pattern: Calling lower() in a WHERE clause on a column without a functional index

`WHERE lower(email) = 'user@example.com'` performs a full table scan on large tables — PostgreSQL cannot use a plain B-tree index on `email` when the column is wrapped in lower().

✓ Instead: Create a functional index: `CREATE INDEX ON users (lower(email))` and keep the WHERE clause as `WHERE lower(email) = $1`. Alternatively, use the `citext` extension for transparent case-insensitive comparisons without function calls.

`CREATE INDEX idx_users_email_lower ON users (lower(email));` then query with `WHERE lower(email) = lower($1)`. This uses the index efficiently — no full table scan. Much faster than `ILIKE` for equality checks on large tables, and avoids storing a separate normalized column. Works with unique constraints too: `CREATE UNIQUE INDEX ON users (lower(email))` prevents duplicate emails regardless of case.
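The citext alternative mentioned above, sketched end to end (assumes you can ALTER the column and install the extension):

```sql
CREATE EXTENSION IF NOT EXISTS citext;

ALTER TABLE users ALTER COLUMN email TYPE citext;

-- Plain equality is now case-insensitive; no lower() wrapper or
-- functional index needed
SELECT * FROM users WHERE email = 'User@Example.COM';
```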

lpad

PG 8.0+ · returns text

Left-pads the string to the specified length by prepending copies of the fill string (default: space).

Developer · Data Eng

Signatures

lpad ( string text, length integer ) → text
lpad ( string text, length integer, fill text ) → text

Parameters

Parameter | Type | Description
string | text | Input string to pad
length | integer | Target total length; if string is longer, it is truncated
fill | text | Padding string (default: single space)

Examples

sql
SELECT lpad('42', 8, '0');
00000042

Default fill is space

sql
SELECT lpad('hello', 10);
     hello

Truncates if string exceeds length

sql
SELECT lpad('toolongstring', 5);
toolo
sql
SELECT lpad(invoice_number::text, 10, '0') FROM invoices;
0000000042 — zero-padded display IDs

Empty string padded to fill width

sql
SELECT lpad('', 3, '*');
***
Anti-Pattern: Using lpad to zero-pad then sorting the result lexicographically without consistent width

If different rows are padded to different widths (e.g. some to 6 digits, others to 8), lexicographic ORDER BY still produces wrong results. lpad only fixes sort order when ALL values are padded to the same target length.

✓ Instead: Always use a fixed, consistent length for lpad: `lpad(id::text, 10, '0')` for all rows. If you store the padded value, add a CHECK constraint ensuring the length is always exactly N.

If IDs or codes will ever be sorted lexicographically (file exports, S3 keys, report codes), always zero-pad: `lpad(id::text, 10, '0')`. Without padding, '2' sorts after '10' in string order. Zero-padding makes numeric and lexicographic sort identical. This also matters for binary search in sorted files — a 10-digit zero-padded integer is always exactly 10 bytes, making fixed-width record parsing trivial.
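For example, generating fixed-width export keys (`exports` is a hypothetical table):

```sql
-- '0000000042.json' sorts identically in numeric and lexicographic order
SELECT lpad(id::text, 10, '0') || '.json' AS object_key
FROM exports
ORDER BY object_key;
```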

ltrim

PG 8.0+ · returns text

Removes the longest string of characters in the characters set from the beginning (left side) of the string.

Developer · DBA · Data Eng

Signatures

ltrim ( string text ) → text
ltrim ( string text, characters text ) → text

Parameters

Parameter | Type | Description
string | text | Input string to left-trim
characters | text | Character set to remove (default: space)

Examples

sql
SELECT ltrim('   hello');
hello

Strip leading zeros from string numbers

sql
SELECT ltrim('000042', '0');
42
sql
SELECT ltrim('xxxhello', 'x');
hello

Strip leading slash

sql
SELECT ltrim('/path/to/file', '/');
path/to/file
sql
SELECT ltrim('--comment', '-');
comment
Anti-Pattern: Trimming leading zeros with ltrim then casting without guarding the all-zero case

`ltrim('000', '0')::integer` throws an error because ltrim returns an empty string, and casting '' to integer fails with `invalid input syntax for type integer`.

✓ Instead: Use `NULLIF(ltrim(col, '0'), '')::integer` — the NULLIF converts empty string to NULL, and NULL::integer is NULL rather than an error.

When string numbers arrive zero-padded and you need the integer value, `ltrim(col, '0')::integer` is faster and cleaner than regex. Watch out: `ltrim('0', '0')` returns '' (empty string), so cast safely with `NULLIF(ltrim(col, '0'), '')::integer` to avoid a cast error on all-zero strings.
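The safe-cast pattern in full, using only literals:

```sql
SELECT NULLIF(ltrim('000042', '0'), '')::integer;  -- 42
SELECT NULLIF(ltrim('000',    '0'), '')::integer;  -- NULL, not a cast error
```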

md5

PG 8.0+ · returns text

Computes the MD5 hash of the input and returns it as a 32-character lowercase hexadecimal string.

Developer · DBA

Signatures

md5 ( string text ) → text
md5 ( data bytea ) → text

Parameters

Parameter | Type | Description
string | text or bytea | Input value to hash

Examples

sql
SELECT md5('hello');
5d41402abc4b2a76b9719d911017c592

MD5 of empty string

sql
SELECT md5('');
d41d8cd98f00b204e9800998ecf8427e
sql
SELECT md5(row_to_json(t)::text) FROM users t WHERE id = 1;
Fingerprint of entire row as hex string

Content-change detection

sql
SELECT id FROM cache_entries WHERE md5(content) != stored_hash;
Rows where content has changed since last hash
sql
SELECT md5(random()::text) AS random_hex;
Random 32-char hex string
Anti-Pattern: Using md5() to hash passwords before storing them

MD5 is cryptographically broken. Using `md5(password)` or even `md5(salt || password)` for password storage is dangerously insecure — rainbow tables and GPU cracking render it useless. This is a well-known but still common mistake.

✓ Instead: Use the `pgcrypto` extension: `crypt(password, gen_salt('bf', 12))` for bcrypt hashing, and `crypt(input, stored_hash)` for verification. Never store MD5-hashed passwords.

Never use `md5` for password hashing (broken, unsalted). Use it for: deduplication keys (`SELECT DISTINCT ON (md5(content))`), cache invalidation tokens, row-level change detection (`WHERE md5(row_to_json(t)::text) != last_hash`), and generating deterministic fake IDs for test data. For cryptographic hashing, use pgcrypto's `digest(col, 'sha256')`.
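Deduplication by content fingerprint, sketched against a hypothetical `documents` table:

```sql
-- Keeps the lowest id for each distinct content hash
SELECT DISTINCT ON (md5(content)) id, content
FROM documents
ORDER BY md5(content), id;
```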

octet_length

PG 8.0+ · returns integer

Returns the number of bytes in the string (UTF-8 multibyte characters count as multiple bytes).

Developer · DBA · Data Eng

Signatures

octet_length ( string text ) → integer
octet_length ( string bytea ) → integer

Parameters

Parameter | Type | Description
string | text or bytea | Input string or binary data

Examples

ASCII: byte count = char count

sql
SELECT octet_length('hello');
5

'é' is 2 bytes in UTF-8, so 4 chars = 5 bytes

sql
SELECT octet_length('café');
5

3 chars × 3 bytes each in UTF-8

sql
SELECT octet_length('日本語');
9
sql
SELECT id FROM messages WHERE octet_length(body) > 65535;
Rows exceeding 64KB payload limit
sql
SELECT octet_length('\x0102ff'::bytea);
3
Anti-Pattern: Enforcing VARCHAR(n) byte limits with char_length instead of octet_length

PostgreSQL's `VARCHAR(n)` limit is in characters, not bytes. But external systems (Kafka, S3 metadata, SNS) often have byte limits. A 255-character field containing 255 emoji is 1,020 bytes — far over a 1KB limit. Checking `char_length(col) <= 255` does not catch this.

✓ Instead: Add a separate CHECK constraint: `CHECK (octet_length(col) <= 1024)` when the destination system has a byte budget.

Protocol limits (HTTP headers, Kafka messages, email subjects) are in bytes, not characters. A 255-character VARCHAR that looks fine can exceed limits when it contains emoji or CJK characters. Add a CHECK constraint: `CHECK (octet_length(col) <= 1024)` to enforce byte limits. Pair with a UI warning: show remaining chars but calculate against `maxBytes / avgBytesPerChar` heuristically.

overlay

PG 8.0+ · returns text

Replaces a substring starting at position start for count characters with the replacement string.

Developer · DBA · Security

Signature

overlay ( string text PLACING replacement text FROM start integer [ FOR count integer ] ) → text

Parameters

Parameter | Type | Description
string | text | Original string
replacement | text | String to insert
start | integer | 1-based character position where replacement begins
count | integer | Number of characters to replace (defaults to length of replacement)

Examples

sql
SELECT overlay('Txxxxas' PLACING 'hom' FROM 2 FOR 4);
Thomas
sql
SELECT overlay('Hello World' PLACING '-----' FROM 7 FOR 5);
Hello -----

Mask first 14 chars of credit card

sql
SELECT overlay(card_num PLACING '****-****-****' FROM 1 FOR 14) FROM cards;
****-****-****-1234
sql
SELECT overlay(phone PLACING 'XXX' FROM 5 FOR 3) FROM contacts;
555-XXX-1234

Without FOR: replaces length-of-replacement chars

sql
SELECT overlay('abcdefgh' PLACING 'XYZ' FROM 3);
abXYZfgh
Anti-Pattern: Using overlay without FOR when the replacement length differs from what you want to remove

When FOR is omitted, overlay replaces exactly `length(replacement)` characters. If your replacement is shorter or longer than the section you want to overwrite, you get unexpected results — characters are shifted rather than replaced in-place.

✓ Instead: Always specify FOR explicitly: `overlay(col PLACING new_val FROM start FOR length_to_remove)`. Count the characters you want to overwrite, not the length of the replacement.

Use `overlay` to partially mask sensitive data for display or logging: `overlay(email PLACING repeat('*', strpos(email,'@')-2) FROM 2 FOR strpos(email,'@')-2)` turns 'user@example.com' into 'u***@example.com'. Unlike `regexp_replace`, `overlay` preserves the original string boundaries and positions, making partial redaction exact and predictable.

position

PG 8.0+ · returns integer

Returns the starting position (1-based) of the first occurrence of substring within string, or 0 if not found.

Developer

Signatures

position ( substring text IN string text ) → integer
strpos ( string text, substring text ) → integer

Parameters

ParameterTypeDescription
substringtextString to search for
stringtextString to search within

Examples

sql
SELECT position('om' IN 'Thomas');
3

strpos() is identical — different syntax

sql
SELECT strpos('Thomas', 'om');
3

Returns 0 (not NULL or -1) when not found

sql
SELECT position('xyz' IN 'hello');
0
sql
SELECT position('@' IN email) FROM users;
Position of @ in each email address
sql
SELECT substr(url, position('://' IN url) + 3) FROM links;
URL without the protocol prefix
Anti-PatternUsing strpos() > -1 to test for substring presence (copied from other languages)

In Python and JavaScript, `str.indexOf(sub) > -1` or `str.find(sub) != -1` tests for presence. In PostgreSQL, `strpos` returns 0 (not -1) for no match — so `strpos(col, 'foo') > -1` is always true, matching every row including those without the substring.

✓ Instead: Use `strpos(col, 'foo') > 0` or simply `col LIKE '%foo%'` to test for substring presence.

`WHERE strpos(col, 'foo') > 0` correctly finds matches. `WHERE strpos(col, 'foo') != 0` also works. The trap: developers from Python/JS write `> -1` expecting false for 'not found', but the minimum return value is 0 — so `> -1` matches EVERYTHING. Second trap: `position()` is case-sensitive — for case-insensitive search, use `strpos(lower(col), lower(needle)) > 0` or `ILIKE`.
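A minimal sketch of the presence tests above (the `articles` table and `title` column are illustrative):

```sql
-- Case-sensitive presence test: strpos returns 0 for "not found", never -1
SELECT * FROM articles WHERE strpos(title, 'Postgres') > 0;

-- Case-insensitive variants, equivalent in effect
SELECT * FROM articles WHERE strpos(lower(title), lower('Postgres')) > 0;
SELECT * FROM articles WHERE title ILIKE '%Postgres%';
```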

quote_ident

PG 8.0+text

Quotes the given string as a SQL identifier, adding double quotes if necessary and doubling any embedded double quotes.

DeveloperDBASecurity

Signature

quote_ident ( string text ) → text

Parameters

ParameterTypeDescription
stringtextIdentifier (table name, column name, etc.) to safely quote

Examples

Simple identifiers need no quoting

sql
SELECT quote_ident('users');
users

Spaces require double-quoting

sql
SELECT quote_ident('My Table');
"My Table"

Reserved words are quoted

sql
SELECT quote_ident('select');
"select"

Embedded quotes are doubled

sql
SELECT quote_ident('table"name');
"table""name"
sql
EXECUTE 'SELECT * FROM ' || quote_ident(table_name);
Safe dynamic SQL using the identifier
Anti-PatternConcatenating table or column names directly into dynamic SQL strings

Writing `'SELECT * FROM ' || schema_name || '.' || table_name` in a stored procedure is a SQL injection risk. If either value contains special characters or reserved words, the query will either fail or — with malicious input — execute unintended SQL.

✓ Instead: Use `format('SELECT * FROM %I.%I', schema_name, table_name)` which calls quote_ident internally and handles all edge cases safely.

If a table name comes from a parameter or system catalog, `'SELECT * FROM ' || table_name` is a SQL injection vector. `'SELECT * FROM ' || quote_ident(table_name)` is not. In modern PL/pgSQL, prefer `format('SELECT * FROM %I', table_name)` — it's more readable and handles multiple identifiers cleanly. The `%I` specifier in `format()` calls `quote_ident()` internally.
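The `format('%I')` pattern above, sketched as an anonymous PL/pgSQL block (the table name is illustrative):

```sql
DO $$
DECLARE
    tbl text := 'My Table';  -- imagine this arrives from a parameter
    n   bigint;
BEGIN
    -- %I applies quote_ident, so spaces, reserved words, and embedded
    -- double quotes in tbl are all handled safely
    EXECUTE format('SELECT count(*) FROM %I', tbl) INTO n;
    RAISE NOTICE 'row count: %', n;
END $$;
```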

quote_literal

PG 8.0+text

Quotes a string as a SQL string literal, adding single quotes and doubling any embedded single quotes; returns NULL for NULL input.

DeveloperDBASecurity

Signatures

quote_literal ( string text ) → text
quote_literal ( anyelement ) → text

Parameters

ParameterTypeDescription
stringtext | anyValue to safely quote as a SQL literal

Examples

Embedded single quotes are doubled

sql
SELECT quote_literal('O''Reilly');
'O''Reilly'

Non-text values are cast to text first

sql
SELECT quote_literal(42);
'42'

NULL input returns NULL (use quote_nullable for NULL-safe version)

sql
SELECT quote_literal(NULL);
NULL
sql
SELECT 'WHERE name = ' || quote_literal(user_input);
WHERE name = 'user input safely quoted'
sql
SELECT quote_literal(current_timestamp);
'2024-03-15 14:30:00+00'
Anti-PatternUsing quote_literal when the value might be NULL, breaking dynamic SQL silently

`'SET col = ' || quote_literal(maybe_null)` produces NULL when `maybe_null` is NULL — and `EXECUTE NULL` raises an error or does nothing. This is a silent failure that only appears at runtime when a nullable column is updated.

✓ Instead: Use `quote_nullable(maybe_null)` which returns the unquoted string `NULL` for NULL input, producing valid SQL like `SET col = NULL`.

In dynamic SQL, if a value might be NULL, `quote_literal(NULL)` returns NULL — and concatenating NULL into a string produces NULL, silently breaking your query. Use `quote_nullable(val)` instead: it returns the string 'NULL' for NULL input and a quoted literal otherwise. This is the safer default for all dynamic SQL value quoting in stored procedures.

quote_nullable

PG 8.0+text

Like quote_literal but returns the string 'NULL' (unquoted) when the input is NULL, making it safe for dynamic SQL with nullable values.

DeveloperDBASecurity

Signatures

quote_nullable ( string text ) → text
quote_nullable ( anyelement ) → text

Parameters

ParameterTypeDescription
valuetext | anyValue to quote; NULLs produce unquoted NULL

Examples

Returns unquoted NULL string — safe for dynamic SQL

sql
SELECT quote_nullable(NULL);
NULL
sql
SELECT quote_nullable('hello');
'hello'

Single quotes doubled

sql
SELECT quote_nullable('it''s');
'it''s'
sql
SELECT quote_nullable(42);
'42'
sql
EXECUTE 'UPDATE t SET col = ' || quote_nullable(maybe_null_val);
Safe even when value is NULL
Anti-PatternUsing quote_literal as the default for all value quoting in stored procedures

Many developers default to `quote_literal` for all value substitution in dynamic SQL because it's better known. But any nullable column will break the generated SQL when the value is NULL — `quote_literal(NULL)` propagates NULL into the string concatenation.

✓ Instead: Make `quote_nullable` your default for all value quoting in PL/pgSQL. Reserve `quote_literal` for cases where a NULL value is genuinely an error condition.

The difference matters most in UPDATE/INSERT statements built dynamically where optional columns might be NULL. With `quote_literal(NULL)` the SQL string itself becomes NULL and the EXECUTE fails silently or raises an error. With `quote_nullable(NULL)` you get `SET col = NULL` — the correct SQL. Make `quote_nullable` your default; only use `quote_literal` when a NULL should be an error.
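The difference described above, shown side by side:

```sql
-- quote_literal propagates NULL through concatenation: the whole string is NULL
SELECT 'UPDATE t SET col = ' || quote_literal(NULL);
-- result: NULL

-- quote_nullable emits the unquoted keyword NULL, producing valid SQL
SELECT 'UPDATE t SET col = ' || quote_nullable(NULL);
-- result: UPDATE t SET col = NULL
```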

regexp_count

PG 15+integer

Counts the number of non-overlapping matches of the POSIX regular expression pattern in the string.

DeveloperData Eng

Signature

regexp_count ( string text, pattern text [, start integer [, flags text ]] ) → integer

Parameters

ParameterTypeDescription
stringtextInput string to search
patterntextPOSIX regex pattern
startintegerCharacter position to start search (1-based, default 1)
flagstextRegex flags: 'i' case-insensitive, 'n' newline-sensitive, etc.

Examples

sql
SELECT regexp_count('hello world', 'o');
2
sql
SELECT regexp_count('aababab', 'ab');
3

Count uppercase letters (regexp_count always counts all matches; passing the 'g' flag raises an error)

sql
SELECT regexp_count('Hello World', '[A-Z]');
2
sql
SELECT regexp_count(body, '\m\w+@\w+\.\w+\M') FROM emails;
Number of email addresses in body
sql
SELECT id FROM logs WHERE regexp_count(message, 'ERROR', 1, 'i') > 3;
Log entries with more than 3 error mentions
Anti-PatternUsing regexp_count on PG versions before 15 without a version guard

`regexp_count` was added in PostgreSQL 15. Calling it on PG 14 or earlier raises `ERROR: function regexp_count(text, text) does not exist`. Applications that need to support multiple PG versions will break at runtime.

✓ Instead: Check your minimum supported PostgreSQL version. For pre-PG15, use the workaround: `array_length(regexp_split_to_array(col, pattern), 1) - 1` or a custom PL/pgSQL helper.

Before PG15: `array_length(regexp_split_to_array(text, 'pattern'), 1) - 1` gives the match count by counting splits. On PG15+, `regexp_count` is cleaner and faster. For counting word occurrences case-insensitively: `regexp_count(lower(col), lower(keyword))` — avoid the 'i' flag for simple literal matching as it disables certain optimizations.
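The pre-PG15 workaround and the PG15+ function side by side, counting runs of digits:

```sql
-- PG 15+
SELECT regexp_count('foo1bar22baz333', '[0-9]+');
-- 3

-- Pre-PG15 equivalent: splitting on the pattern yields (matches + 1) parts
SELECT array_length(regexp_split_to_array('foo1bar22baz333', '[0-9]+'), 1) - 1;
-- 3
```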

regexp_like

PG 15+boolean

Returns true if the string matches the POSIX regular expression pattern.

DeveloperData Eng

Signature

regexp_like ( string text, pattern text [, flags text ] ) → boolean

Parameters

ParameterTypeDescription
stringtextInput string to test
patterntextPOSIX regex pattern
flagstextOptional flags: 'i' case-insensitive, 'n' newline-sensitive

Examples

sql
SELECT regexp_like('hello world', 'world');
true

Case-insensitive match

sql
SELECT regexp_like('Hello', 'hello', 'i');
true

Basic email validation

sql
SELECT regexp_like('test@example.com', '^[^@]+@[^@]+\.[^@]+$');
true
sql
SELECT regexp_like('abc123', '^[a-z]+$');
false
sql
SELECT * FROM users WHERE regexp_like(phone, '^[+]?[0-9\s\-().]{7,20}$');
Valid phone numbers
Anti-PatternWriting complex validation logic with regexp_like when a simpler constraint exists

Using `WHERE regexp_like(email, '^[^@]+@[^@]+\.[a-z]{2,}$')` for email validation in application queries is fine for filtering, but putting this in a CHECK constraint on a user-facing column leads to maintenance problems — regex email validation is notoriously hard to keep correct and RFC-compliant.

✓ Instead: For email columns, use a permissive CHECK like `CHECK (email LIKE '%@%.%')` combined with application-layer validation, or use the `pg_trgm` extension for fuzzy matching. Reserve regexp_like for filtering and data auditing queries.

`regexp_like(col, 'pattern')` is equivalent to `col ~ 'pattern'`. The `~` operator is available in all PostgreSQL versions and is often more concise in WHERE clauses. Use `~*` for case-insensitive matching (equivalent to `regexp_like(col, pattern, 'i')`). Migrate to `regexp_like` in new code for readability; keep `~` in hotpaths since it's the same underlying function.

regexp_match

PG 10+text[]

Returns the captured substrings of the first match of a POSIX regex pattern, or NULL if there is no match.

DeveloperData Eng

Signature

regexp_match ( string text, pattern text [, flags text ] ) → text[]

Parameters

ParameterTypeDescription
stringtextInput string to search
patterntextPOSIX regex with optional capturing groups
flagstextOptional: 'i' case-insensitive, 'n' newline-sensitive

Examples

Returns array of capture groups

sql
SELECT regexp_match('foobarbequebaz', '(bar)(beque)');
{bar,beque}

Extract first capture group

sql
SELECT (regexp_match('2024-03-15', '(\d{4})-(\d{2})-(\d{2})'))[1] AS year;
2024

No match returns NULL, not empty array

sql
SELECT regexp_match('no match here', '\d+');
NULL
sql
SELECT (regexp_match(email, '@(.+)$'))[1] AS domain FROM users;
example.com — domain extracted from email
sql
SELECT regexp_match('v14.2.1', 'v(\d+)\.(\d+)');
{14,2}
Anti-PatternSubscripting regexp_match result without checking for NULL first

`(regexp_match(col, 'pattern'))[1]` silently returns NULL when there is no match — which is fine in a SELECT, but causes silent data loss in an INSERT or UPDATE where you expected a value.

✓ Instead: Use `COALESCE((regexp_match(col, pattern))[1], '')` or add a WHERE filter: `WHERE regexp_match(col, pattern) IS NOT NULL` before relying on the result.

`(regexp_match(col, 'pattern'))[1]` returns NULL both when there is no match AND when the first capture group exists but did not participate in the match (e.g. an optional group like `(x)?`). Always wrap: `COALESCE((regexp_match(col, 'pattern'))[1], 'default')` or check `regexp_match(...) IS NOT NULL` before subscripting. Use `regexp_matches` (plural) with the 'g' flag to get all matches as a set of rows.
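A NULL-safe extraction sketch (the `releases` table and `name` column are illustrative):

```sql
-- COALESCE supplies a default when the pattern does not match
SELECT COALESCE((regexp_match(name, 'v(\d+)'))[1], '0') AS major
FROM releases;

-- Or filter to matching rows first, then subscript safely
SELECT (regexp_match(name, 'v(\d+)'))[1] AS major
FROM releases
WHERE name ~ 'v\d+';
```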

regexp_replace

PG 7.4+text

Replaces substrings matching a POSIX regex pattern with a replacement string; backreferences \1, \2, etc. reference capture groups.

DeveloperDBAData Eng

Signatures

regexp_replace ( string text, pattern text, replacement text ) → text
regexp_replace ( string text, pattern text, replacement text, flags text ) → text
regexp_replace ( string text, pattern text, replacement text, start integer, N integer ) → text

Parameters

ParameterTypeDescription
stringtextInput string
patterntextPOSIX regex pattern
replacementtextReplacement string; use \1, \2, \& for backreferences
flagstext'g' global (all matches), 'i' case-insensitive, 'n' newline-sensitive

Examples

sql
SELECT regexp_replace('Hello World', 'World', 'PostgreSQL');
Hello PostgreSQL

'g' flag replaces ALL matches

sql
SELECT regexp_replace('foo123bar456', '[0-9]+', 'N', 'g');
fooNbarN

Strip all non-digits from phone number

sql
SELECT regexp_replace('(800) 555-1234', '[^0-9]', '', 'g');
8005551234

Backreferences to swap words

sql
SELECT regexp_replace('John Smith', '(\w+)\s(\w+)', '\2, \1');
Smith, John
sql
SELECT regexp_replace(html, '<[^>]+>', '', 'g');
Plain text with HTML tags stripped
Anti-PatternForgetting the 'g' flag when intending to replace all occurrences

`regexp_replace(col, '[^a-zA-Z]', '')` without the 'g' flag removes only the first non-letter character. Rows with multiple non-letter characters appear 'cleaned' but still contain unwanted characters — a bug that is easy to miss in testing on simple data.

✓ Instead: Always add the 'g' flag when you want all matches replaced: `regexp_replace(col, '[^a-zA-Z]', '', 'g')`.

`regexp_replace('aaa', 'a', 'b')` returns 'baa', not 'bbb'. Add 'g' for all-matches replacement. Common use: normalize phone numbers `regexp_replace(phone, '[^0-9]', '', 'g')`, strip HTML `regexp_replace(html, '<[^>]+>', '', 'g')`, redact emails `regexp_replace(text, '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', '[REDACTED]', 'g')`. The 'g' flag is the most commonly forgotten argument in production bugs.

regexp_split_to_array

PG 8.3+text[]

Splits the string using a POSIX regex as the delimiter and returns the parts as an array.

DeveloperData Eng

Signature

regexp_split_to_array ( string text, pattern text [, flags text ] ) → text[]

Parameters

ParameterTypeDescription
stringtextString to split
patterntextPOSIX regex delimiter pattern
flagstextOptional regex flags: 'i' case-insensitive

Examples

sql
SELECT regexp_split_to_array('hello world', '\s+');
{hello,world}

Empty strings preserved between consecutive delimiters

sql
SELECT regexp_split_to_array('one,two,,four', ',');
{one,two,"",four}

A trailing delimiter leaves a trailing empty string

sql
SELECT regexp_split_to_array('1a2b3c', '[a-z]');
{1,2,3,""}
sql
SELECT array_length(regexp_split_to_array(body, '\s+'), 1) AS word_count FROM articles;
Word count per article

Leading/trailing delimiter creates empty string elements

sql
SELECT regexp_split_to_array('  spaced  out  ', '\s+');
{"",spaced,out,""}
Anti-PatternUsing regexp_split_to_array for simple fixed-character splits

Using `regexp_split_to_array(col, ',')` when the delimiter is a plain literal character adds regex overhead without benefit. Special regex characters in the delimiter also need escaping, which is easy to forget.

✓ Instead: Use `string_to_array(col, ',')` for literal delimiters — it is faster, cleaner, and the delimiter is treated as a plain string, not a pattern.

Splitting `' hello world '` on `'\s+'` produces `{'', 'hello', 'world', ''}` — note the empty strings at start and end. Use `regexp_split_to_array(trim(col), '\s+')` to avoid them. For simple fixed-delimiter splits, `string_to_array` is faster. Use `regexp_split_to_array` only when you need regex power (e.g. split on any whitespace, split on multiple delimiter types).

regexp_split_to_table

PG 8.3+setof text

Splits the string using a POSIX regex as the delimiter and returns each part as a separate row.

DeveloperData Eng

Signature

regexp_split_to_table ( string text, pattern text [, flags text ] ) → setof text

Parameters

ParameterTypeDescription
stringtextString to split
patterntextPOSIX regex delimiter pattern
flagstextOptional regex flags

Examples

sql
SELECT regexp_split_to_table('hello world foo', '\s+');
hello world foo (one row each)
sql
SELECT * FROM regexp_split_to_table('a,b,,d', ',') AS t(word);
a b (empty) d
sql
SELECT word, count(*) FROM regexp_split_to_table(lower(body), '\W+') AS word GROUP BY word ORDER BY count DESC LIMIT 10;
Top 10 word frequencies in body text
sql
SELECT unnest(regexp_split_to_array(tags_text, ',\s*')) AS tag FROM posts;
Individual tags as rows
sql
SELECT trim(t) AS item FROM regexp_split_to_table('  a , b , c ', ',') AS t WHERE trim(t) != '';
a b c — CSV parsed and cleaned
Anti-PatternUsing regexp_split_to_table in a correlated subquery for every row in a large table

Running `regexp_split_to_table(col, pattern)` per row in a correlated subquery or via a LATERAL join on millions of rows can become a significant bottleneck since the regex is compiled and applied row by row.

✓ Instead: Process text splitting in batch: unnest into a staging table first, or use `string_to_array` with `unnest` when the delimiter is a plain literal. For heavy text analysis, consider a dedicated full-text search pipeline.

```sql SELECT word, count(*) AS freq FROM regexp_split_to_table(lower(document_text), '[^a-z]+') AS word WHERE length(word) > 2 GROUP BY word ORDER BY freq DESC LIMIT 20; ``` This gives you top-20 non-trivial words in a document entirely in SQL — useful for generating keyword tags, detecting spam patterns, or building a poor man's text index.

repeat

PG 8.0+text

Repeats the string the specified number of times.

Developer

Signature

repeat ( string text, number integer ) → text

Parameters

ParameterTypeDescription
stringtextString to repeat
numberintegerNumber of times to repeat; 0 or negative returns empty string

Examples

sql
SELECT repeat('Pg', 4);
PgPgPgPg
sql
SELECT repeat('*', 10);
**********
sql
SELECT repeat(' ', indent_level * 2) || label AS tree FROM nodes;
child_node (indented tree output)

Zero repetitions returns empty string

sql
SELECT repeat('a', 0);
sql
SELECT repeat(md5(random()::text), 5) AS long_token;
160-char pseudo-random string for test data
Anti-PatternUsing repeat() with a large count on user-supplied input without a size limit

`repeat('x', n)` where `n` comes from user input with no validation can allocate gigabytes of memory and crash the server. `repeat('x', 1000000000)` will attempt to allocate ~1GB.

✓ Instead: Always cap the repeat count: `repeat('x', LEAST(n, 10000))` or add a CHECK constraint or application-layer validation before passing user input to repeat().

`INSERT INTO test_table(data) SELECT repeat(md5(random()::text), 100) FROM generate_series(1, 10000)` populates 10,000 rows with ~3.2KB of random-looking data each. Use `repeat('x', n)` for a precisely sized string, or `repeat(md5(i::text), n)` for deterministic data keyed to row number. Eliminates the need for a Python/shell data generator for load testing.

replace

PG 8.0+text

Replaces all non-overlapping occurrences of from with to in the string.

DeveloperData Eng

Signature

replace ( string text, from text, to text ) → text

Parameters

ParameterTypeDescription
stringtextInput string
fromtextSubstring to find and replace (case-sensitive, literal — not a regex)
totextReplacement string (can be empty to delete)

Examples

sql
SELECT replace('Hello World', 'World', 'PostgreSQL');
Hello PostgreSQL

Replaces ALL occurrences

sql
SELECT replace('aabbaabb', 'aa', 'X');
XbbXbb
sql
SELECT replace(url, 'http://', 'https://') FROM links;
URLs upgraded to HTTPS
sql
SELECT replace(col, chr(9), '    ') FROM data;
Tabs replaced with 4 spaces
sql
SELECT replace(phone, '-', '') FROM contacts;
Phone with dashes removed
Anti-PatternChaining multiple replace() calls when translate() or regexp_replace() would work in one pass

`replace(replace(replace(col, 'é', 'e'), 'ñ', 'n'), 'ü', 'u')` applies three full passes over potentially large strings. As the number of substitutions grows, performance degrades linearly.

✓ Instead: Use `translate(col, 'éñü', 'enu')` for single-character substitutions — one pass regardless of how many character mappings. For substring replacements (length > 1), use `regexp_replace` with alternation: `regexp_replace(col, 'foo|bar', '', 'g')`.

`replace(replace(replace(col, 'a', 'x'), 'b', 'y'), 'c', 'z')` is three sequential passes. `translate(col, 'abc', 'xyz')` does the same in one pass and is significantly faster on large strings. Use `replace` when replacing substrings (length > 1). Use `translate` when doing character-for-character substitution. Use `regexp_replace` when you need pattern matching.
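The single-pass alternative from the tip, with a literal input for comparison:

```sql
-- Three sequential passes over the string
SELECT replace(replace(replace('café señor über', 'é', 'e'), 'ñ', 'n'), 'ü', 'u');
-- cafe senor uber

-- One pass, same result
SELECT translate('café señor über', 'éñü', 'enu');
-- cafe senor uber
```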

reverse

PG 8.0+text

Returns the characters of the string in reverse order.

Developer

Signature

reverse ( string text ) → text

Parameters

ParameterTypeDescription
stringtextInput string to reverse

Examples

sql
SELECT reverse('hello');
olleh

Not a palindrome

sql
SELECT reverse('abcde') = 'abcde';
false
sql
SELECT col = reverse(col) AS is_palindrome FROM words;
true/false per row

Unicode-aware — reverses characters not bytes

sql
SELECT reverse('日本語');
語本日
sql
SELECT * FROM domains ORDER BY reverse(domain);
Rows sorted by reversed domain (groups by TLD then SLD)
Anti-PatternUsing reverse() for suffix-pattern matching without a supporting index

`WHERE reverse(col) LIKE 'moc.elpmaxe.%'` can use an index on `reverse(col)`, but developers often write this in ad-hoc queries without creating the functional index, resulting in a full sequential scan.

✓ Instead: Create `CREATE INDEX ON domains (reverse(domain))` before using `WHERE reverse(domain) LIKE reverse('%.' || $1)` in production queries. Without the index, reverse-based suffix matching is no faster than `col LIKE '%suffix'`.

`ORDER BY reverse(domain)` sorts 'mail.example.com', 'api.example.com', and 'www.example.com' together because reversed they share the same prefix 'moc.elpmaxe.'. This enables B-tree index scans on suffix patterns — combine with `CREATE INDEX ON domains (reverse(domain))` and query `WHERE reverse(domain) LIKE reverse('%.example.com')` for fast suffix lookups without full-text search.
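The index-backed suffix lookup described above, as a sketch (table and index names are illustrative; `text_pattern_ops` lets LIKE prefix patterns use the B-tree index under non-C collations):

```sql
-- One-time setup: expression index on the reversed domain
CREATE INDEX idx_domains_rev ON domains (reverse(domain) text_pattern_ops);

-- The suffix lookup becomes a prefix scan on the index:
-- reverse('%.example.com') = 'moc.elpmaxe.%'
SELECT * FROM domains
WHERE reverse(domain) LIKE reverse('%.example.com');
```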

rpad

PG 8.0+text

Right-pads the string to the specified length by appending copies of the fill string (default: space).

DeveloperData Eng

Signatures

rpad ( string text, length integer ) → text
rpad ( string text, length integer, fill text ) → text

Parameters

ParameterTypeDescription
stringtextInput string to pad
lengthintegerTarget total length; truncates if string is longer
filltextPadding string (default: single space)

Examples

sql
SELECT rpad('hello', 10);
hello      (padded to 10 characters with trailing spaces)

Custom fill character

sql
SELECT rpad('hello', 10, '.');
hello.....
sql
SELECT rpad(label, 20, ' ') || value FROM settings;
label value (tabular output)

Truncated when string exceeds length

sql
SELECT rpad('toolongstring', 5);
toolo
sql
SELECT rpad(col, max_len, ' ') FROM fixed_width_export;
Space-padded fixed-width fields for legacy exports
Anti-PatternForgetting that rpad truncates strings longer than the target length

`rpad('toolongstring', 5)` returns `'toolo'` — data is silently truncated, not an error. When generating fixed-width export files, any value longer than the field width will be cut off without warning, corrupting the record layout.

✓ Instead: Add an assertion or validation step before rpad in ETL pipelines: `WHERE length(col) <= 20` or raise an exception in PL/pgSQL when any value exceeds the column width.

Legacy system integrations often require fixed-width text files: `SELECT rpad(last_name, 20) || rpad(first_name, 15) || lpad(amount::text, 10, '0') FROM records`. This creates properly formatted fixed-width output entirely in SQL — no application-layer string building needed. Faster than constructing records in application code when exporting millions of rows.

rtrim

PG 8.0+text

Removes the longest string of characters in the characters set from the end (right side) of the string.

DeveloperDBAData Eng

Signatures

rtrim ( string text ) → text
rtrim ( string text, characters text ) → text

Parameters

ParameterTypeDescription
stringtextInput string to right-trim
characterstextCharacter set to remove (default: space)

Examples

sql
SELECT rtrim('hello   ');
hello
sql
SELECT rtrim('hello...', '.');
hello

Strip trailing zeros

sql
SELECT rtrim('123000', '0');
123
sql
SELECT rtrim('path/to/dir/', '/');
path/to/dir
sql
SELECT rtrim(rpad(col, 100), ' ') FROM char_columns;
Trim CHAR column padding
Anti-PatternUsing rtrim to strip trailing zeros from numeric strings before arithmetic

`rtrim('3.14000', '0')` returns `'3.14'` as expected, but `rtrim('100', '0')` returns `'1'` — stripping significant zeros too. The characters argument is a set, not a suffix, so it removes any trailing character that is in the set.

✓ Instead: For stripping trailing decimal zeros from numeric strings, cast to numeric and back: `trim(trailing '0' from col::numeric::text)` is safer, or use `to_char(val::numeric, 'FM999999990.##########')` for formatting.

A `CHAR(20)` column storing 'hello' actually stores 'hello ' (padded with 15 spaces). Comparing it with `= 'hello'` works in SQL but if you ever extract it as text and compare in application code, the trailing spaces will cause mismatches. Add `rtrim(col)` in your SELECT, or better yet: avoid CHAR and use VARCHAR or TEXT instead.

split_part

PG 8.0+ (negative n from PG 14+)text

Splits the string on the delimiter and returns the nth field (1-based); returns empty string for out-of-range n.

DeveloperData Eng

Signature

split_part ( string text, delimiter text, n integer ) → text

Parameters

ParameterTypeDescription
stringtextInput string to split
delimitertextLiteral delimiter string (not a regex)
ninteger1-based field index; negative values count from the right (PG 14+)

Examples

sql
SELECT split_part('one,two,three', ',', 2);
two

Extract domain from email

sql
SELECT split_part('user@example.com', '@', 2);
example.com

Out-of-bounds returns empty string, not NULL

sql
SELECT split_part('a,b', ',', 5);
sql
SELECT split_part('14.2.1', '.', 1)::int AS major, split_part('14.2.1', '.', 2)::int AS minor;
14 | 2

Negative index from PG 14

sql
SELECT split_part(path, '/', -1) FROM file_paths;
Last path segment (filename) — PG 14+
Anti-PatternUsing split_part to parse CSV fields that may contain quoted delimiters

`split_part('"Smith, John",30', ',', 1)` returns `'"Smith'` — not `'"Smith, John"'`. split_part does not understand CSV quoting rules and splits on every literal delimiter character.

✓ Instead: For proper CSV parsing with quoted fields, use PostgreSQL's COPY command, the `csv_fdw` extension, or parse in application code. split_part is safe only for delimiter-separated values where the delimiter cannot appear in field values.

Unlike `(string_to_array(col, ','))[5]` which returns NULL for out-of-bounds, `split_part(col, ',', 5)` returns '' (empty string). This matters in CHECK constraints and WHERE clauses where you don't want NULL propagation. In PG 14+, `split_part(col, '/', -1)` extracts the last segment of a path — previously you needed `regexp_replace(col, '.*/(.+)$', '\1')` for this.

starts_with

PG 9.6+boolean

Returns true if the string begins with the given prefix.

Developer

Signature

starts_with ( string text, prefix text ) → boolean

Parameters

ParameterTypeDescription
stringtextInput string to check
prefixtextPrefix to test for

Examples

sql
SELECT starts_with('hello world', 'hello');
true
sql
SELECT starts_with('hello world', 'world');
false

Case-sensitive

sql
SELECT starts_with('Hello', 'hello');
false
sql
SELECT * FROM files WHERE starts_with(path, '/home/user/');
Files under /home/user/

Empty prefix always matches

sql
SELECT starts_with(col, '');
true
Anti-PatternUsing starts_with() for case-insensitive prefix checks without lowercasing both sides

`starts_with(col, 'IMG')` does not match rows where col starts with 'img' or 'Img'. This is easy to miss when data comes from mixed-case sources.

✓ Instead: For case-insensitive prefix checks: `starts_with(lower(col), lower('IMG'))` or `col ILIKE 'IMG%'`. Create a functional index on `lower(col)` if performance matters.

`WHERE col LIKE 'prefix%'` can use a standard B-tree index, provided the index uses `text_pattern_ops` or the database uses the C collation. `WHERE starts_with(col, prefix)` maps to the `^@` operator, which B-tree does not support; it needs an SP-GiST index on the column instead. For parameterized queries, `WHERE col LIKE $1 || '%'` only uses an index when the planner can prove the pattern is a left-anchored prefix, so verify both forms with EXPLAIN. Avoid `WHERE left(col, n) = prefix` unless you create an expression index on `left(col, n)` — the bare function call prevents index use.

string_to_array

PG 9.0+text[]

Splits the string at each delimiter occurrence and returns the parts as a text array; an optional null_string specifies which field values should be replaced with NULL.

DeveloperData Eng

Signature

string_to_array ( string text, delimiter text [, null_string text ] ) → text[]

Parameters

ParameterTypeDescription
stringtextInput string to split
delimitertextLiteral delimiter; if NULL, string is split into individual characters
null_stringtextOptional: field values equal to this string are returned as NULL

Examples

sql
SELECT string_to_array('a,b,c,d', ',');
{a,b,c,d}

Empty fields become NULL

sql
SELECT string_to_array('a,,c', ',', '');
{a,NULL,c}

NULL delimiter splits into individual characters

sql
SELECT string_to_array('hello', NULL);
{h,e,l,l,o}
sql
SELECT unnest(string_to_array(tags, ',')) AS tag FROM posts;
Individual tag rows from comma-separated string

Cast to integer array after splitting

sql
SELECT string_to_array('1:2:3', ':')::integer[];
{1,2,3}
Anti-PatternUsing string_to_array on a column that should be stored as a native array

If you routinely split a text column with string_to_array to use its elements — filtering, joining, or unnesting — you've found a schema design problem. Splitting on every query adds overhead and prevents efficient GIN indexing.

✓ Instead: Migrate the column to a native `text[]` array type. Use GIN indexes for containment and overlap queries (`&&`, `@>`). The data becomes a first-class array, queryable without parsing on every access.

`SELECT p.id, t.tag FROM posts p, unnest(string_to_array(p.tags, ',')) AS t(tag)` unnests the CSV tags column into individual rows that can be JOINed against a tags table. This is faster and simpler than a correlated subquery for normalizing denormalized CSV columns. For proper multi-value columns, migrate to actual arrays (`text[]`) and use the `&&` and `@>` operators with GIN indexes.
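A sketch of the migration the tip recommends (table, column, and index names are illustrative):

```sql
-- One-time migration: CSV text column to a native array column
ALTER TABLE posts
    ALTER COLUMN tags TYPE text[]
    USING string_to_array(tags, ',');

-- GIN index accelerates containment and overlap queries
CREATE INDEX idx_posts_tags ON posts USING gin (tags);

-- "posts tagged postgres" with no string parsing at query time
SELECT id FROM posts WHERE tags @> ARRAY['postgres'];
```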

string_to_table

PG 14+setof text

Splits the string at each delimiter and returns each part as a separate row in a set.

DeveloperData Eng

Signature

string_to_table ( string text, delimiter text [, null_string text ] ) → setof text

Parameters

ParameterTypeDescription
stringtextInput string to split
delimitertextLiteral delimiter; NULL splits into individual characters
null_stringtextOptional: field values matching this string are returned as NULL

Examples

sql
SELECT * FROM string_to_table('one,two,three', ',');
one two three (one row each)
sql
SELECT * FROM string_to_table('a,,c', ',', '');
a NULL c
sql
SELECT * FROM string_to_table('hello', NULL);
h e l l o
sql
SELECT trim(t) FROM string_to_table('  a , b , c ', ',') AS t WHERE trim(t) <> '';
a b c
sql
SELECT word, length(word) FROM string_to_table('the quick brown fox', ' ') AS word ORDER BY length DESC;
Words ordered by length
Anti-PatternUsing string_to_table on PG versions before 14 without a fallback

`string_to_table` was added in PostgreSQL 14. Using it without a version check will raise `ERROR: function string_to_table(text, text) does not exist` on older servers, which is a runtime error that only surfaces in production.

✓ Instead: For pre-PG14 compatibility, use `unnest(string_to_array(col, ','))` which works from PG 9.0+. Add a comment noting when you can drop the workaround once your minimum supported version is PG 14.

`string_to_table(col, ',')` is cleaner and slightly more efficient than `unnest(string_to_array(col, ','))` — it skips the intermediate array allocation. The set-returning form also integrates better with LATERAL joins: `FROM posts p, string_to_table(p.tags, ',') AS tag` is perfectly valid. For pre-PG14 compatibility, keep using the unnest pattern.

substr

PG 8.0+text

Extracts a substring starting at position start (1-based) for count characters; equivalent to the substring() function.

Developer

Signatures

substr ( string text, start integer ) → text
substr ( string text, start integer, count integer ) → text

Parameters

ParameterTypeDescription
stringtextInput string
startintegerStarting position (1-based); values <= 0 are treated as 1
countintegerNumber of characters to extract; if omitted, extracts to end of string

Examples

From position 8 to end

sql
SELECT substr('Hello, World!', 8);
World!
sql
SELECT substr('Hello, World!', 8, 5);
World

Start <= 0 treated as 1 (but count adjusted)

sql
SELECT substr('abcdef', 0, 4);
abc
sql
SELECT substr(md5(id::text), 1, 8) AS short_hash FROM items;
8-char hash prefix per row
sql
SELECT substr(isbn, 4, 9) AS publisher_code FROM books;
Middle segment of ISBN
Anti-PatternUsing substr(col, 0, n) expecting n characters but getting n-1

`substr('hello', 0, 4)` returns `'hel'` — 3 characters, not 4. Position 0 is treated as 1, but the count is also adjusted, so you get one fewer character than you might expect. This off-by-one trips up developers who port code from languages where string slicing is 0-indexed.

✓ Instead: Always use 1-based positions: `substr(col, 1, 3)` clearly returns the first 3 characters. Or use `left(col, 3)` which is unambiguous.

`substr('hello', 0, 4)` returns 'hel' (3 chars). PostgreSQL treats position 0 as before position 1 and adjusts count accordingly. This is SQL-standard behavior but confusing. Stick to 1-based indexing: `substr('hello', 1, 3)` unambiguously returns 'hel'. Use `left(col, n)` for extracting the first n characters — it's the clearest of all three options.
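The position clipping behind this off-by-one can be modeled in a few lines of Python — a sketch of the behavior, not PostgreSQL's actual implementation:

```python
def pg_substr(s, start, count=None):
    """Model of PostgreSQL substr position math (1-based). The selected
    window is [start, start+count-1], then clipped to the string's valid
    positions — which is why start=0 'loses' a character."""
    end = len(s) if count is None else start + count - 1  # inclusive 1-based end
    lo = max(start, 1)            # clip window start to position 1
    hi = min(end, len(s))         # clip window end to the string length
    return s[lo - 1:hi]           # convert to Python's 0-based slicing

pg_substr('hello', 0, 4)          # 'hel' — 3 chars, not 4
pg_substr('hello', 1, 3)          # 'hel'
pg_substr('Hello, World!', 8)     # 'World!'
```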

substring

PG 8.0+text

Extracts a substring by position, POSIX regex capture, or SQL regex pattern.

Developer

Signatures

substring ( string text [ FROM start integer ] [ FOR count integer ] ) → text
substring ( string text FROM pattern text ) → text
substring ( string text SIMILAR pattern text ESCAPE escape text ) → text

Parameters

ParameterTypeDescription
stringtextInput string
startintegerStarting position (1-based)
countintegerLength of substring
patterntextPOSIX regex; if the pattern contains parentheses, the portion matching the first capture group is returned, otherwise the whole match

Examples

sql
SELECT substring('Thomas' FROM 2 FOR 3);
hom
sql
SELECT substring('Thomas' FROM 3);
omas

POSIX regex — returns matched portion

sql
SELECT substring('foobar' FROM 'o+b');
oob

Capture group extracts year

sql
SELECT substring('2024-03-15' FROM '^(\d{4})-');
2024
sql
SELECT substring(email FROM '@(.*)$') FROM users;
example.com — domain portion of email
Anti-PatternUsing substring(col FROM pattern) expecting all capture groups to be returned

`substring('2024-03-15' FROM '(\d{4})-(\d{2})-(\d{2})')` returns only `'2024'` — the first capture group. Developers expecting all three groups are returned get only the year and silently lose month and day.

✓ Instead: Use `regexp_match('2024-03-15', '(\d{4})-(\d{2})-(\d{2})')` which returns `{2024,03,15}` — all capture groups as an array.

`substring('foobar' FROM 'o+b')` returns 'oob' — the full match. To return only part of the match, wrap the desired part in parentheses: `substring('foobar' FROM 'o+(b)')` returns 'b'. This is the key difference from `regexp_match()`. If your pattern has multiple capture groups, only the first one is returned — use `regexp_match()` for multi-group extraction.
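The "first capture group if present, else whole match" rule can be modeled with Python's `re` module (Python's engine is not POSIX leftmost-longest, but it agrees on simple patterns like these; the function name is ours):

```python
import re

def pg_substring_regex(s, pattern):
    """Model of substring(string FROM pattern): return the first capture
    group if the pattern has one, otherwise the whole match; None stands
    in for SQL NULL when nothing matches."""
    m = re.search(pattern, s)
    if m is None:
        return None
    return m.group(1) if m.re.groups >= 1 else m.group(0)

pg_substring_regex('foobar', 'o+b')           # 'oob' — whole match
pg_substring_regex('foobar', 'o+(b)')         # 'b'   — first group
pg_substring_regex('2024-03-15', r'^(\d{4})-')  # '2024'
```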

to_hex

PG 8.0+text

Converts an integer to its hexadecimal string representation (lowercase, no '0x' prefix).

Developer

Signatures

to_hex ( number integer ) → text
to_hex ( number bigint ) → text

Parameters

ParameterTypeDescription
numberinteger | bigintInteger value to convert to hex

Examples

sql
SELECT to_hex(255);
ff
sql
SELECT to_hex(256);
100

Add 0x prefix manually if needed

sql
SELECT '0x' || to_hex(255);
0xff
sql
SELECT '#' || lpad(to_hex(r*65536 + g*256 + b), 6, '0') FROM colors;
#ff8040 — CSS hex color from RGB integers
sql
SELECT lpad(to_hex(id), 8, '0') AS hex_id FROM items;
00000042 — zero-padded hex ID
Anti-PatternUsing to_hex without lpad and then comparing hex strings of different lengths

`to_hex(255)` = `'ff'` (2 chars) but `to_hex(256)` = `'100'` (3 chars). Storing or comparing raw to_hex output as identifiers or sort keys produces incorrect lexicographic ordering and mismatched string lengths.

✓ Instead: Always pad to a consistent width: `lpad(to_hex(val), 8, '0')` for 32-bit values, or `lpad(to_hex(val), 16, '0')` for 64-bit. This ensures correct string comparison and fixed-width output.

`'#' || lpad(to_hex(red * 65536 + green * 256 + blue), 6, '0')` converts three 0-255 integer channels to a CSS hex color string. The `lpad` with '0' ensures colors like `#00ff00` don't become `#ff00` when channels are zero. Useful when color data is stored as separate R/G/B columns and you need to export or display as hex.
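The RGB packing arithmetic is easy to verify outside SQL — a Python sketch of the same computation:

```python
def css_hex(r, g, b):
    """Pack three 0-255 channels into a CSS color string, mirroring
    '#' || lpad(to_hex(r*65536 + g*256 + b), 6, '0')."""
    value = r * 65536 + g * 256 + b
    return '#' + format(value, 'x').rjust(6, '0')  # lpad(to_hex(value), 6, '0')

css_hex(255, 128, 64)   # '#ff8040'
css_hex(0, 255, 0)      # '#00ff00' — padding preserves the leading zeros
```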

translate

PG 8.0+text

Replaces each character in string that matches a character in from with the corresponding character in to; characters with no corresponding to character are deleted.

DeveloperDBAData Eng

Signature

translate ( string text, from text, to text ) → text

Parameters

ParameterTypeDescription
stringtextInput string to transform
fromtextCharacters to match (each character individually)
totextReplacement characters (shorter than from means extras are deleted)

Examples

'3' has no replacement → deleted; '1'→'a', '4'→'x'

sql
SELECT translate('12345', '143', 'ax');
a2x5

Replace all vowels

sql
SELECT translate('Hello World', 'aeiouAEIOU', '**********');
H*ll* W*rld
sql
SELECT translate(col, '0123456789', '');
String with all digits removed
sql
SELECT translate(lower(input), 'àáâäåæçèéêëìíîïñòóôöùúûüý', 'aaaaaaceeeeiiiinoooouuuuy') FROM data;
Accent-normalized string
sql
SELECT translate(text,'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz','NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm');
ROT13 encoded text
Anti-PatternConfusing translate() with replace() and expecting substring matching

`translate('hello', 'el', 'ip')` returns `'hippo'` — it maps 'e'→'i' and 'l'→'p' character by character. Developers expecting `'el'` to be treated as a literal substring (like replace) are surprised when individual 'e' and 'l' characters are substituted throughout the string.

✓ Instead: Use `replace(col, 'from_substring', 'to_substring')` when you need to substitute a multi-character literal. Use `translate` only for character-to-character one-for-one mapping.

`translate(lower(col), 'àáâäçèéêëìíîïñòóôöùúûü', 'aaaaceeeeiiiinoooouuuu')` normalizes accented Latin characters for ASCII-safe searching — faster than regex for character-level substitutions because it's a single O(n) pass. Also implements ROT13 for light obfuscation. When `to` is shorter than `from`, extra `from` characters are simply deleted — use `translate(col, '0123456789', '')` to strip all digits.
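Python's `str.maketrans`/`str.translate` pair mirrors these semantics closely — a sketch (assumes no character appears in both the mapped and deleted portions of `frm`):

```python
UPPER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
LOWER = 'abcdefghijklmnopqrstuvwxyz'

def pg_translate(s, frm, to):
    """Model of translate(): one-for-one character mapping; frm characters
    beyond len(to) have no counterpart and are deleted, as in PostgreSQL."""
    table = str.maketrans(frm[:len(to)], to, frm[len(to):])
    return s.translate(table)

pg_translate('12345', '143', 'ax')   # 'a2x5' — '3' deleted, '1'→'a', '4'→'x'
# ROT13 via the same half-alphabet rotation the SQL example uses:
pg_translate('Hello', UPPER + LOWER,
             UPPER[13:] + UPPER[:13] + LOWER[13:] + LOWER[:13])  # 'Uryyb'
```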

trim

PG 8.0+text

Removes the specified characters from the start, end, or both ends of a string; defaults to removing spaces from both ends.

DeveloperDBAData Eng

Signatures

trim ( [ LEADING | TRAILING | BOTH ] [ characters text ] FROM string text ) → text
trim ( [ LEADING | TRAILING | BOTH ] [ FROM ] string text [, characters text ] ) → text

Parameters

ParameterTypeDescription
LEADING|TRAILING|BOTHkeywordWhich end(s) to trim; BOTH is the default
characterstextSet of characters to remove (default: space)
stringtextInput string to trim

Examples

Default: trim spaces from both ends

sql
SELECT trim('  hello  ');
hello
sql
SELECT trim(BOTH 'x' FROM 'xxxhelloxxx');
hello
sql
SELECT trim(LEADING '0' FROM '000123');
123
sql
SELECT trim(TRAILING '.' FROM 'end...');
end
sql
SELECT trim(BOTH FROM '   spaces   ');
spaces
Anti-PatternTrimming user input only at read time in queries instead of at write time

Wrapping every query with `WHERE trim(col) = $1` or `SELECT trim(col)` is wasteful — the trim executes on every read across all rows. If the column always stores untrimmed data, every consumer must remember to trim.

✓ Instead: Enforce trimming at write time: add a CHECK constraint `CHECK (col = trim(col))` or a BEFORE INSERT/UPDATE trigger that normalizes the value. The column then always stores clean data and queries need no wrapper.

`trim(BOTH FROM col)` = `btrim(col)`. `trim(LEADING FROM col)` = `ltrim(col)`. `trim(TRAILING FROM col)` = `rtrim(col)`. Use whichever reads most naturally in context. The SQL standard form `trim(BOTH '.' FROM col)` is portable across databases; the function forms `btrim/ltrim/rtrim` are PostgreSQL-specific but more concise in application code. All behave identically — the characters argument is a SET, not a substring.
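Python's strip family shares trim's character-set semantics, which makes it a handy mental model — the argument is a SET of characters, never a substring:

```python
# Each call mirrors the SQL form noted in the comment.
assert 'xxxhelloxxx'.strip('x') == 'hello'   # trim(BOTH 'x' FROM ...)
assert '000123'.lstrip('0') == '123'         # trim(LEADING '0' FROM ...)
assert 'end...'.rstrip('.') == 'end'         # trim(TRAILING '.' FROM ...)
assert 'xyhelloyx'.strip('xy') == 'hello'    # any mix of x/y is removed — set, not substring
```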

upper

PG 8.0+text

Converts all characters in the string to their upper-case equivalents using the current locale.

DeveloperDBA

Signature

upper ( string text ) → text

Parameters

ParameterTypeDescription
stringtextInput string to convert to upper case

Examples

sql
SELECT upper('hello');
HELLO
sql
SELECT upper('PostgreSQL');
POSTGRESQL
sql
SELECT upper(NULL);
NULL
sql
SELECT upper(status) = 'ACTIVE' FROM users;
Case-insensitive status check
sql
SELECT upper(country_code) FROM addresses WHERE length(country_code) = 2;
Normalized ISO country codes
Anti-PatternCalling upper() in every WHERE clause instead of normalizing at insert time

`WHERE upper(status) = 'ACTIVE'` prevents index use on `status` and runs upper() on every row in the table on every query. If `status` is supposed to be uppercase, callers should not be responsible for normalizing it at read time.

✓ Instead: Store status values in a consistent case enforced at write time (`CHECK (status = upper(status))`). Then `WHERE status = 'ACTIVE'` uses a plain B-tree index with no function call overhead.

Calling `upper()` at read time on every query is wasteful. Instead, enforce case at write time: `CHECK (status = upper(status))` or use a trigger `NEW.country_code := upper(NEW.country_code)`. This ensures the column always contains uppercase values, making WHERE queries faster (no function call needed) and allowing standard B-tree indexes to be used without functional indexes.

unistr

PG 14+text

Evaluates escape sequences in the string to produce a Unicode text value; \XXXX or \+XXXXXX represent Unicode code points in hex.

DeveloperDBA

Signature

unistr ( string text ) → text

Parameters

ParameterTypeDescription
stringtextString with Unicode escape sequences in \XXXX or \+XXXXXX format

Examples

\0061 = 'a', \0074 = 't'

sql
SELECT unistr('d\0061t\0061');
data

Euro sign from hex code point

sql
SELECT unistr('\20AC');
€
\0020 = space

sql
SELECT unistr('Hello\0020World');
Hello World

Emoji via 6-digit supplementary code point

sql
SELECT unistr('\+01F600');
😀
sql
SELECT unistr('caf\00E9');
café
Anti-PatternAssuming U& string literals accept the same escapes as unistr

`U&'\u20AC'` does NOT work — U& literals accept only the `\XXXX` and `\+XXXXXX` forms (plus a custom ESCAPE character), so escape sequences copied from JavaScript (`\u20AC`) or Python (`\u20ac`) raise an error there. `unistr()` is more permissive: it accepts `\XXXX`, `\+XXXXXX`, `\uXXXX`, and `\UXXXXXXXX`, and developers who learn one set of rules often wrongly assume the other context shares it.

✓ Instead: In U& literals, write `U&'\20AC'` for BMP characters (U+0000 to U+FFFF) and `U&'\+01F600'` for supplementary characters. In unistr(), any of the four forms works — `unistr('\20AC')` and `unistr('\u20AC')` both yield '€'.

When writing SQL migration scripts that need to insert emoji, accented characters, or special symbols, `unistr('\+01F680')` (🚀) is more robust than pasting the literal character — some editors/terminals mangle multibyte characters during copy-paste. Use the hex code point form for any character above U+00FF in scripts that cross encoding boundaries.
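A Python sketch of how the two escape forms shown in the examples decode (it covers only `\XXXX` and `\+XXXXXX`; PostgreSQL's unistr additionally accepts `\uXXXX`, `\UXXXXXXXX`, and doubled backslashes):

```python
import re

def pg_unistr(s):
    """Decode \XXXX (4 hex digits) and \+XXXXXX (6 hex digits) escape
    sequences into the Unicode characters they name."""
    def decode(m):
        # Whichever alternative matched carries the hex digits.
        return chr(int(m.group(1) or m.group(2), 16))
    return re.sub(r'\\\+([0-9A-Fa-f]{6})|\\([0-9A-Fa-f]{4})', decode, s)

pg_unistr(r'd\0061t\0061')    # 'data'
pg_unistr(r'\20AC')           # '€'
pg_unistr(r'\+01F600')        # '😀'
```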

normalize

PG 13+text

Normalizes a string to the specified Unicode normalization form (NFC, NFD, NFKC, or NFKD); defaults to NFC.

DeveloperDBA

Signature

normalize ( string text [, form ] ) → text

Parameters

ParameterTypeDescription
stringtextInput string to normalize
formkeywordNFC (default), NFD, NFKC, or NFKD normalization form

Examples

NFC: precomposed and decomposed 'é' compare equal

sql
SELECT normalize(U&'\00E9') = normalize(U&'e\0301');
true
sql
SELECT normalize('café', NFC);
café (precomposed form)

Ligature decomposed to two characters

sql
SELECT normalize(U&'\FB01', NFKC);
fi

Superscript 2 maps to regular digit 2

sql
SELECT normalize('²', NFKC);
2

NFC combines, NFD keeps separate

sql
SELECT length(normalize(U&'e\0301', NFC)), length(normalize(U&'e\0301', NFD));
1 | 2
Anti-PatternComparing Unicode strings for equality without normalizing both sides first

Two strings that look visually identical — such as 'café' composed two different ways — will not compare equal in PostgreSQL if they use different Unicode normalization forms. `'café' = 'café'` can return false when one uses U+00E9 and the other uses e + U+0301.

✓ Instead: Normalize at write time so all stored values use the same form: `CHECK (col = normalize(col, NFC))`. Then comparisons are reliable without normalizing at query time.

'é' can be stored as a single precomposed codepoint (U+00E9) OR as 'e' + combining accent (U+0065 U+0301). These look identical but are byte-different — differently composed duplicates will silently bypass a UNIQUE constraint if one form is inserted after the other. Normalize at write time: `CHECK (username = normalize(username, NFC))` or in a trigger. NFKC also collapses ligatures (ﬁ→fi) and compatibility forms, which is usually what you want for user-facing identifiers.
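The same normalization behavior can be reproduced with Python's standard `unicodedata` module, which is convenient for pre-normalizing data before it ever reaches the database:

```python
import unicodedata

precomposed = '\u00e9'    # 'é' as one code point (U+00E9)
decomposed = 'e\u0301'    # 'e' + combining acute accent (U+0065 U+0301)

# Visually identical, byte-different — equality fails without normalizing:
assert precomposed != decomposed
# NFC composes, so both sides compare equal afterwards:
assert unicodedata.normalize('NFC', decomposed) == precomposed
# NFD decomposes the single code point into two:
assert len(unicodedata.normalize('NFD', precomposed)) == 2
# NFKC additionally folds compatibility forms such as the ﬁ ligature:
assert unicodedata.normalize('NFKC', '\ufb01') == 'fi'
```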

parse_ident

PG 9.6+text[]

Parses a qualified SQL identifier (e.g. schema.table or "My Schema"."My Table") into an array of identifier parts, handling quoting correctly.

DBADeveloper

Signature

parse_ident ( str text [, strict_mode boolean DEFAULT true ] ) → text[]

Parameters

ParameterTypeDescription
strtextQualified identifier string to parse
strict_modebooleanIf true (default), extra trailing characters cause an error; if false, they are silently ignored

Examples

sql
SELECT parse_ident('public.users');
{public,users}

Double quotes stripped, case preserved

sql
SELECT parse_ident('"My Schema"."My Table"');
{My Schema,My Table}
sql
SELECT (parse_ident('schema.table'))[1] AS schema_name;
schema

Unquoted identifiers are lowercased

sql
SELECT parse_ident('SomeTable', false);
{sometable}
sql
SELECT parse_ident('pg_catalog.pg_class');
{pg_catalog,pg_class}
Anti-PatternUsing split_part('.') to extract schema and table name from a qualified identifier

`split_part('"schema.with.dots".table', '.', 1)` returns `'"schema'` — it splits on every dot, including dots inside quoted identifier names. Any schema or table name containing a dot will be split incorrectly.

✓ Instead: Use `parse_ident(qualified_name)` which correctly handles quoting: `(parse_ident('"schema.with.dots".table'))[1]` returns `'schema.with.dots'` as intended. Note the extra parentheses — subscripting a function result directly is a syntax error.

`split_part('"schema.with.dots".table', '.', 1)` returns `'"schema'` — broken. `(parse_ident('"schema.with.dots".table'))[1]` correctly returns `'schema.with.dots'`. Whenever processing fully-qualified identifiers from pg_catalog views (like `table_schema || '.' || table_name`), use `parse_ident` to handle edge cases with quoted names containing special characters.

regexp_substr

PG 15+text

Returns the substring matching the Nth occurrence of a POSIX regex pattern, optionally returning a specific capture group.

DeveloperData Eng

Signature

regexp_substr ( string text, pattern text [, start integer [, N integer [, flags text [, subexpr integer ] ] ] ] ) → text

Parameters

ParameterTypeDescription
stringtextInput string
patterntextPOSIX regex pattern
startintegerStart position (default 1)
NintegerWhich occurrence to return (default 1)
flagstextRegex flags
subexprintegerCapture group to return (0 = whole match)

Examples

First match

sql
SELECT regexp_substr('one 1 two 2 three 3', '\d+');
1

Second match

sql
SELECT regexp_substr('one 1 two 2 three 3', '\d+', 1, 2);
2

Second occurrence of digits

sql
SELECT regexp_substr('2024-03-15', '(\d+)', 1, 2);
03

Second capture group = domain

sql
SELECT regexp_substr('user@example.com', '([^@]+)@(\S+)', 1, 1, '', 2);
example.com
sql
SELECT regexp_substr(col, '[A-Z]{2}\d{6}', 1, 1, 'i') FROM docs;
First document reference code
Anti-PatternUsing regexp_substr on PostgreSQL versions before 15 without a compatibility fallback

`regexp_substr` was added in PostgreSQL 15. Using it on PG 14 or earlier raises `ERROR: function regexp_substr(text, text) does not exist` at runtime — not at deploy time, so it may only surface in production when the code path is hit.

✓ Instead: For pre-PG15 compatibility, use `(regexp_match(col, pattern))[1]` for first-match extraction. For Nth-match extraction, use `regexp_matches` with a row_number() CTE. Document the PG15+ requirement clearly if you choose to use regexp_substr.

`regexp_substr(col, pattern)` is equivalent to `(regexp_match(col, pattern))[1]` for simple first-match extraction. The power of `regexp_substr` is the N parameter (Nth match) — previously requiring `regexp_matches` + row_number() + a subquery. For extracting the 3rd IP address in a log line: `regexp_substr(col, '\d+\.\d+\.\d+\.\d+', 1, 3)` replaces a multi-step CTE.
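The N and subexpr parameters are easy to model — a Python sketch (the function name is ours, and `flags` here takes Python `re` flags rather than PostgreSQL's letter flags, a deliberate simplification):

```python
import re

def pg_regexp_substr(s, pattern, start=1, n=1, flags=0, subexpr=0):
    """Model of regexp_substr: scan from the 1-based start position and
    return the requested capture group of the Nth match; None stands in
    for SQL NULL when there are fewer than N matches."""
    matches = list(re.finditer(pattern, s[start - 1:], flags))
    if len(matches) < n:
        return None
    return matches[n - 1].group(subexpr)

pg_regexp_substr('one 1 two 2 three 3', r'\d+', 1, 2)              # '2'
pg_regexp_substr('user@example.com', r'([^@]+)@(\S+)', subexpr=2)  # 'example.com'
```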