Know Before You Burn

PostgreSQL Gotchas & Traps

The most common PostgreSQL function mistakes — NULL surprises, timezone traps, silent data loss, and performance landmines. With correct and incorrect examples side by side.

In SQL, NULL represents an unknown value. Any comparison with NULL returns NULL (treated as false in WHERE). This means '= NULL' never matches any row — not even rows where the value IS actually NULL. The correct operator is IS NULL.

Wrong
incorrect
SELECT * FROM users WHERE deleted_at = NULL;

Result: 0 rows (always — even when deleted_at IS NULL)

Correct
correct
SELECT * FROM users WHERE deleted_at IS NULL;

Result: Rows where deleted_at is NULL

Use IS NULL / IS NOT NULL for null checks. For equality comparisons that should treat NULL = NULL as true, use IS NOT DISTINCT FROM instead of =, and IS DISTINCT FROM instead of !=.
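In sketch form (the sync_log table and its columns are hypothetical):

```sql
-- = returns NULL when either side is NULL; IS NOT DISTINCT FROM treats NULLs as equal:
SELECT NULL = NULL;                     -- NULL
SELECT NULL IS NOT DISTINCT FROM NULL;  -- true
SELECT 1 IS DISTINCT FROM NULL;         -- true

-- Useful when comparing two nullable columns:
SELECT * FROM sync_log
WHERE old_value IS DISTINCT FROM new_value;  -- catches NULL -> value and value -> NULL changes
```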

When both operands of / are integer types, PostgreSQL performs integer division. The result is truncated toward zero with no warning. This silently loses precision in calculations like percentages, averages, and rates.

Wrong
incorrect
SELECT 5 / 2 AS result;

Result: 2

Correct
correct
SELECT 5.0 / 2 AS result;
-- or
SELECT 5 / 2.0;
-- or
SELECT 5::numeric / 2;

Result: 2.5

Ensure at least one operand is numeric/float. Common fix: cast the numerator: `count::numeric / total`. In aggregate contexts: `avg(col)` already returns numeric, but `sum(int_col) / count(*)` uses integer division — use `sum(col)::numeric / count(*)`.
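A sketch of that cast in a percentage query (the tasks table and status column are hypothetical):

```sql
-- count(*) returns bigint, so without the ::numeric cast the division
-- would truncate to 0 or 1. NULLIF avoids division by zero on an empty table.
SELECT round(
         count(*) FILTER (WHERE status = 'done')::numeric
         / NULLIF(count(*), 0) * 100,
         1
       ) AS pct_done
FROM tasks;
```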

LIKE pattern matching is case-sensitive; ILIKE matches case-insensitively using the active locale. ILIKE cannot use a standard B-tree index — it requires a functional index on lower() or the pg_trgm extension. (Even plain LIKE 'prefix%' can use a B-tree index only under the C collation or with a text_pattern_ops index.)

Wrong
incorrect
-- Case-insensitive check that can't use an index:
SELECT * FROM products WHERE name ILIKE 'apple%';

Result: Correct results but full table scan on large tables

Correct
correct
-- Create a functional index:
CREATE INDEX idx_products_name_lower ON products (lower(name));

-- Then query:
SELECT * FROM products WHERE lower(name) LIKE lower('apple%');

Result: Same results, uses the index

For prefix matching, lower(col) LIKE lower(prefix)||'%' uses the functional index. For substring/suffix matching, use pg_trgm with a GIN index: `CREATE INDEX ON products USING gin(name gin_trgm_ops)` — then ILIKE and LIKE '%substring%' both use it.
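The pg_trgm setup from the tip, sketched end to end (the index name is illustrative):

```sql
-- One-time setup (requires the pg_trgm extension to be available):
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_products_name_trgm ON products USING gin (name gin_trgm_ops);

-- Both of these can now use the trigram index, including leading wildcards:
SELECT * FROM products WHERE name ILIKE '%apple%';
SELECT * FROM products WHERE name LIKE '%berry%';
```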

now() and current_timestamp return the time the current transaction started (current_date returns its date). In a multi-second or multi-minute transaction, they stay frozen. So if a start time was captured with now() earlier in the same transaction, now() - start_time always reports zero elapsed time.

Wrong
incorrect
-- Inside a long transaction — this doesn't measure actual elapsed time:
SELECT now() - start_time AS elapsed FROM long_running_task;

Result: Incorrect elapsed time if called within same transaction as start_time assignment

Correct
correct
-- Use clock_timestamp() for real wall-clock time:
SELECT clock_timestamp() - start_time AS elapsed FROM long_running_task;

-- Or timeofday() which returns a text timestamp:
SELECT timeofday();

Result: Actual wall-clock elapsed time

now() = current_timestamp = transaction start. clock_timestamp() = actual system clock, changes with each call. statement_timestamp() = start of the current SQL statement. For audit columns that should record the actual time of the write rather than the transaction start, use clock_timestamp() in triggers.
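A minimal sketch of such a trigger (function, trigger, and column names are illustrative):

```sql
CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
  -- clock_timestamp() reflects the wall-clock time of this write,
  -- not the transaction start like now() would:
  NEW.updated_at := clock_timestamp();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_set_updated_at
BEFORE UPDATE ON orders
FOR EACH ROW EXECUTE FUNCTION set_updated_at();
```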

TIMESTAMP WITHOUT TIME ZONE stores exactly what you give it — no conversion, no timezone awareness. If your app inserts '2024-03-15 14:00:00' assuming UTC but the DB session is in America/New_York, TIMESTAMPTZ would store the UTC equivalent, but TIMESTAMP stores the literal value. When DST changes, TIMESTAMP values become ambiguous.

Wrong
incorrect
-- Using TIMESTAMP (no timezone) for user-facing times:
CREATE TABLE events (starts_at TIMESTAMP);

Result: Ambiguous timestamps across DST boundaries and timezone changes

Correct
correct
-- Use TIMESTAMPTZ for all user-facing datetime columns:
CREATE TABLE events (starts_at TIMESTAMPTZ);

-- Set session timezone explicitly:
SET timezone = 'UTC';

-- Or persistently per role/database (ALTER ROLE / ALTER DATABASE ... SET timezone),
-- or via the client's connection options

Result: Unambiguous UTC storage, correct local display per user timezone

Rule of thumb: use TIMESTAMPTZ everywhere. Convert to local time only for display using `timezone('America/New_York', col)` or AT TIME ZONE. Store in UTC, display locally. The only valid use of TIMESTAMP (no tz) is for purely local concepts like 'business hours' where timezone context is always implicit.
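A display-conversion sketch against the events table from the example above:

```sql
-- Stored as timestamptz; convert only for display.
-- timestamptz AT TIME ZONE 'zone' returns a plain timestamp in that zone:
SELECT starts_at,
       starts_at AT TIME ZONE 'America/New_York' AS starts_local,
       timezone('America/New_York', starts_at)   AS starts_local_alt  -- equivalent function form
FROM events;
```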

date_trunc always preserves the input type. date_trunc('month', timestamptz) returns a timestamptz at midnight on the first of the month. Comparing or grouping with a DATE column requires an explicit cast.

Wrong
incorrect
-- Trying to group by date alongside date_trunc result:
SELECT date_trunc('day', created_at) AS day, count(*)
FROM events
GROUP BY date_trunc('day', created_at), event_date;  -- event_date is DATE type

Result: Groups mix a timestamptz expression with a DATE column; when the two disagree (e.g. timezone effects), rows split into extra per-day groups

Correct
correct
-- Cast to date if you need a DATE:
SELECT date_trunc('day', created_at)::date AS day, count(*)
FROM events
GROUP BY 1;

-- Or use created_at::date directly:
SELECT created_at::date AS day, count(*) FROM events GROUP BY 1;

Result: Correct grouping by calendar day as DATE type

date_trunc('day', ts)::date is the idiomatic way to convert a timestamp to a date. For date-only grouping in reporting queries, `created_at::date` is shorter and clearer than `date_trunc('day', created_at)::date`.
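One extra caveat: casting a timestamptz to date uses the current session timezone, so the same row can land on a different calendar day in different sessions. A sketch of pinning the zone explicitly (events/created_at as in the example above):

```sql
-- Day boundaries here depend on the session's timezone setting:
SELECT created_at::date FROM events;

-- Pin a specific zone for deterministic reporting:
SELECT (created_at AT TIME ZONE 'UTC')::date AS day_utc, count(*)
FROM events
GROUP BY 1;
```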

Aggregate functions process rows in an undefined order unless ORDER BY is specified. string_agg without ORDER BY may return different orderings on different runs, different PostgreSQL versions, or after VACUUM. This is also true for array_agg.

Wrong
incorrect
SELECT string_agg(name, ', ') FROM users GROUP BY dept;

Result: "Bob, Alice, Charlie" — order unpredictable and may change

Correct
correct
SELECT string_agg(name, ', ' ORDER BY name) FROM users GROUP BY dept;

Result: "Alice, Bob, Charlie" — stable alphabetical order

Always specify ORDER BY inside the aggregate call whenever the output order matters. This applies to string_agg, array_agg, json_agg, jsonb_agg, and xmlagg. The ORDER BY goes inside the function's argument list, not in the outer query's ORDER BY.

This is the most misunderstood COUNT behavior. COUNT(*) counts every row matching the WHERE clause. COUNT(col) counts only rows where col is NOT NULL. The difference is invisible until you have NULL values, and it can make a query look correct while silently undercounting.

Wrong
incorrect
-- Trying to count total records but using a nullable column:
SELECT COUNT(email) AS total_users FROM users;

Result: May return fewer than actual row count if any emails are NULL

Correct
correct
-- Count all rows:
SELECT COUNT(*) AS total_users FROM users;

-- Count rows where email is set (intentional NULL-skipping):
SELECT COUNT(email) AS users_with_email FROM users;

Result: Correct total row count

Use COUNT(*) for 'how many rows'. Use COUNT(col) intentionally when you specifically want to count non-null values. Use COUNT(DISTINCT col) to count unique non-null values. Never use COUNT(1) — it's identical to COUNT(*) and confuses readers.
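A related idiom: FILTER expresses conditional counts explicitly instead of relying on NULL-skipping:

```sql
-- Count several subsets in one pass; clearer than COUNT(CASE WHEN ... THEN 1 END):
SELECT count(*)                                    AS total_users,
       count(*) FILTER (WHERE email IS NOT NULL)   AS users_with_email,
       count(DISTINCT email)                       AS unique_emails  -- DISTINCT also ignores NULLs
FROM users;
```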

BETWEEN is inclusive on both ends. For continuous types like timestamps, a bare date literal means midnight at the start of that day, so BETWEEN '2024-03-01' AND '2024-03-31' silently drops almost all of the last day. Inclusive endpoints also let adjacent ranges overlap at their shared boundary.

Wrong
incorrect
-- Intended: all of March — but the end bound is midnight:
SELECT * FROM events
WHERE created_at BETWEEN '2024-03-01' AND '2024-03-31';

Result: Misses events on March 31 after midnight — '2024-03-31' means '2024-03-31 00:00:00'

Correct
correct
-- Use half-open interval for timestamp ranges:
SELECT * FROM events
WHERE created_at >= '2024-03-01'
  AND created_at < '2024-04-01';

Result: All events in March, including March 31 at any time

The half-open interval pattern [start, end) — >= start AND < end — is the correct approach for all timestamp range queries. It prevents double-counting at boundaries and works correctly regardless of the timestamp precision. Only use BETWEEN for true inclusive endpoints like integer IDs.
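Half-open bounds pair naturally with date_trunc, so month-end dates never need to be hand-written (events table as in the example above):

```sql
-- All events in the current month; safe across DST changes and leap years:
SELECT * FROM events
WHERE created_at >= date_trunc('month', now())
  AND created_at <  date_trunc('month', now()) + interval '1 month';
```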

PostgreSQL documents that COALESCE, like CASE, evaluates only the arguments it needs: it works left to right and stops at the first non-null value. The real trap is plan-time constant folding — a pure-constant subexpression can be evaluated during planning, raising errors or doing work even though that argument would never be reached at run time.

Wrong
incorrect
-- The constant fallback can be evaluated during planning:
SELECT COALESCE(flag, 1/0) FROM data;

Result: ERROR: division by zero — raised even when flag is never NULL

Correct
correct
-- Fallbacks that reference a column or parameter are evaluated lazily:
SELECT COALESCE(cached_value, compute_expensive_default(row_id)) FROM data;

-- CASE states the same guarantee more explicitly:
SELECT CASE WHEN cached_value IS NOT NULL
            THEN cached_value
            ELSE compute_expensive_default(row_id)
       END FROM data;

Result: compute_expensive_default() runs only for rows where cached_value IS NULL

At run time, COALESCE and CASE both skip unneeded arguments. Neither, however, protects a pure-constant subexpression from pre-evaluation during planning — the same 1/0 inside an untaken CASE arm usually fails at plan time too. Expensive fallbacks are safe as long as the expression references a column or parameter.

Unlike virtually every programming language, SQL arrays in PostgreSQL are 1-indexed. Accessing index 0 returns NULL without an error. This is one of the most common bugs when processing arrays in PL/pgSQL functions written by developers who code in 0-indexed languages.

Wrong
incorrect
SELECT (ARRAY['a','b','c'])[0];

Result: NULL — no error, just silent wrong result

Correct
correct
SELECT (ARRAY['a','b','c'])[1];

Result: a

Remember: PostgreSQL array subscripts start at 1. array_length(arr, 1) gives the count. The last element is arr[array_length(arr, 1)]. You can technically create arrays with non-default lower bounds (e.g. the literal '[0:2]={1,2,3}'::int[]), but this is rarely used and confusing. Just remember: start at 1.
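Two related helpers, sketched:

```sql
-- cardinality() is a simpler total-element count than array_length(arr, 1):
SELECT cardinality(ARRAY['a','b','c']);  -- 3

-- WITH ORDINALITY exposes the 1-based position when unnesting:
SELECT elem, idx
FROM unnest(ARRAY['a','b','c']) WITH ORDINALITY AS t(elem, idx);
```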

x NOT IN (a, b, NULL) expands to x <> a AND x <> b AND x <> NULL. Since any comparison with NULL yields NULL, that last term makes the whole AND evaluate to NULL or false — never true. So the moment a NULL appears in the NOT IN list, no row can match. This makes NOT IN with nullable subqueries extremely dangerous.

Wrong
incorrect
-- manager_id column has some NULLs:
SELECT name FROM employees
WHERE id NOT IN (SELECT manager_id FROM employees);

Result: 0 rows — because some manager_id values are NULL

Correct
correct
-- Use NOT EXISTS instead:
SELECT name FROM employees e1
WHERE NOT EXISTS (
  SELECT 1 FROM employees e2
  WHERE e2.manager_id = e1.id
);

-- Or filter NULLs from the subquery:
WHERE id NOT IN (
  SELECT manager_id FROM employees WHERE manager_id IS NOT NULL
);

Result: Correct set of non-managers

NOT EXISTS is the safest and most performant alternative to NOT IN for subqueries. It correctly handles NULLs and often uses a more efficient anti-join execution plan. Only use NOT IN when the subquery column is declared NOT NULL.
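A third NULL-safe pattern is the LEFT JOIN anti-join, equivalent to the NOT EXISTS form above (id is assumed to be the employees primary key):

```sql
-- Rows in e1 with no matching row in e2; the PK of the joined side
-- is NULL exactly for the unmatched rows:
SELECT e1.name
FROM employees e1
LEFT JOIN employees e2 ON e2.manager_id = e1.id
WHERE e2.id IS NULL;
```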

Developers from Python, JavaScript, or Java expect substring search functions to return -1 or NULL when not found. PostgreSQL's strpos() (and position()) returns 0. The check > -1 always evaluates to true since 0 > -1.

Wrong
incorrect
-- This matches ALL rows, not just rows containing 'foo':
SELECT * FROM logs WHERE strpos(message, 'foo') > -1;

Result: All rows returned because strpos returns 0 (not -1), and 0 > -1 is true

Correct
correct
-- Correct: check for > 0
SELECT * FROM logs WHERE strpos(message, 'foo') > 0;

-- Or use LIKE/ILIKE:
SELECT * FROM logs WHERE message LIKE '%foo%';

Result: Only rows where message contains 'foo'

strpos returns 0 for 'not found' and the 1-based position otherwise. Always check > 0 for 'found' and = 0 for 'not found'. The same semantics apply to position('x' IN col).

PostgreSQL resolves mixed-type comparisons with implicit casts, and this can prevent index scans. The planner uses an index only when the WHERE expression matches the index expression, type included. When the implicit cast lands on the indexed column rather than on the literal or parameter, the expressions no longer match and the index is skipped.

Wrong
incorrect
-- col is BIGINT, but the literal is NUMERIC:
CREATE INDEX idx_col ON t (col);
SELECT * FROM t WHERE col = 123.0;  -- numeric literal

Result: Sequential scan — the column is cast to numeric, so the B-tree index on col can't be used

Correct
correct
-- Pass a matching type:
SELECT * FROM t WHERE col = 123;

-- For parameterized queries, bind the parameter with the column's type,
-- or cast the parameter (not the column): WHERE col = $1::bigint

Result: Index scan used

This is particularly common with ORMs that infer parameter types. Check query plans with EXPLAIN ANALYZE when a query seems slower than expected. Type-match your parameters to the column type. Use `EXPLAIN (ANALYZE, BUFFERS)` to confirm index usage.

UPDATE target SET col = source.val FROM source WHERE target.id = source.target_id — if source has duplicate target_ids, PostgreSQL will update target using one of the source rows but doesn't specify which one. No error is raised, and the result is non-deterministic.

Wrong
incorrect
-- source may have multiple rows per target.id:
UPDATE orders SET status = s.status
FROM order_updates s
WHERE orders.id = s.order_id;

Result: Non-deterministic — arbitrary source row used when duplicates exist

Correct
correct
-- Use a subquery to explicitly pick one row:
UPDATE orders SET status = s.status
FROM (
  SELECT DISTINCT ON (order_id) order_id, status
  FROM order_updates
  ORDER BY order_id, updated_at DESC
) s
WHERE orders.id = s.order_id;

Result: Deterministic — uses latest update per order

Always ensure the FROM clause in UPDATE produces at most one row per target row. Use DISTINCT ON, a subquery with GROUP BY, or additional join conditions to enforce uniqueness. PostgreSQL raises no error or warning for this, so verify the source's uniqueness yourself during development.
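A quick development-time check for duplicate source keys, using the tables from the example above:

```sql
-- Any output rows mean the UPDATE ... FROM result would be non-deterministic:
SELECT order_id, count(*)
FROM order_updates
GROUP BY order_id
HAVING count(*) > 1;
```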

PostgreSQL's regexp_replace replaces only the first match by default — unlike many programming-language regex APIs, where replace-all is the default. It mirrors sed, where a single substitution is the default and the 'g' flag enables global replacement.

Wrong
incorrect
-- Trying to strip all digits:
SELECT regexp_replace('a1b2c3', '[0-9]', '');

Result: ab2c3 — only first digit removed

Correct
correct
SELECT regexp_replace('a1b2c3', '[0-9]', '', 'g');

Result: abc — all digits removed

The 'g' flag enables global replacement. Combine with 'i' for case-insensitive global replacement: regexp_replace(col, 'pattern', 'replacement', 'gi'). Remember: these flags go in the fourth argument; 'g' in particular cannot be embedded in the pattern.

Found a gotcha that's not listed? Submit it on GitHub — community contributions welcome.