PostgreSQL Gotchas That Will Waste Your Afternoon

Developer & creator of pgref.dev · 10 min read

gotchasnulldebugging

Every one of these has cost me at least an hour. Most cost more. The frustrating thing is that PostgreSQL is behaving exactly as documented — I just hadn't read carefully enough.

NULL = NULL is not TRUE

-- This returns NO rows, even for rows where category IS NULL
SELECT * FROM products WHERE category = NULL;

-- This is correct
SELECT * FROM products WHERE category IS NULL;

NULL doesn't equal anything, including itself. NULL = NULL evaluates to NULL, not TRUE. So any WHERE clause using = NULL silently matches nothing.

This causes bugs in JOIN conditions too. If you're joining on a nullable column and some values are NULL, rows with NULL keys won't match even if both sides have NULL. Use IS NOT DISTINCT FROM if you need NULL-safe equality:

SELECT * FROM a JOIN b ON a.code IS NOT DISTINCT FROM b.code;

COUNT(col) skips NULLs, COUNT(*) does not

SELECT
  COUNT(*) AS total_rows,
  COUNT(email) AS rows_with_email
FROM users;

COUNT(*) counts all rows. COUNT(email) counts rows where email IS NOT NULL. These will differ if any email is NULL. The classic mistake: COUNT(DISTINCT col) where col has NULLs — the NULLs are silently excluded from the distinct count.

Integer division truncates

SELECT 5 / 2;      -- Returns 2, not 2.5
SELECT 5.0 / 2;    -- Returns 2.5
SELECT 5::float / 2; -- Returns 2.5

Integer divided by integer gives integer with truncation toward zero. This bites developers calculating percentages:

-- Wrong: returns 0 for any count less than total
SELECT count(*) / total * 100 AS pct FROM ...

-- Correct: cast before dividing
SELECT count(*)::numeric / total * 100 AS pct FROM ...

I've seen this show up as "0% conversion rate" in a dashboard for three days before anyone noticed.

regexp_replace without 'g' replaces only the first match

-- Only removes the first space
SELECT regexp_replace('hello world foo', ' ', '_');
-- Returns: 'hello_world foo'

-- Removes all spaces
SELECT regexp_replace('hello world foo', ' ', '_', 'g');
-- Returns: 'hello_world_foo'

Other regex implementations default to global replacement. PostgreSQL doesn't. Always add 'g' unless you explicitly want to replace only the first match.

BETWEEN is inclusive on both ends

BETWEEN a AND b is equivalent to >= a AND <= b. Both endpoints included. For timestamps this causes double-counting:

-- Any event at exactly 2024-02-01 00:00:00 appears in BOTH queries
SELECT * FROM events WHERE created_at BETWEEN '2024-01-01' AND '2024-02-01';
SELECT * FROM events WHERE created_at BETWEEN '2024-02-01' AND '2024-03-01';

For timestamp ranges, use explicit comparison:

SELECT * FROM events
WHERE created_at >= '2024-01-01'
  AND created_at < '2024-02-01';

At this point you might think I'm bad at reading docs. You'd be partly right. But I've watched senior engineers with years of database experience hit every single one of these — usually because they've been working in a different database system and assume SQL is SQL.

Arrays are 1-indexed

SELECT ARRAY[10, 20, 30][1];  -- Returns 10 (not 20)
SELECT ARRAY[10, 20, 30][0];  -- Returns NULL (no error, just NULL)

PostgreSQL arrays start at index 1. The silence of [0] returning NULL rather than an error makes this particularly annoying to debug — you get NULLs propagating through your query and have to trace back to find the off-by-one.

ORDER BY in a subquery is meaningless

-- The ORDER BY here has no guaranteed effect on the outer result
SELECT * FROM (
  SELECT * FROM events ORDER BY created_at DESC
) sub
LIMIT 10;

-- This is correct
SELECT * FROM events ORDER BY created_at DESC LIMIT 10;

PostgreSQL's query planner is allowed to ignore subquery ORDER BY clauses when they don't affect the final result. Sometimes it happens to preserve the order; sometimes it doesn't. Intermittent bug, hard to reproduce.

CHAR(n) pads with spaces

CREATE TABLE test (code CHAR(5));
INSERT INTO test VALUES ('abc');

SELECT length(code) FROM test;             -- Returns 5, not 3
SELECT code = 'abc  ' FROM test;           -- Returns true

CHAR(n) pads stored values with spaces to reach the declared length. Those spaces affect length() calculations and appear in concatenation and exports. Unless you're interfacing with a legacy system that requires fixed-width fields, use VARCHAR or TEXT.

The timestamp with/without timezone naming

The type names are confusing:

TIMESTAMP WITHOUT TIME ZONE — stores a local datetime with no timezone information
TIMESTAMP WITH TIME ZONE (TIMESTAMPTZ) — stores UTC, converts to session timezone on display

If you insert a value with a timezone into a TIMESTAMP (without timezone) column, PostgreSQL converts it to the session's timezone and strips the info. The stored value depends on what timezone your session was in at insert time. This causes data inconsistencies when app servers run in different timezones.

Use TIMESTAMPTZ for everything unless you have a specific reason not to.

Bookmark the pgref.dev gotchas page. It's faster than debugging.