
PostgreSQL Statistics Views & Monitoring Functions

Complete reference for PostgreSQL statistics views and monitoring functions including pg_stat_activity, pg_stat_statements, pg_locks, pg_stat_user_tables, and wait event analysis. Essential reference for query performance tuning and database health monitoring. Updated for PostgreSQL 16.

17 functions

What are PostgreSQL Statistics Views & Monitoring Functions?

PostgreSQL statistics views are system views populated by the cumulative statistics system and the query executor, providing both real-time and historical data about query execution, table and index access patterns, and lock contention. pg_stat_activity shows currently executing queries and their states, pg_stat_statements (an extension) aggregates query performance across executions, and pg_locks reveals blocking lock chains. These views are the primary tools for diagnosing slow queries, lock contention, and connection saturation in production PostgreSQL deployments.

pg_blocking_pids

PG 9.6+ · integer[]

Returns an array of PIDs that are blocking the given server process from acquiring a lock. Returns an empty array if the process is not blocked.

DBA · Reliability

Signature

pg_blocking_pids ( pid integer ) → integer[]

Parameters

| Parameter | Type | Description |
|---|---|---|
| pid | integer | PID of the process to check for blockers |

Examples

sql
SELECT pid, pg_blocking_pids(pid) AS blocked_by FROM pg_stat_activity WHERE cardinality(pg_blocking_pids(pid)) > 0;
All currently blocked processes and their blockers
sql
SELECT a.pid, a.usename, a.query, a.state, pg_blocking_pids(a.pid) AS blocked_by FROM pg_stat_activity a WHERE cardinality(pg_blocking_pids(a.pid)) > 0;
Blocked sessions with their blocking PIDs and query text
sql
SELECT count(*) AS blocked_sessions FROM pg_stat_activity WHERE cardinality(pg_blocking_pids(pid)) > 0;
Number of currently blocked sessions
Anti-Pattern: Calling pg_blocking_pids on every row in a tight monitoring loop

- Calling pg_blocking_pids on every row of pg_stat_activity in a tight monitoring loop — the function inspects lock structures on each call, so querying it for hundreds of connections repeatedly can itself impose overhead. Sample at a reasonable interval.
- Assuming the returned PIDs are the root cause — pg_blocking_pids returns the immediate blockers, and a blocker may itself be blocked by another process. Follow the chain recursively, e.g. with a recursive CTE, to find the root blocker.
- Using pg_cancel_backend on a PID from pg_blocking_pids without checking whether it is idle in transaction — cancelling an actively executing query is safe, but cancelling an idle-in-transaction session leaves the transaction open and may not release its locks immediately.
- Treating an empty array as proof that no blocking exists — advisory locks and certain lock types (e.g., predicate locks) are not covered by pg_blocking_pids; use pg_locks for the complete picture.

✓ Instead: `pg_blocking_pids(pid)` always returns an array (possibly empty). Filter with `cardinality(pg_blocking_pids(pid)) > 0` to find only processes that are genuinely blocked, then optionally call `pg_cancel_backend` on the blocking PID.
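Because pg_blocking_pids returns only the immediate blockers, finding the root of a multi-level chain takes a recursive query. A minimal sketch of that recursion (illustrative, not part of the original reference):

```sql
-- Walk blocking chains upward; root blockers are sessions that block
-- others but are not themselves blocked.
WITH RECURSIVE chain AS (
    SELECT pid, unnest(pg_blocking_pids(pid)) AS blocker
    FROM pg_stat_activity
    WHERE cardinality(pg_blocking_pids(pid)) > 0
  UNION
    SELECT c.pid, unnest(pg_blocking_pids(c.blocker))
    FROM chain c
    WHERE cardinality(pg_blocking_pids(c.blocker)) > 0
)
SELECT DISTINCT blocker AS root_blocker
FROM chain
WHERE cardinality(pg_blocking_pids(blocker)) = 0;
```

The root blockers are the PIDs worth investigating (or cancelling) first; killing an intermediate blocker only shifts the wait.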

example
SELECT pid, usename, query, pg_blocking_pids(pid) AS blocked_by FROM pg_stat_activity WHERE cardinality(pg_blocking_pids(pid)) > 0;
Blocked sessions with which PIDs are blocking them

pg_locks

PG 7.4+ · setof record

View showing all currently held and awaited locks in the database cluster. Each row represents one lock or lock request.

DBA · Reliability · Security

Signature

SELECT * FROM pg_locks

Examples

sql
SELECT pid, locktype, relation::regclass, mode, granted FROM pg_locks WHERE NOT granted;
Locks that are currently being waited for
sql
SELECT l.pid, l.mode, l.relation::regclass, a.query FROM pg_locks l JOIN pg_stat_activity a USING (pid) WHERE NOT l.granted;
Blocked pids with their waiting queries
sql
SELECT l.pid, a.usename, l.mode, l.relation::regclass, a.state FROM pg_locks l JOIN pg_stat_activity a USING (pid) WHERE l.mode = 'AccessExclusiveLock';
Processes holding AccessExclusiveLock — blocks all other readers and writers
sql
SELECT count(*) AS total_locks, count(*) FILTER (WHERE NOT granted) AS waiting FROM pg_locks;
Total locks vs waiting locks — ratio indicates contention level
Anti-Pattern: Using pg_locks without joining to pg_stat_activity to identify the blocking query

- Using pg_locks without joining to pg_stat_activity — lock rows contain a pid but no query text; always join on pid to see what SQL holds the lock.
- Filtering only WHERE NOT granted to find problems — granted = true rows show who holds the lock that is causing the wait; you need both sides of the join to diagnose blocking.
- Casting relation to regclass without checking locktype — rows for non-relation lock types (e.g., transactionid, virtualxid) have a NULL relation; filter on locktype = 'relation' when you only care about relation locks.
- Reading pg_locks in isolation without pg_blocking_pids() — for multi-level lock chains (A blocks B blocks C), manually tracing through pg_locks is error-prone; pg_blocking_pids gives each session's immediate blockers, which can be chained recursively.
- Alarming on any AccessExclusiveLock without checking duration — short-lived AccessExclusiveLocks during routine DDL are normal; correlate with now() - state_change from pg_stat_activity to flag only long-held exclusive locks.

✓ Instead: Lock rows don't include the query text. Join on `pid` to see which query holds an AccessExclusiveLock: `SELECT a.query FROM pg_locks l JOIN pg_stat_activity a USING (pid) WHERE l.mode = 'AccessExclusiveLock'`.
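Seeing both sides of a lock wait means joining pg_locks to itself. A simplified sketch for relation locks only — it ignores the full lock-mode conflict matrix, so treat the resulting pairs as candidates rather than proven conflicts:

```sql
-- Pair each waiting relation-lock request with sessions holding any
-- lock on the same relation (simplified; no conflict-matrix check).
SELECT waiting.pid AS waiting_pid,
       holder.pid  AS holding_pid,
       waiting.relation::regclass AS relation,
       waiting.mode AS waited_mode,
       holder.mode  AS held_mode
FROM pg_locks waiting
JOIN pg_locks holder
  ON holder.relation = waiting.relation
 AND holder.granted
 AND holder.pid <> waiting.pid
WHERE NOT waiting.granted
  AND waiting.locktype = 'relation';
```

The `locktype = 'relation'` filter also sidesteps the NULL relation column on transactionid and virtualxid rows.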

example
SELECT blocking.pid, blocking.query, blocked.pid AS blocked_pid, blocked.query AS blocked_query FROM pg_stat_activity blocked JOIN pg_stat_activity blocking ON blocking.pid = ANY(pg_blocking_pids(blocked.pid));
Full blocker-to-blocked chain with query text

pg_relation_size / pg_total_relation_size

PG 8.1+ (fork parameter for pg_relation_size added PG 9.0) · bigint

Returns the on-disk size in bytes of a relation. `pg_relation_size` returns the main data fork by default (a fork can be named explicitly); `pg_total_relation_size` includes all forks plus indexes and TOAST data.

DBA · Reliability

Signatures

pg_relation_size ( relation regclass [, fork text ] ) → bigint
pg_total_relation_size ( relation regclass ) → bigint

Parameters

| Parameter | Type | Description |
|---|---|---|
| relation | regclass | Table, index, or other relation name or OID |
| fork | text | Optional fork name ('main', 'fsm', 'vm', 'init'); pg_relation_size only |

Examples

sql
SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) AS total_size FROM pg_stat_user_tables ORDER BY pg_total_relation_size(relid) DESC LIMIT 10;
Top 10 largest tables including indexes
sql
SELECT indexrelname, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size FROM pg_stat_user_indexes ORDER BY pg_relation_size(indexrelid) DESC LIMIT 10;
Largest indexes
sql
SELECT relname, pg_size_pretty(pg_relation_size(relid)) AS heap_size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) AS index_overhead FROM pg_stat_user_tables ORDER BY pg_total_relation_size(relid) DESC LIMIT 10;
Heap size vs index overhead per table — large gap means heavy indexing
sql
SELECT pg_size_pretty(sum(pg_total_relation_size(relid))) AS total_user_data FROM pg_stat_user_tables;
Total disk space used by all user tables and their indexes
Anti-Pattern: Using pg_relation_size when total storage cost is needed

- Using pg_relation_size when you actually need the full storage cost — it returns only the main heap fork; for total storage including indexes and TOAST, use pg_total_relation_size, or you will dramatically understate table size.
- Running pg_total_relation_size on every table in a tight loop without caching — each call briefly locks the relation while reading file sizes; looping over thousands of tables rapidly can cause lock contention. Batch the work in a single SELECT over pg_stat_user_tables instead.
- Comparing pg_relation_size output directly to OS-reported file sizes — the relation size counts allocated pages, which need not match what the OS reports due to filesystem block alignment and pre-allocated extents.
- Not using pg_size_pretty for human output — pg_relation_size returns raw bytes; wrap it in pg_size_pretty() in reports to avoid misreading large numbers (e.g., confusing megabytes with gigabytes).
- Forgetting that pg_relation_size does not account for bloat — a heavily updated table may be large even though most blocks contain dead tuples; cross-reference pg_stat_user_tables.n_dead_tup before concluding the live data itself is large.

✓ Instead: The difference between `pg_total_relation_size` and `pg_relation_size` is the combined size of all indexes plus TOAST. A very large gap indicates heavy indexing overhead — reconsider whether all indexes are needed.
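The bloat caveat above can be checked with a single query. A sketch combining size with dead-tuple counts; the `n_dead_tup > 10000` threshold is an arbitrary illustration, not a recommendation:

```sql
-- Large tables with a high dead-tuple fraction: size alone may reflect
-- bloat rather than live data.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000          -- assumed threshold, tune per workload
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
```

A table that is large *and* mostly dead tuples needs VACUUM (or pg_repack-style reorganization), not more storage.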

example
SELECT relname, pg_size_pretty(pg_relation_size(relid)) AS table_only, pg_size_pretty(pg_total_relation_size(relid)) AS with_indexes FROM pg_stat_user_tables ORDER BY pg_total_relation_size(relid) DESC LIMIT 10;
Table size vs total size (diff = indexes + TOAST)

pg_stat_activity

PG 8.0+ (wait_event/wait_event_type added PG 9.6, backend_type PG 10, query_id PG 14) · setof record

System view showing one row per server process, with information about the current activity: database, user, query text, wait events, state, and timing.

DBA · Reliability · Security

Signature

SELECT * FROM pg_stat_activity

Examples

sql
SELECT pid, usename, state, left(query, 80) FROM pg_stat_activity WHERE state != 'idle';
All active (non-idle) connections and their queries
sql
SELECT pid, now() - query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' ORDER BY duration DESC;
Longest-running queries
sql
SELECT wait_event_type, wait_event, count(*) FROM pg_stat_activity WHERE wait_event IS NOT NULL GROUP BY 1, 2 ORDER BY 3 DESC;
Current wait event distribution
sql
SELECT count(*) AS idle_in_tx FROM pg_stat_activity WHERE state = 'idle in transaction' AND state_change < now() - interval '5 minutes';
Count of idle-in-transaction sessions older than 5 minutes — a lock-retention risk
Anti-Pattern: Reading pg_stat_activity without filtering idle connections

- Reading pg_stat_activity without filtering out idle connections — idle rows dominate the result set and obscure the active workload; add WHERE state != 'idle'.
- Reading wait_event without its wait_event_type — a wait_event name is only meaningful in the context of its type; always select both columns together.
- Checking for blocking without pg_blocking_pids() — manually comparing pids and relation locks misses soft dependencies; use pg_blocking_pids(pid) instead.
- Granting broad SELECT on pg_stat_activity to application roles — the view exposes query text, usernames, and client addresses for all sessions; restrict access and use the pg_monitor role for monitoring accounts.
- Trusting query_start for duration without checking state — query_start is only meaningful while state = 'active'; for idle-in-transaction rows, use state_change instead.

✓ Instead: Combine `pg_stat_activity` with `pg_blocking_pids(pid)` to find what is blocking a waiting query: `SELECT pid, pg_blocking_pids(pid) AS blocked_by, query FROM pg_stat_activity WHERE cardinality(pg_blocking_pids(pid)) > 0`.
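The idle-in-transaction check above can be turned into an active cleanup. A sketch only — the 10-minute threshold is an assumed value, and `pg_terminate_backend` ends the whole session, so run the SELECT without it first to review the candidates:

```sql
-- Terminate sessions that have sat idle in transaction for over
-- 10 minutes (assumed threshold); such sessions retain locks.
SELECT pid,
       usename,
       now() - state_change AS idle_for,
       pg_terminate_backend(pid) AS terminated
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < now() - interval '10 minutes';
```

Many teams prefer setting `idle_in_transaction_session_timeout` so the server enforces this automatically instead of a cron job.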

example
SELECT a.pid, a.query, b.pid AS blocking_pid, b.query AS blocking_query FROM pg_stat_activity a JOIN pg_stat_activity b ON b.pid = ANY(pg_blocking_pids(a.pid));
Blocked query paired with the query blocking it

pg_stat_bgwriter

PG 8.3+ (buffers_backend_fsync added PG 9.1; checkpointer columns split into pg_stat_checkpointer in PG 17) · record (single row)

Single-row view with cumulative statistics for the background writer and checkpointer: checkpoint counts, buffers written, buffers allocated.

DBA · Reliability

Signature

SELECT * FROM pg_stat_bgwriter

Examples

sql
SELECT checkpoints_timed, checkpoints_req, round(100.0 * checkpoints_req / nullif(checkpoints_timed + checkpoints_req, 0), 1) AS forced_pct FROM pg_stat_bgwriter;
Fraction of checkpoints that were forced — a persistently high percentage means max_wal_size is too small
sql
SELECT buffers_checkpoint, buffers_clean, buffers_backend, buffers_alloc FROM pg_stat_bgwriter;
Buffer write breakdown by source
sql
SELECT buffers_checkpoint + buffers_clean + buffers_backend AS total_buffers_written, round(100.0 * buffers_backend / nullif(buffers_checkpoint + buffers_clean + buffers_backend, 0), 1) AS backend_write_pct FROM pg_stat_bgwriter;
Fraction of buffer writes done by backends instead of bgwriter — above 20% is concerning
Anti-Pattern: Reading pg_stat_bgwriter once and drawing conclusions from a snapshot

- Reading pg_stat_bgwriter once and drawing conclusions — the view is cumulative since the last reset; compute rates by sampling twice and dividing the delta by elapsed time, not by reading a single snapshot.
- Ignoring buffers_backend as a signal — when backends write buffers themselves (buffers_backend high relative to buffers_checkpoint), the bgwriter is not cleaning pages fast enough; increase bgwriter_lru_maxpages or decrease bgwriter_delay.
- Treating checkpoints_req = 0 as healthy without checking checkpoint frequency — in a low-write workload checkpoints_req can be zero while timed checkpoints still fire too frequently; ensure checkpoint_timeout matches your recovery-time objective.
- Not resetting stats with pg_stat_reset_shared('bgwriter') before a benchmark — historical checkpoint data from normal operation masks checkpoint behavior under load; reset for a clean picture before load tests.
- Confusing buffers_clean (bgwriter) with buffers_checkpoint (checkpointer) — they are different write paths; a high buffers_checkpoint/buffers_alloc ratio means I/O is checkpoint-dominated, not bgwriter-dominated.

✓ Instead: If `checkpoints_req / (checkpoints_timed + checkpoints_req)` is above 10-20%, checkpoints are happening too frequently. Increase `max_wal_size` and `checkpoint_completion_target` to spread I/O more evenly.
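The two-snapshot sampling mentioned above can be done entirely in SQL. A sketch with an assumed 60-second window; a real monitoring agent would store samples externally instead of in a temp table:

```sql
-- Snapshot the cumulative counters, wait, then compute per-second rates
-- from the delta instead of interpreting raw totals.
CREATE TEMP TABLE bgw_sample AS
SELECT now() AS ts, * FROM pg_stat_bgwriter;

SELECT pg_sleep(60);  -- assumed sampling window

SELECT round((b.buffers_backend - a.buffers_backend)
             / extract(epoch FROM b.ts - a.ts)) AS backend_writes_per_sec,
       b.checkpoints_req - a.checkpoints_req    AS forced_checkpoints_in_window
FROM bgw_sample a,
     (SELECT now() AS ts, * FROM pg_stat_bgwriter) b;
```

Rates over a known window are comparable across days and servers; cumulative totals since an unknown reset point are not.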

example
SELECT checkpoints_req, checkpoints_timed, buffers_backend / nullif(buffers_checkpoint, 0) AS backend_write_ratio FROM pg_stat_bgwriter;
High backend_write_ratio means the bgwriter isn't cleaning pages fast enough

pg_stat_database

PG 7.4+ (temp_files/temp_bytes added PG 9.2; checksum_failures added PG 12; session_time/active_time added PG 14) · setof record

View with per-database statistics: connections, commits, rollbacks, cache hits, rows read/fetched, deadlocks, temp file usage.

DBA · Reliability

Signature

SELECT * FROM pg_stat_database

Examples

sql
SELECT datname, round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 1) AS cache_hit_pct FROM pg_stat_database WHERE datname = current_database();
Buffer cache hit ratio — should be above 95%
sql
SELECT datname, deadlocks, conflicts, temp_files, pg_size_pretty(temp_bytes) FROM pg_stat_database ORDER BY deadlocks DESC;
Databases with deadlocks, conflicts, and temp file usage
sql
SELECT datname, xact_commit, xact_rollback, round(100.0 * xact_rollback / nullif(xact_commit + xact_rollback, 0), 2) AS rollback_pct FROM pg_stat_database WHERE datname = current_database();
Rollback ratio — high percentage indicates frequent errors or serialization failures
Anti-Pattern: Computing cache hit ratio across all databases including template databases

- Computing blks_hit / (blks_hit + blks_read) across all databases including template0/template1 — these have near-zero reads and skew the apparent hit ratio; filter WHERE datname NOT LIKE 'template%'.
- Treating a cache hit ratio above 99% as proof that shared_buffers is large enough — a ratio near 100% can also mean a tiny working set that fits easily; cross-reference the growth rate of blks_read over time.
- Ignoring xact_rollback relative to xact_commit — a high rollback ratio indicates application-level errors or frequent serialization failures and warrants investigation beyond I/O stats.
- Reading temp_files without correlating with work_mem — a large temp_files count means sort/hash operations are spilling to disk; the fix is usually raising work_mem per session, not shared_buffers.
- Not monitoring the deadlocks column over time — pg_stat_database.deadlocks is cumulative; feed it into a time-series metric and alert on rate increases rather than polling the absolute value.

✓ Instead: A cache hit ratio below 95% suggests your working set doesn't fit in `shared_buffers`. Increasing `shared_buffers` (typically to 25% of RAM) can dramatically reduce disk I/O.
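The temp-file correlation above is easy to query directly. An illustrative sketch ranking databases by spill volume; frequent spills usually point at `work_mem`, not `shared_buffers`:

```sql
-- Temp-file spill per database; the average spill size hints at how
-- much work_mem headroom sorts and hashes are missing.
SELECT datname,
       temp_files,
       pg_size_pretty(temp_bytes) AS spilled_total,
       pg_size_pretty(temp_bytes / nullif(temp_files, 0)) AS avg_spill_per_file
FROM pg_stat_database
WHERE temp_files > 0
ORDER BY temp_bytes DESC;
```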

example
SELECT datname, blks_hit, blks_read, round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS hit_ratio FROM pg_stat_database WHERE datname NOT LIKE 'template%';
Cache hit ratio per database — below 95% is a concern

pg_stat_io

PG 16+ · setof record

View (added in PG 16) tracking I/O operations per backend type and I/O object: reads, writes, hits, evictions, reuses, and timing.

DBA · Reliability

Signature

SELECT * FROM pg_stat_io

Examples

sql
SELECT backend_type, object, reads, writes, hits FROM pg_stat_io WHERE reads > 0 ORDER BY reads DESC;
I/O breakdown by backend type and object type
sql
SELECT object, reads, hits, round(100.0 * hits / nullif(reads + hits, 0), 1) AS hit_pct FROM pg_stat_io WHERE context = 'normal' ORDER BY reads DESC;
Cache hit ratio per object type
sql
SELECT backend_type, sum(read_time) AS total_read_ms, sum(write_time) AS total_write_ms FROM pg_stat_io GROUP BY backend_type ORDER BY total_read_ms DESC;
I/O time per backend type — identifies whether client or autovacuum dominates
Anti-Pattern: Querying pg_stat_io on PostgreSQL versions before 16

- Querying pg_stat_io on versions before 16 — the view does not exist; check server_version_num >= 160000 before including it in monitoring scripts.
- Summing reads across all backend_type and context combinations without grouping — autovacuum, bgwriter, and client backends have very different I/O profiles; aggregate within each backend_type to avoid misleading totals.
- Ignoring evictions alongside reads — evictions count pages ejected from shared_buffers to make room for new ones; high evictions together with high reads signal a working set too large for shared_buffers.
- Reading read_time and write_time without verifying track_io_timing — the timing columns are populated only when track_io_timing = on; otherwise both are always zero, making I/O timing invisible.
- Treating pg_stat_io hits as redundant with the pg_stat_database cache hit ratio — pg_stat_io splits hits by backend type and object; this granularity reveals, for example, whether autovacuum is consuming cache that client queries need.

✓ Instead: Before PG 16, I/O analysis required external tools. Now `pg_stat_io` shows exactly which backend types (client backend, autovacuum, bgwriter) are doing reads and writes, making it easy to attribute I/O load.
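Both preconditions above (server version and timing collection) can be verified in one probe. An illustrative guard for monitoring scripts:

```sql
-- Guard: pg_stat_io exists only on PG 16+, and its read_time/write_time
-- columns are non-zero only when track_io_timing is enabled.
SELECT current_setting('server_version_num')::int >= 160000 AS has_pg_stat_io,
       current_setting('track_io_timing') = 'on'            AS io_timing_enabled;
```

A script can skip its pg_stat_io queries when `has_pg_stat_io` is false, and suppress timing-based alerts when `io_timing_enabled` is false.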

example
SELECT backend_type, sum(read_time) AS total_read_ms, sum(write_time) AS total_write_ms FROM pg_stat_io GROUP BY backend_type ORDER BY total_read_ms DESC;
Time spent on I/O per backend type

pg_stat_progress_create_index

PG 12+ · setof record

Shows progress of in-progress CREATE INDEX and REINDEX operations: current phase, blocks scanned, tuples processed.

DBA · Reliability

Signature

SELECT * FROM pg_stat_progress_create_index

Examples

sql
SELECT relid::regclass AS table, phase, blocks_done, blocks_total, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS pct FROM pg_stat_progress_create_index;
Percentage complete for active index builds
sql
SELECT relid::regclass AS table, phase, blocks_done || '/' || blocks_total AS progress, tuples_done FROM pg_stat_progress_create_index;
Index build phases and block-level progress
sql
SELECT relid::regclass, now() - a.query_start AS running_for, phase FROM pg_stat_progress_create_index p JOIN pg_stat_activity a ON p.pid = a.pid;
How long each index build has been running
sql
SELECT count(*) FROM pg_stat_progress_create_index WHERE phase LIKE '%waiting%';
Index builds stalled waiting for old snapshots or locks
Anti-Pattern: Treating blocks_done/blocks_total as a reliable ETA for concurrent index builds

- Treating blocks_done / blocks_total as a reliable ETA for a concurrent build — REINDEX CONCURRENTLY performs multiple heap scans, and blocks_done resets between phases, so a progress percentage near 100% may restart.
- Cancelling an index build stuck in 'waiting for old snapshots' — this phase waits for long-running transactions to finish; cancelling the build does not fix the root cause. Identify and end the offending long-running transaction instead.
- Not monitoring pg_stat_progress_create_index during a production CREATE INDEX CONCURRENTLY — a stalled concurrent build holds a ShareUpdateExclusiveLock on the table, which blocks DDL but not DML, so it is easy to forget a background build is running.
- Casting index_relid to regclass during the initializing phase — the index OID may not yet be visible in the catalog at the very start of the build; guard with a phase filter or use index_relid directly.
- Ignoring tuples_done relative to pg_stat_user_tables.n_live_tup — if tuples_done stalls far below the expected live tuple count, the build may be stuck waiting for visibility; cross-check pg_stat_activity for the backend's state.

✓ Instead: A concurrent index build goes through multiple phases (initializing, scanning heap, writing). The 'waiting for old snapshots' phase can stall on long-running transactions. Monitor phase transitions to identify bottlenecks.
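When a build stalls in 'waiting for old snapshots', the fix is to find the transactions holding back the snapshot horizon. An illustrative sketch listing the oldest open transactions:

```sql
-- Oldest open transactions; these are the candidates holding back a
-- concurrent index build stuck waiting for old snapshots.
SELECT pid,
       usename,
       state,
       now() - xact_start AS xact_age,
       left(query, 60)    AS current_query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 5;
```

Ending the oldest offending transaction (after confirming it is safe) lets the build proceed; cancelling the build itself does not.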

example
SELECT relid::regclass, phase, blocks_done || '/' || blocks_total AS block_progress, tuples_done FROM pg_stat_progress_create_index;
Live index build progress per phase

pg_stat_progress_vacuum

PG 9.6+ · setof record

Shows progress of in-progress VACUUM operations: current phase, heap blocks scanned/vacuumed, dead tuples collected, and index vacuum passes.

DBA · Reliability

Signature

SELECT * FROM pg_stat_progress_vacuum

Examples

sql
SELECT relid::regclass AS table, phase, heap_blks_scanned, heap_blks_total, round(100.0 * heap_blks_scanned / nullif(heap_blks_total, 0), 1) AS pct_done FROM pg_stat_progress_vacuum;
Current VACUUM progress and completion percentage
sql
SELECT relid::regclass, phase, heap_blks_vacuumed, heap_blks_total, round(100.0 * heap_blks_vacuumed / nullif(heap_blks_total, 0), 1) AS pct FROM pg_stat_progress_vacuum;
Vacuum progress percentage and current phase for all in-progress VACUUMs
sql
SELECT pv.relid::regclass, pv.phase, now() - a.query_start AS running_for FROM pg_stat_progress_vacuum pv JOIN pg_stat_activity a ON pv.pid = a.pid;
How long each VACUUM has been running — join to pg_stat_activity for timing
sql
SELECT count(*) AS active_vacuums FROM pg_stat_progress_vacuum;
Number of tables currently being vacuumed (manual + autovacuum)
Anti-Pattern: Seeing no rows in pg_stat_progress_vacuum and concluding vacuum is not running

- Concluding from an empty result that vacuum never runs — autovacuum workers appear in this view only while actively vacuuming; a just-finished or about-to-start worker produces no row.
- Using heap_blks_scanned / heap_blks_total as the only completion estimate — a vacuum with many dead tuples may linger in the 'vacuuming indexes' phase long after finishing the heap scan; track phase transitions, not just block progress.
- Cancelling a VACUUM that appears stuck in 'vacuuming indexes' on a large table — index vacuuming on a GIN or multi-column index can legitimately take many minutes, and killing it means starting over on the next run.
- Not joining the pid to pg_stat_activity — the join confirms whether it is an autovacuum worker or a manual VACUUM and shows how long it has been running.
- Ignoring the dead-tuple counter (num_dead_tuples; renamed num_dead_item_ids in PG 17) accumulating while heap_blks_vacuumed stands still — this indicates maintenance_work_mem fills before an index pass can run; consider increasing maintenance_work_mem for large tables.

✓ Instead: Track `heap_blks_scanned / heap_blks_total` over time to estimate when a long-running VACUUM will finish. Also check the dead-tuple counter (`num_dead_tuples`; renamed `num_dead_item_ids` in PG 17) — a very high number indicates index vacuuming will take a while.

example
SELECT relid::regclass, phase, heap_blks_scanned || '/' || heap_blks_total AS progress, num_dead_tuples FROM pg_stat_progress_vacuum;
Real-time VACUUM progress monitoring (the num_dead_tuples column is renamed num_dead_item_ids in PG 17)

pg_stat_replication

PG 9.1+ (write_lag/flush_lag/replay_lag interval columns added PG 10) · setof record

View showing one row per WAL sender process (standby connections). Shows replication state, flush/write/replay LSN positions, and replication lag.

DBA · Reliability

Signature

SELECT * FROM pg_stat_replication

Examples

sql
SELECT client_addr, state, write_lag, flush_lag, replay_lag FROM pg_stat_replication;
Replication lag per standby
sql
SELECT client_addr, pg_wal_lsn_diff(sent_lsn, replay_lsn) AS bytes_behind FROM pg_stat_replication;
Bytes of sent-but-not-yet-replayed WAL per standby
sql
SELECT client_addr, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS bytes_behind FROM pg_stat_replication;
Bytes each standby is behind the primary's current WAL position, including WAL not yet sent
sql
SELECT count(*) AS standby_count FROM pg_stat_replication WHERE state = 'streaming';
Number of standbys currently streaming — zero means no active replicas
Anti-Pattern: Relying solely on replay_lag interval without checking byte lag

- Relying solely on replay_lag (an interval) without byte lag — during low-write periods, interval lag can grow even though the standby is fully caught up in bytes; check pg_wal_lsn_diff(sent_lsn, replay_lsn) alongside it.
- Not alerting when pg_stat_replication has zero rows — if all standbys disconnect, the view is empty; monitoring must treat an empty result as a critical alert, not as "no lag".
- Confusing sent_lsn with flush_lsn for durability guarantees — data is durable on the standby only once flushed; true synchronous replication guarantees depend on the flush position.
- Ignoring the state column — a standby in 'catchup' state is not yet streaming and may be significantly behind even if replay_lag looks small, because it is processing older WAL segments.
- Querying pg_stat_replication on the standby — the view is populated only on the primary; on a standby, use pg_stat_wal_receiver instead.

✓ Instead: `replay_lag` (an interval) is the time-based lag. For bytes-behind, use `pg_wal_lsn_diff(sent_lsn, replay_lsn)`. During low-activity periods, time-based lag can be misleadingly high even though the standby is fully current in bytes.
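The empty-view pitfall above suggests alerting on the standby count rather than lag alone. An illustrative health check; the expected count of 2 is an assumed deployment value:

```sql
-- Alert-friendly check: an empty pg_stat_replication must read as
-- "no replicas" (critical), never as "no lag".
SELECT count(*) FILTER (WHERE state = 'streaming')      AS streaming_standbys,
       count(*) FILTER (WHERE state = 'streaming') >= 2 AS healthy  -- assumed minimum
FROM pg_stat_replication;
```

This query always returns one row even when no standbys are connected, so the monitoring pipeline gets an explicit `healthy = false` instead of silence.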

example
SELECT client_addr, replay_lag, pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) AS bytes_behind FROM pg_stat_replication;
Combined lag picture: time and data volume

pg_stat_statements

PG 9.0+ (requires shared_preload_libraries = 'pg_stat_statements' and CREATE EXTENSION pg_stat_statements; wal_* columns added PG 13, toplevel PG 14) · setof record

Extension view that tracks execution statistics for all SQL statements. Requires `pg_stat_statements` to be loaded as a shared preload library.

DBA · Reliability

Signature

SELECT * FROM pg_stat_statements

Examples

sql
SELECT query, calls, round(mean_exec_time::numeric, 2) AS avg_ms, rows FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;
Top 10 slowest queries by average execution time
sql
SELECT query, calls, round(total_exec_time::numeric/1000, 1) AS total_sec FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;
Top 10 queries by total time consumed
sql
SELECT query, shared_blks_hit, shared_blks_read, round(100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0), 1) AS hit_pct FROM pg_stat_statements ORDER BY shared_blks_read DESC LIMIT 10;
Queries with most physical reads (cache-miss candidates)
Anti-Pattern: Querying pg_stat_statements without the extension enabled

- Querying pg_stat_statements without the extension installed — the view simply won't exist; verify with SELECT * FROM pg_extension WHERE extname = 'pg_stat_statements' before relying on it.
- Never resetting pg_stat_statements — cumulative stats since server start dilute the signal from recent changes; call SELECT pg_stat_statements_reset() before load tests or after major schema changes for a clean window.
- Sorting only by mean_exec_time and ignoring total_exec_time — a query called millions of times with a 1 ms average costs more overall than a rare 500 ms query; always consider both dimensions.
- Treating the query column as exact SQL — parameters are normalized into $1, $2 placeholders; correlate on queryid, not substring matching on the query text.
- Ignoring shared_blks_read in favor of timing alone — a fast query that hammers disk reads will degrade under load; cross-check mean_exec_time with shared_blks_read / calls to find cache-cold queries.

✓ Instead: Call `SELECT pg_stat_statements_reset()` before a load test to get clean statistics for just that window. Otherwise historical data dilutes the signal from recent changes.
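The both-dimensions advice above can be captured in one ranking query. An illustrative sketch that surfaces heavy aggregate load, slow individual calls, and cache-cold queries together:

```sql
-- Rank by total time, but show per-call cost and per-call disk reads so
-- frequent-cheap and rare-expensive queries are both visible.
SELECT left(query, 60) AS query,
       calls,
       round(total_exec_time::numeric / 1000, 1) AS total_sec,
       round(mean_exec_time::numeric, 2)         AS avg_ms,
       round(shared_blks_read::numeric / nullif(calls, 0), 1) AS disk_blks_per_call
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```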

example
SELECT pg_stat_statements_reset();
-- run your workload --
SELECT query, mean_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 5;
Clean stats for just the recent workload

pg_stat_user_indexes

PG 8.2+ · setof record

View showing index usage statistics: number of scans, tuples read, and tuples fetched. Used to identify unused or underused indexes.

DBA · Reliability

Signature

SELECT * FROM pg_stat_user_indexes

Examples

sql
SELECT relname, indexrelname, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0 ORDER BY relname;
Indexes that have never been used — candidates for removal
sql
SELECT relname, indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size FROM pg_stat_user_indexes ORDER BY idx_scan ASC LIMIT 20;
Unused indexes with their sizes — prioritize dropping large ones
sql
SELECT relname, indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) AS wasted_size FROM pg_stat_user_indexes WHERE idx_scan = 0 ORDER BY pg_relation_size(indexrelid) DESC;
Unused indexes sorted by space — largest first for removal priority
sql
SELECT count(*) AS unused_index_count, pg_size_pretty(sum(pg_relation_size(indexrelid))) AS wasted_space FROM pg_stat_user_indexes WHERE idx_scan = 0;
Total count and wasted space from all unused indexes
Anti-Pattern: Dropping an index solely because idx_scan is zero

- Dropping an index solely because idx_scan = 0 without checking the age of the statistics: counters are cleared by pg_stat_reset() and after crash recovery, so an index that was actively used before a recent reset will show zero scans.
- Ignoring indexes that back unique constraints or primary keys when hunting unused indexes: these are enforced at write time, not via idx_scan, and dropping them destroys the constraint.
- Not distinguishing idx_tup_read from idx_tup_fetch: idx_tup_read counts index entries returned; idx_tup_fetch counts live heap rows fetched. A large gap suggests many dead tuples in the index or a visibility-map issue.
- Dropping multiple indexes simultaneously on a production table: each DROP INDEX CONCURRENTLY is safe alone, but dropping several at once removes write-amplification protection before you can measure the impact. Remove one at a time and monitor.
- Evaluating index usage without cross-referencing pg_statio_user_indexes: an index may scan infrequently but serve critical queries. Low idx_blks_read alongside low idx_scan often means the index fits in cache and is working well.

✓ Instead: Check the statistics age and constraint membership first, drop one index at a time with DROP INDEX CONCURRENTLY, and cross-check scan counts against I/O statistics.

An `idx_scan = 0` index might still be valuable if the statistics are young (counters are cleared by `pg_stat_reset()` and after crash recovery) or if it enforces a constraint. Cross-check with `pg_indexes` and `pg_constraint`, and verify the index isn't backing a primary key or unique constraint before dropping it.

example
SELECT s.relname AS table_name, s.indexrelname, s.idx_scan, pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size, i.indexdef FROM pg_stat_user_indexes s JOIN pg_indexes i ON i.schemaname = s.schemaname AND i.indexname = s.indexrelname WHERE s.idx_scan < 10 ORDER BY pg_relation_size(s.indexrelid) DESC;
Low-usage indexes sorted by size, with their definitions for review before dropping
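
Before dropping a candidate, a cautious sequence looks like this (the index name is hypothetical); pg_constraint.conindid reveals whether the index backs a constraint:

```sql
-- Does this index enforce a PRIMARY KEY / UNIQUE / EXCLUDE constraint?
SELECT conname, contype
FROM pg_constraint
WHERE conindid = 'idx_orders_legacy'::regclass;  -- hypothetical index name

-- If no rows: drop without blocking writes, one index at a time
DROP INDEX CONCURRENTLY idx_orders_legacy;
```

DROP INDEX CONCURRENTLY cannot run inside a transaction block, so issue it as a standalone statement.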

pg_stat_user_tables

PG 8.2+ (n_mod_since_analyze added PG 9.4; n_ins_since_vacuum added PG 13) · setof record

View showing statistics for each user table: sequential scans, index scans, rows inserted/updated/deleted, dead tuples, last vacuum/analyze times.

DBA · Reliability

Signature

SELECT * FROM pg_stat_user_tables

Examples

sql
SELECT relname, seq_scan, idx_scan, CASE WHEN (seq_scan + idx_scan) > 0 THEN round(100.0 * idx_scan / (seq_scan + idx_scan), 1) ELSE 0 END AS idx_pct FROM pg_stat_user_tables ORDER BY seq_scan DESC;
Tables with the most sequential scans — index candidates
sql
SELECT relname, n_dead_tup, n_live_tup, round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct FROM pg_stat_user_tables WHERE n_dead_tup > 1000 ORDER BY dead_pct DESC;
Tables bloated with dead tuples — vacuum candidates
sql
SELECT relname, last_vacuum, last_autovacuum, last_analyze FROM pg_stat_user_tables ORDER BY last_autovacuum NULLS FIRST;
Tables that haven't been vacuumed recently
sql
SELECT relname, last_autovacuum, n_dead_tup FROM pg_stat_user_tables WHERE n_dead_tup > 50000 ORDER BY n_dead_tup DESC;
Tables with significant bloat that autovacuum hasn't resolved
Anti-Pattern: Relying on n_live_tup for exact row counts

- Relying on n_live_tup instead of COUNT(*) for exact row counts: n_live_tup is a planner estimate updated by vacuum/analyze and can be significantly off on heavily-modified tables. Use COUNT(*) when an exact figure is needed.
- Treating last_autovacuum as confirmation that autovacuum is healthy: autovacuum may have run but been killed by lock contention or cost-delay throttling before completing. Check n_dead_tup after the timestamp to confirm progress.
- Ignoring seq_scan on large tables: even occasional full scans of multi-million-row tables are serious. An idx_scan / (seq_scan + idx_scan) ratio below 95% on a large table warrants index investigation.
- Not differentiating last_vacuum (manual) from last_autovacuum (daemon): a table may have last_vacuum set from a one-off run while autovacuum is silently disabled via autovacuum_enabled = false.
- Comparing n_dead_tup across tables without considering table size: 10,000 dead tuples on a 100-row table is catastrophic. Always compute dead_pct = n_dead_tup / nullif(n_live_tup + n_dead_tup, 0).

✓ Instead: Use COUNT(*) for exact counts, verify vacuum progress by dead-tuple counts rather than timestamps, and normalize bloat metrics by table size.

A high ratio of dead to live tuples causes table bloat and slows queries. If `n_dead_tup / (n_live_tup + n_dead_tup) > 10%` and `last_autovacuum` is stale, autovacuum may not be keeping up — consider manual `VACUUM ANALYZE`.

example
SELECT relname, n_dead_tup, last_autovacuum FROM pg_stat_user_tables WHERE n_dead_tup > 100000 ORDER BY n_dead_tup DESC;
Tables urgently needing vacuum
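
When autovacuum isn't keeping up, check whether it was disabled per table before reaching for a manual pass; the table name below is hypothetical:

```sql
-- Tables with autovacuum explicitly disabled via storage parameters
SELECT relname, reloptions
FROM pg_class
WHERE 'autovacuum_enabled=false' = ANY (reloptions);

-- Manual pass on a specific bloated table (hypothetical name)
VACUUM (ANALYZE, VERBOSE) orders;
```

VERBOSE reports the pages and tuples removed, which confirms whether the run actually made progress.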

pg_stat_wal

PG 14+ · record (single row)

Single-row view showing WAL generation statistics: WAL records written, full-page writes, bytes written, buffers written, and sync statistics.

DBA · Reliability

Signature

SELECT * FROM pg_stat_wal

Examples

sql
SELECT wal_records, pg_size_pretty(wal_bytes) AS wal_size, wal_buffers_full FROM pg_stat_wal;
Total WAL generated since last reset
sql
SELECT wal_records, wal_fpi, round(100.0 * wal_fpi / nullif(wal_records, 0), 1) AS fpi_pct FROM pg_stat_wal;
Full-page image fraction; a high percentage means checkpoints are too frequent
sql
SELECT pg_size_pretty(wal_bytes / extract(epoch FROM now() - stats_reset)::bigint) AS wal_bytes_per_sec FROM pg_stat_wal;
WAL generation rate — compare to wal_keep_size for replica lag risk
sql
SELECT wal_buffers_full FROM pg_stat_wal;
Non-zero means wal_buffers is undersized and causing premature flushes
Anti-Pattern: Reading wal_bytes as a snapshot without capturing a delta

- Reading wal_bytes as a snapshot without capturing a delta: all values are cumulative since the last reset. To compute a rate, record two readings separated by a known interval and divide the difference.
- Ignoring wal_fpi (full-page images) relative to wal_records: a very high wal_fpi / wal_records ratio indicates checkpoints are too frequent. Full-page images inflate WAL volume significantly; increasing checkpoint_timeout or max_wal_size can reduce them.
- Not monitoring wal_buffers_full: when it increments, WAL buffers are flushing prematurely because wal_buffers is undersized, causing synchronous I/O that hurts write latency.
- Querying pg_stat_wal on PostgreSQL versions before 14: the view does not exist. Check server_version_num >= 140000 before including it in monitoring queries.
- Using pg_stat_wal without correlating with pg_stat_bgwriter checkpoint data: high WAL bytes with few forced checkpoints suggests large transactions, while high bytes plus many forced checkpoints points to insufficient max_wal_size.

✓ Instead: Compute rates from sampled deltas, watch the wal_fpi ratio and wal_buffers_full, gate the query on server version, and read WAL volume alongside checkpoint statistics.

Divide `wal_bytes` by time since `stats_reset` to get the WAL generation rate. Compare this to your WAL retention settings (`wal_keep_size`) to ensure replicas can keep up and WAL archives don't overflow.

example
SELECT pg_size_pretty(wal_bytes) AS total_wal, pg_size_pretty(wal_bytes / extract(epoch FROM now() - stats_reset)::bigint) AS bytes_per_sec FROM pg_stat_wal;
Approximate WAL generation rate
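
For a rate over a recent window rather than since stats_reset, take two samples and divide the delta; this is a psql sketch using \gset, with an arbitrary 60-second interval:

```sql
-- Snapshot one: store wal_bytes in a psql variable
SELECT wal_bytes AS wal_before FROM pg_stat_wal \gset
SELECT pg_sleep(60);
-- Snapshot two minus snapshot one, divided by the interval
SELECT pg_size_pretty(((wal_bytes - :wal_before) / 60)::numeric) AS wal_per_sec
FROM pg_stat_wal;
```

The delta method also sidesteps the problem of a stats_reset that happened long ago diluting the rate.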

pg_stat_wal_receiver

PG 9.6+ · record (single row)

Single-row view on a standby server showing the WAL receiver's status: connected primary, last received LSN, receive timing, and status.

DBA · Reliability

Signature

SELECT * FROM pg_stat_wal_receiver

Examples

sql
SELECT status, sender_host, flushed_lsn, last_msg_receipt_time FROM pg_stat_wal_receiver;
Connection status and last durably received position (flushed_lsn; named received_lsn before PG 13)
sql
SELECT status, now() - last_msg_receipt_time AS silence_duration FROM pg_stat_wal_receiver;
Time since last keepalive — long silence indicates a broken connection
sql
SELECT flushed_lsn, pg_last_wal_replay_lsn() AS replayed_lsn, pg_wal_lsn_diff(flushed_lsn, pg_last_wal_replay_lsn()) AS replay_lag_bytes FROM pg_stat_wal_receiver;
WAL received but not yet replayed; the replay process is falling behind
sql
SELECT sender_host, sender_port, status FROM pg_stat_wal_receiver;
Primary host details and current replication status on this standby
Anti-Pattern: Querying pg_stat_wal_receiver on the primary instead of the standby

- Querying pg_stat_wal_receiver on the primary: the view is populated only on standbys. It returns zero rows on a primary server, which can be mistaken for a replication problem.
- Treating a stale last_msg_receipt_time as the only replication health signal: the WAL receiver sends periodic keepalives, and a briefly stale timestamp during high load is normal. Silence beyond wal_receiver_timeout indicates a broken connection.
- Ignoring the gap between flushed_lsn (received_lsn before PG 13) and pg_last_wal_replay_lsn(): the first shows how much WAL has arrived, the second how much has been applied. A large gap means the replay process is falling behind.
- Not checking pg_stat_wal_receiver when replication appears to lag per pg_stat_replication: the primary view shows the sent LSN, but the standby may have received it without yet replaying. Always cross-check both views.
- Assuming status = 'streaming' means no lag: the standby can be streaming current WAL and still have a replay backlog. Combine with pg_is_wal_replay_paused() and replay LSN comparisons.

✓ Instead: Query the view on the standby, compare received and replayed LSNs, and judge staleness relative to wal_receiver_timeout rather than in absolute terms.

On a standby, `pg_stat_wal_receiver.status` should be 'streaming'. Other states such as 'starting', 'waiting', or 'restarting' mean the receiver is not actively streaming ('catchup' is a state of pg_stat_replication on the primary, not of this view). If `last_msg_receipt_time` is stale beyond `wal_receiver_timeout`, the WAL stream is likely broken.

example
SELECT status, now() - last_msg_receipt_time AS silence_duration FROM pg_stat_wal_receiver;
Long silence indicates a broken replication connection
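
Since status = 'streaming' alone doesn't rule out a replay backlog, combine the receiver view with the replay-side functions; a sketch for a PG 13+ standby:

```sql
-- Streaming status, paused-replay flag, and bytes received but not yet applied
SELECT r.status,
       pg_is_wal_replay_paused() AS replay_paused,
       pg_wal_lsn_diff(r.flushed_lsn, pg_last_wal_replay_lsn()) AS replay_lag_bytes
FROM pg_stat_wal_receiver r;
```

On versions before 13, substitute received_lsn for flushed_lsn.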

pg_statio_user_indexes

PG 8.2+ · setof record

View showing per-index I/O statistics: index blocks read from disk vs hits from buffer cache.

DBA · Reliability

Signature

SELECT * FROM pg_statio_user_indexes

Examples

sql
SELECT relname, indexrelname, idx_blks_read, idx_blks_hit, round(100.0 * idx_blks_hit / nullif(idx_blks_hit + idx_blks_read, 0), 1) AS hit_pct FROM pg_statio_user_indexes ORDER BY idx_blks_read DESC LIMIT 10;
Indexes with most disk reads — low cache hit candidates
sql
SELECT s.indexrelname, s.idx_scan, io.idx_blks_read, round(100.0 * io.idx_blks_hit / nullif(io.idx_blks_hit + io.idx_blks_read, 0), 1) AS hit_pct FROM pg_stat_user_indexes s JOIN pg_statio_user_indexes io USING (indexrelid) WHERE s.idx_scan > 100 ORDER BY io.idx_blks_read DESC;
High-use indexes with poor cache hit — candidates for shared_buffers increase
sql
SELECT relname, indexrelname, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size, round(100.0 * idx_blks_hit / nullif(idx_blks_hit + idx_blks_read, 0), 1) AS hit_pct FROM pg_statio_user_indexes ORDER BY pg_relation_size(indexrelid) DESC LIMIT 10;
Largest indexes with their cache efficiency
sql
SELECT count(*) AS cold_indexes FROM pg_statio_user_indexes WHERE idx_blks_hit = 0 AND idx_blks_read = 0;
Indexes with zero I/O — likely never accessed since last stats reset
Anti-Pattern: Evaluating index cache health using only hit ratio without checking scan count

- Evaluating cache health from idx_blks_hit / (idx_blks_hit + idx_blks_read) alone, without checking idx_scan in pg_stat_user_indexes: a rarely-used index shows a 100% hit rate simply because it is almost never accessed. Cross-reference scan counts.
- Treating idx_blks_read = 0 as proof that an index fits in cache: an index that is never scanned generates zero reads regardless of size. Confirm with pg_stat_user_indexes.idx_scan > 0.
- Ignoring TOAST index I/O: pg_statio_user_indexes covers the main table's indexes, but TOAST table indexes appear under pg_statio_all_indexes. For tables with large text/JSON columns, check TOAST I/O separately.
- Not pairing pg_statio_user_indexes with pg_statio_user_tables: an index may look efficient while the heap it points to has a poor cache hit rate, leaving the overall access pattern disk-bound.
- Calling pg_stat_reset() without documenting the reset time: both pg_statio views are cleared by it, so post-reset data is incomparable to pre-reset data, and untracked resets make trend analysis unreliable.

✓ Instead: Read hit ratios together with scan counts, include TOAST and heap I/O in the picture, and log every statistics reset.

Active indexes are kept warm in shared_buffers. An index with low `hit_pct` is either rarely used (verify with `pg_stat_user_indexes.idx_scan`) or too large to fit in cache — consider a partial index to reduce its size.

example
SELECT s.indexrelname, s.idx_scan, io.idx_blks_read, round(100.0 * io.idx_blks_hit / nullif(io.idx_blks_hit + io.idx_blks_read, 0), 1) AS hit_pct FROM pg_stat_user_indexes s JOIN pg_statio_user_indexes io USING (indexrelid) WHERE s.idx_scan > 100 ORDER BY io.idx_blks_read DESC;
Used indexes with poor cache hit — likely too large
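
One remedy for a heavily-used index that is too large for cache is a partial index over the hot subset; the table, column, and predicate here are hypothetical:

```sql
-- Build the smaller replacement first, without blocking writes
CREATE INDEX CONCURRENTLY idx_orders_open
    ON orders (created_at)
    WHERE status = 'open';  -- hypothetical hot-row predicate

-- After verifying queries use the new index, retire the full one
DROP INDEX CONCURRENTLY idx_orders_created_at;
```

Confirm with EXPLAIN that the planner picks the partial index before dropping the original; the partial index only serves queries whose WHERE clause implies its predicate.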

pg_statio_user_tables

PG 8.2+ · setof record

View showing I/O statistics per user table: heap blocks read from disk vs hits from cache, index blocks read vs hits.

DBA · Reliability

Signature

SELECT * FROM pg_statio_user_tables

Examples

sql
SELECT relname, heap_blks_hit, heap_blks_read, round(100.0 * heap_blks_hit / nullif(heap_blks_hit + heap_blks_read, 0), 1) AS heap_hit_pct FROM pg_statio_user_tables ORDER BY heap_blks_read DESC;
Per-table cache hit ratio — identify hot tables reading from disk
sql
SELECT relname, pg_size_pretty(pg_relation_size(relid)) AS table_size, round(100.0 * heap_blks_hit / nullif(heap_blks_hit + heap_blks_read, 0), 1) AS hit_pct FROM pg_statio_user_tables ORDER BY heap_blks_read DESC LIMIT 10;
Largest tables with their cache hit rates — low hit_pct on big tables is expensive
sql
SELECT relname, idx_blks_read, idx_blks_hit, round(100.0 * idx_blks_hit / nullif(idx_blks_hit + idx_blks_read, 0), 1) AS idx_hit_pct FROM pg_statio_user_tables ORDER BY idx_blks_read DESC LIMIT 10;
Index cache hit rate per table — complements heap hit rate analysis
sql
SELECT relname, heap_blks_hit, heap_blks_read FROM pg_statio_user_tables WHERE heap_blks_read > heap_blks_hit ORDER BY heap_blks_read DESC;
Tables where disk reads outnumber cache hits — primary shared_buffers pressure sources
Anti-Pattern: Computing cache hit ratio without NULLIF to handle zero denominators

- Computing heap_blks_hit / (heap_blks_hit + heap_blks_read) without guarding the denominator: tables with zero I/O raise a division-by-zero error. Wrap the denominator in NULLIF so such tables return NULL instead.
- Looking only at heap_blks_read and ignoring idx_blks_read: a table can have a high heap cache hit rate while its indexes cause significant disk reads. Always inspect both columns together.
- Drawing conclusions from a single snapshot: blks_read accumulates since the last statistics reset, so a table that was cold-started long ago may show high reads that are no longer occurring. Sample twice and compare the delta.
- Not cross-referencing pg_stat_user_tables.seq_scan: many heap_blks_read alongside a high seq_scan count means repeated full scans; the fix is an index, not more cache.
- Assuming heap_blks_read measures physical disk I/O: a block missing from shared_buffers counts as a read even when the OS page cache serves it instantly, so heap_blks_read reflects shared_buffers misses, not actual disk access.

✓ Instead: Guard ratio denominators with NULLIF, read heap and index I/O together, work from sampled deltas, and remember that reads are shared_buffers misses, not necessarily disk hits.

If one table has a consistently low heap cache hit rate, it may be evicting other tables from shared_buffers. Consider partitioning it, using tablespaces on faster storage, or increasing `shared_buffers`.

example
SELECT relname, pg_size_pretty(pg_relation_size(relid)) AS size, round(100.0 * heap_blks_hit / nullif(heap_blks_hit + heap_blks_read, 0), 1) AS hit_pct FROM pg_statio_user_tables ORDER BY heap_blks_read DESC LIMIT 10;
Large tables with low cache hit rates
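
A rough sanity check before resizing shared_buffers: compare the current setting to the combined footprint of the tables where disk reads dominate (this only approximates the working set):

```sql
SELECT current_setting('shared_buffers') AS shared_buffers,
       pg_size_pretty(sum(pg_relation_size(relid))) AS disk_heavy_footprint
FROM pg_statio_user_tables
WHERE heap_blks_read > heap_blks_hit;
```

If the footprint dwarfs shared_buffers, partitioning or faster storage may pay off sooner than more memory.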