PostgreSQL Aggregate Functions

Complete reference for PostgreSQL aggregate functions covering count, sum, average, statistical aggregates, ordered-set aggregates, and the FILTER clause for conditional aggregation. Every function includes syntax, GROUP BY examples, and anti-patterns. Updated for PostgreSQL 16.

22 functions

What are PostgreSQL Aggregate Functions?

PostgreSQL aggregate functions compute a single result from multiple input rows and are used with GROUP BY to summarise data. PostgreSQL also supports ordered-set aggregates (percentile_cont, mode), hypothetical-set aggregates (rank, dense_rank within a group), and the FILTER clause for conditional aggregation without CASE. Common aggregate functions include count(), sum(), avg(), max(), min(), string_agg(), and array_agg().
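The FILTER clause mentioned above can be sketched in one query; the `orders` table and its columns here are hypothetical:

```sql
-- Conditional aggregation without CASE: each aggregate only sees the rows
-- that satisfy its FILTER predicate.
SELECT
    count(*)                                        AS total_orders,
    count(*)    FILTER (WHERE status = 'cancelled') AS cancelled,
    sum(amount) FILTER (WHERE status = 'completed') AS completed_revenue
FROM orders;
```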

count

PG 8.0+ · Returns: bigint

Counts the number of input rows (with *) or the number of input rows where the expression is not NULL.

Audience: Developer, Data Eng

Signatures

count ( * ) → bigint
count ( expression ) → bigint

Parameters

expression (any): Value to count; NULLs are excluded. Use * to count all rows.

Examples

sql
SELECT count(*) FROM orders;
1042
sql
SELECT count(email) FROM users;
Counts only rows where email is not NULL
sql
SELECT count(DISTINCT status) FROM orders;
Number of unique statuses
sql
SELECT department, count(*) AS headcount FROM employees GROUP BY department ORDER BY headcount DESC;
Row count per department
Anti-Pattern: Confusing count(*) with count(col) for NULL-counting

A common mistake is using `count(col)` expecting it to count all rows, then being surprised that NULL values are silently skipped. Equally, using `count(*)` to count non-null values in a specific column is incorrect.

✓ Instead: Use `count(*)` for total row count and `count(col)` explicitly when NULL-exclusion is desired. To count NULLs: `count(*) - count(col)`.

`count(*)` counts all rows including NULLs. `count(col)` skips NULLs. Use `count(*) - count(col)` to count NULLs in a column without a CASE expression.

example
SELECT count(*) AS total, count(email) AS with_email, count(*) - count(email) AS missing_email FROM users;
Total rows, non-null emails, and missing email count
See also: sum, avg, max, min

sum

PG 8.0+ · Returns: bigint for smallint/int input, numeric for bigint input, same type as input for float/numeric

Computes the sum of all non-NULL input values.

Audience: Developer, Data Eng

Signature

sum ( expression ) → same as input or numeric

Parameters

expression (numeric type): Value to sum; NULLs are ignored

Examples

sql
SELECT sum(amount) FROM payments;
94823.50

sum of zero rows is NULL, not 0

sql
SELECT sum(amount) FROM payments WHERE amount IS NULL;
NULL
sql
SELECT COALESCE(sum(amount), 0) FROM payments WHERE user_id = 999;
0
sql
SELECT product_id, sum(quantity) AS total_sold FROM order_items GROUP BY product_id ORDER BY total_sold DESC;
Total units sold per product
Anti-Pattern: Assuming sum() returns 0 for empty groups

`sum()` returns NULL — not 0 — when there are no rows or all values are NULL. This can propagate silently through calculations and produce unexpected NULL results downstream.

✓ Instead: Always wrap with `COALESCE(sum(col), 0)` when a zero value is semantically correct for empty groups.

`sum()` returns NULL when applied to zero rows (or all-NULL rows), not 0. Wrap with `COALESCE(sum(x), 0)` whenever a zero result is meaningful.

example
SELECT user_id, COALESCE(sum(points), 0) AS total_points FROM user_points GROUP BY user_id;
0 for users with no points rows

avg

PG 8.0+ · Returns: numeric for integer/numeric input; double precision for float input

Computes the arithmetic mean (average) of all non-NULL input values.

Audience: Developer, Data Eng, DBA

Signature

avg ( expression ) → numeric or double precision

Parameters

expression (numeric or interval): Value to average; NULLs are ignored

Examples

sql
SELECT avg(score) FROM results;
73.4
sql
SELECT avg(score)::numeric(5,2) FROM results;
73.40
sql
SELECT avg(response_time) FROM requests WHERE endpoint = '/api/v1/search';
Average response time
sql
SELECT department, avg(salary)::numeric(10,2) AS avg_salary FROM employees GROUP BY department;
Average salary per department rounded to 2 decimal places
Anti-Pattern: Averaging integer columns and losing decimal precision

When `avg()` is applied to integer columns, PostgreSQL returns a `numeric` type, but many developers cast back to integer with `::int`, silently truncating fractional values (e.g., an average of 3.7 becomes 3).

✓ Instead: Cast to `numeric(10,2)` or `float8` rather than `int` to preserve meaningful decimal precision: `avg(score)::numeric(10,2)`.

Instead of multiple subqueries, use `FILTER (WHERE ...)` to compute multiple averages in a single pass: `avg(score) FILTER (WHERE grade = 'A')` alongside `avg(score) FILTER (WHERE grade = 'B')`.

example
SELECT avg(score) FILTER (WHERE subject = 'math') AS math_avg, avg(score) FILTER (WHERE subject = 'science') AS sci_avg FROM grades;
Two averages in one query pass

max

PG 8.0+ · Returns: same as input

Returns the maximum value of all non-NULL input values.

Audience: Developer, Data Eng

Signature

max ( expression ) → same as input

Parameters

expression (any sortable type): Value to find the maximum of

Examples

sql
SELECT max(amount) FROM orders;
9999.99
sql
SELECT max(created_at) FROM events;
Most recent timestamp
sql
SELECT max(name) FROM products;
Lexicographically last name
sql
SELECT user_id, max(score) AS personal_best FROM game_scores GROUP BY user_id;
Each user's highest score
Anti-Pattern: Using ORDER BY + LIMIT 1 inside a subquery instead of max()

A frequent anti-pattern is `SELECT * FROM t WHERE val = (SELECT val FROM t ORDER BY val DESC LIMIT 1)` for each group, which is verbose and may not use indexes as effectively as a simple aggregate.

✓ Instead: Use `max()` with `GROUP BY` for per-group maximums. To fetch the full row containing the max, use `DISTINCT ON` or a window function such as `RANK() OVER (...)`.
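A minimal `DISTINCT ON` sketch for fetching the full highest-priced row per group, assuming a hypothetical `products` table with `category` and `price` columns:

```sql
-- One row per category: the first row in each category after sorting
-- by price descending, i.e. the row holding the category's max price.
SELECT DISTINCT ON (category) *
FROM products
ORDER BY category, price DESC;
```

Note that `DISTINCT ON` keeps exactly one row per group; to return all tied rows, use a window function instead.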

`max` works on any sortable type: dates, timestamps, text, UUIDs, enums. Use `max(created_at)` to find the most recent event, or `max(name)` for the last alphabetically.

example
SELECT category, max(price) AS highest_price FROM products GROUP BY category;
Highest price per category

min

PG 8.0+ · Returns: same as input

Returns the minimum value of all non-NULL input values.

Audience: Developer, Data Eng

Signature

min ( expression ) → same as input

Parameters

expression (any sortable type): Value to find the minimum of

Examples

sql
SELECT min(price) FROM products;
0.99
sql
SELECT min(created_at) FROM orders;
Oldest order timestamp
sql
SELECT category, min(price) AS lowest_price FROM products GROUP BY category;
Lowest price per category
sql
SELECT min(response_ms) AS fastest_response FROM api_logs WHERE endpoint = '/health';
Fastest recorded health-check response
Anti-Pattern: Fetching the full min-value row with a redundant subquery

Using `WHERE price = (SELECT min(price) FROM products)` in a subquery is correct but becomes a repeated scan when done per group. It also returns multiple rows on ties without a tiebreaker.

✓ Instead: For per-group minimum rows, prefer `DISTINCT ON (group_col) ORDER BY group_col, price ASC` or a window function with `RANK()` to handle ties explicitly.
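A window-function sketch that returns every row tied for the per-group minimum (again using hypothetical `products` columns):

```sql
-- rank() assigns 1 to all rows sharing the lowest price in a category,
-- so ties are returned rather than silently dropped.
SELECT category, name, price
FROM (
    SELECT *, rank() OVER (PARTITION BY category ORDER BY price ASC) AS rnk
    FROM products
) ranked
WHERE rnk = 1;
```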

When you need the min or max value for each group, `GROUP BY` with `min()`/`max()` is usually faster than a self-join or subquery with `ORDER BY ... LIMIT 1`.

example
SELECT user_id, min(created_at) AS first_order, max(created_at) AS last_order FROM orders GROUP BY user_id;
First and last order dates per user

array_agg

PG 8.4+ · Returns: anyarray

Collects all input values, including NULLs, into an array. Supports ORDER BY to control element order.

Audience: Developer, Data Eng

Signature

array_agg ( expression [ORDER BY ...] ) → array

Parameters

expression (any): Value to collect; NULLs are included in the array

Examples

sql
SELECT array_agg(name ORDER BY name) FROM tags;
{css,html,javascript}
sql
SELECT user_id, array_agg(tag) FROM user_tags GROUP BY user_id;
Array of tags per user
sql
SELECT array_agg(DISTINCT status) FROM orders;
{cancelled,completed,pending}
sql
SELECT author_id, array_agg(title ORDER BY published_at DESC) AS recent_books FROM books GROUP BY author_id;
Books per author sorted newest-first
Anti-Pattern: Using array_agg without ORDER BY and relying on element order

Without an explicit `ORDER BY`, `array_agg` produces elements in an undefined, non-deterministic order that can change between query executions, PostgreSQL versions, or after autovacuum.

✓ Instead: Always specify `array_agg(col ORDER BY col)` or an appropriate sort key when the order of array elements matters downstream.

Use `array_agg` to fetch parent + children in one query: `SELECT u.id, array_agg(t.name) AS tags FROM users u JOIN user_tags t ON t.user_id = u.id GROUP BY u.id`. This avoids N+1 queries.

example
SELECT author_id, array_agg(title ORDER BY published_at) AS books FROM books GROUP BY author_id;
All book titles per author in chronological order

string_agg

PG 9.0+ · Returns: text

Concatenates non-NULL string values with a delimiter. Supports ORDER BY to control output order.

Audience: Developer, Data Eng

Signature

string_agg ( value text, delimiter text [ORDER BY ...] ) → text

Parameters

value (text or bytea): String to concatenate; NULLs are skipped
delimiter (text or bytea): Separator between values

Examples

sql
SELECT string_agg(name, ', ' ORDER BY name) FROM tags;
css, html, javascript
sql
SELECT user_id, string_agg(role, ' | ') FROM user_roles GROUP BY user_id;
admin | editor | viewer
sql
SELECT string_agg(email, E'\n') FROM newsletter_subscribers;
Newline-separated email list
sql
SELECT category, string_agg(DISTINCT tag, ', ' ORDER BY tag) FROM items GROUP BY category;
Deduplicated, sorted tag list per category
Anti-Pattern: Omitting the delimiter argument or using an empty string unintentionally

`string_agg` requires exactly two arguments. Passing an empty string `''` as the delimiter will concatenate values without any separator, which is often a mistake when a comma or space was intended.

✓ Instead: Always explicitly pass the desired delimiter. If you truly want no separator, document it clearly. Use `array_agg` + `array_to_string` if you need to switch delimiters later.
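A sketch of the `array_agg` + `array_to_string` approach, which defers the delimiter choice to the outer expression (using the `tags` table from the examples above):

```sql
-- Aggregate once into an array, then render with whatever delimiter
-- the report needs; changing ', ' later touches only this one spot.
SELECT array_to_string(array_agg(name ORDER BY name), ', ') AS tag_list
FROM tags;
```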

`string_agg` is the cleanest way to produce comma-separated value strings in SQL. Use it in reporting queries where the application expects a single string field with multiple values.

example
SELECT category, string_agg(product_name, ', ' ORDER BY product_name) AS products FROM items GROUP BY category;
"Electronics: laptop, phone, tablet"

json_agg

PG 9.3+ · Returns: json

Aggregates values into a JSON array, including NULLs (as JSON null).

Audience: Developer, Data Eng

Signature

json_agg ( expression [ORDER BY ...] ) → json

Parameters

expression (any): Value to include in the JSON array; NULLs become JSON null

Examples

sql
SELECT json_agg(name ORDER BY name) FROM tags;
["css", "html", "javascript"]
sql
SELECT user_id, json_agg(row_to_json(t)) FROM orders t GROUP BY user_id;
JSON array of order objects per user
sql
SELECT json_agg(json_build_object('id', id, 'name', name) ORDER BY name) FROM products WHERE active;
[{"id":1,"name":"Alpha"}, ...]
sql
SELECT department, json_agg(employee_name) FILTER (WHERE active = true) AS active_staff FROM employees GROUP BY department;
JSON array of active employees per department
Anti-Pattern: Using json_agg when JSONB operators will be needed on the result

`json_agg` returns the `json` type, which does not support the JSONB containment and key-existence operators (`@>`, `?`) and is stored as unparsed text, so repeatedly querying a stored value is slower than with `jsonb`.

✓ Instead: Use `jsonb_agg` when the aggregated array will be stored in a JSONB column or queried with JSONB containment and key-existence operators.

Combine `json_agg(row_to_json(t))` to build nested JSON directly: `SELECT u.id, json_agg(row_to_json(o)) AS orders FROM users u JOIN orders o ON o.user_id = u.id GROUP BY u.id`. No application-side assembly needed.

example
SELECT author, json_agg(json_build_object('title', title, 'year', year) ORDER BY year) AS books FROM books GROUP BY author;
Nested JSON with author and their books

jsonb_agg

PG 9.4+ · Returns: jsonb

Aggregates values into a JSONB array. Equivalent to json_agg but returns the binary JSONB type.

Audience: Developer, Data Eng

Signature

jsonb_agg ( expression [ORDER BY ...] ) → jsonb

Parameters

expression (any): Value to include; NULLs become JSON null

Examples

sql
SELECT jsonb_agg(name ORDER BY name) FROM tags;
["css", "html", "javascript"]
sql
SELECT id, jsonb_agg(detail) FILTER (WHERE detail IS NOT NULL) FROM items GROUP BY id;
JSONB arrays without nulls
sql
SELECT user_id, jsonb_agg(jsonb_build_object('role', role, 'granted_at', granted_at) ORDER BY granted_at) FROM user_roles GROUP BY user_id;
Ordered JSONB array of role objects per user
sql
SELECT project_id, jsonb_agg(DISTINCT tag) AS tags FROM project_tags GROUP BY project_id;
Deduplicated JSONB tag arrays per project
Anti-Pattern: Storing jsonb_agg results without considering deduplication

Unlike sets, `jsonb_agg` preserves duplicates. If you aggregate from a JOIN that produces duplicate rows (e.g., without DISTINCT), the resulting JSONB array will contain repeated entries silently.

✓ Instead: Use `jsonb_agg(DISTINCT col)` or ensure the source rows are deduplicated before aggregation.

Prefer `jsonb_agg` over `json_agg` when the result will be queried with JSONB operators (`->`, `@>`, `?`). JSONB is stored in a parsed binary format, making containment checks much faster.

example
SELECT user_id, jsonb_agg(tag) AS tags FROM user_tags GROUP BY user_id;
JSONB arrays that support @> and ? operators

json_object_agg

PG 9.3+ · Returns: json

Aggregates name/value pairs into a JSON object. Duplicate keys are not merged; the output object can contain repeated keys.

Audience: Developer, Data Eng

Signature

json_object_agg ( key "any", value "any" [ORDER BY ...] ) → json

Parameters

key (any): Object key (converted to text)
value (any): Object value

Examples

sql
SELECT json_object_agg(key, value) FROM config;
{"host": "localhost", "port": "5432"}
sql
SELECT json_object_agg(month, revenue ORDER BY month) FROM monthly_revenue;
{"1": 5000, "2": 6200, ...}
sql
SELECT json_object_agg(attribute, value) AS metadata FROM product_attrs WHERE product_id = 42;
{"color": "red", "size": "L", "weight": "0.5kg"}
sql
SELECT entity_id, json_object_agg(attr_name, attr_value ORDER BY attr_name) AS props FROM eav_table GROUP BY entity_id;
Alphabetically-keyed JSON property bag per entity
Anti-Pattern: Assuming duplicate keys are deduplicated in the output object

`json_object_agg` does not enforce unique keys. When source rows contain duplicate key values, the resulting JSON object will have repeated keys, which the JSON grammar permits but parsers handle inconsistently (most silently keep only the last value).

✓ Instead: Ensure keys are unique before aggregating, or use `jsonb_object_agg` which overwrites duplicate keys with the last value (deterministic with `ORDER BY`).
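A sketch of the `jsonb_object_agg` alternative; the `settings` table and `updated_at` column are hypothetical:

```sql
-- jsonb object keys are unique; with ORDER BY inside the aggregate,
-- the most recently updated value deterministically wins for each key.
SELECT jsonb_object_agg(key, value ORDER BY updated_at) AS config
FROM settings;
```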

Entity-Attribute-Value tables can be pivoted into JSON objects with `json_object_agg(attribute, value)`. This is far more flexible than CASE-based pivoting for dynamic attribute sets.

example
SELECT entity_id, json_object_agg(attribute, value) AS props FROM eav_table GROUP BY entity_id;
Each entity's attributes as a JSON object

bool_and

PG 8.0+ · Returns: boolean

Returns true if all non-NULL input values are true, otherwise false.

Audience: Developer, Data Eng

Signature

bool_and ( expression boolean ) → boolean

Parameters

expression (boolean): Boolean expression to evaluate across rows

Examples

sql
SELECT bool_and(is_verified) FROM users;
false (if any user is unverified)
sql
SELECT order_id, bool_and(shipped) AS all_shipped FROM order_items GROUP BY order_id;
true only if every item in the order has shipped
sql
SELECT bool_and(constraint_check) FROM (SELECT col > 0 AS constraint_check FROM measurements) t;
true if every measurement is positive
sql
SELECT project_id, bool_and(passed) AS all_tests_passed FROM test_runs WHERE suite = 'regression' GROUP BY project_id;
true only when every regression test passed
Anti-Pattern: Using COUNT + GROUP BY to check universal conditions instead of bool_and

A common workaround is `HAVING count(*) = count(*) FILTER (WHERE cond)` to verify all rows match a condition. This is hard to read and easy to get wrong.

✓ Instead: Use `bool_and(condition)` for a clear, intent-revealing check that every row satisfies the condition.

`bool_and` is the aggregate equivalent of ALL. Use it to check if every item in a group passes a condition — e.g., all order items shipped, all tests passed.

example
SELECT project_id, bool_and(passed) AS all_tests_passed FROM test_runs GROUP BY project_id;
true only if every test in the project passed

bool_or

PG 8.0+ · Returns: boolean

Returns true if at least one non-NULL input value is true.

Audience: Developer, Data Eng

Signature

bool_or ( expression boolean ) → boolean

Parameters

expression (boolean): Boolean expression to evaluate across rows

Examples

sql
SELECT bool_or(is_admin) FROM group_members WHERE group_id = 5;
true if any member is an admin
sql
SELECT user_id, bool_or(failed) AS had_failure FROM login_attempts GROUP BY user_id;
true if the user had any failed login
sql
SELECT bool_or(price < 0) AS has_negative_price FROM products;
Data quality check: true if any product has a negative price
sql
SELECT customer_id, bool_or(amount > 1000) AS has_large_order FROM orders GROUP BY customer_id;
true if customer ever placed an order over $1000
Anti-Pattern: Using a correlated EXISTS subquery per group instead of bool_or

Writing a correlated `EXISTS (SELECT 1 FROM t WHERE t.group_id = outer.id AND condition)` for every group in a larger query causes repeated subplan executions.

✓ Instead: Use `bool_or(condition)` inside a `GROUP BY` query for a single-pass, cleaner equivalent.

When computing per-group boolean checks in a GROUP BY query, `bool_or(condition)` is cleaner and often faster than a correlated EXISTS subquery for each group.

example
SELECT customer_id, bool_or(amount > 1000) AS has_large_order FROM orders GROUP BY customer_id;
true if customer ever placed a large order

bit_and

PG 8.0+ · Returns: same as input

Computes the bitwise AND of all non-NULL input values.

Audience: Developer, DBA, Security

Signature

bit_and ( expression ) → integer or bigint or bit

Parameters

expression (integer, bigint, or bit): Value to bitwise-AND

Examples

sql
SELECT bit_and(permissions) FROM user_roles WHERE user_id = 1;
Intersection of all permission bitmasks
sql
SELECT bit_and(flags) FROM feature_flags;
Flags set in ALL rows
sql
SELECT role_group, bit_and(permission_mask) AS shared_perms FROM role_assignments GROUP BY role_group;
Permissions shared by every role in each group
sql
SELECT (bit_and(status_flags) & 4) = 4 AS all_have_bit_2 FROM sessions WHERE started_at > now() - interval '1 hour';
true only if every recent session has flag bit 2 (value 4) set
Anti-Pattern: Checking individual bits with equality instead of bitmask operations

Testing `WHERE permissions = 7` to check if a user has all three permission bits set conflates exact equality with bit intersection, breaking when other bits are also set.

✓ Instead: Use `bit_and(permissions) & required_mask = required_mask` to verify that specific bits are set across all rows.
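The required-mask check can be sketched as follows, assuming bits 0 and 2 (mask value 5) are the required permissions:

```sql
-- True only if every role's mask has both required bits set:
-- bit_and keeps a bit only when ALL rows have it.
SELECT (bit_and(permission_mask) & 5) = 5 AS has_required_everywhere
FROM user_roles
WHERE user_id = 1;
```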

`bit_and` gives the intersection of permission bitmasks — bits that are set in every row. Use it to find the minimum common permissions for a user with multiple roles.

example
SELECT bit_and(permission_mask) AS common_perms FROM user_roles WHERE user_id = $1;
Bitmask of permissions common to all roles
See also: bit_or, bit_xor

bit_or

PG 8.0+ · Returns: same as input

Computes the bitwise OR of all non-NULL input values.

Audience: Developer, DBA, Security

Signature

bit_or ( expression ) → integer or bigint or bit

Parameters

expression (integer, bigint, or bit): Value to bitwise-OR

Examples

sql
SELECT bit_or(permissions) FROM user_roles WHERE user_id = 1;
Union of all permission bitmasks
sql
SELECT bit_or(flags) FROM feature_flags;
All flags set in ANY row
sql
SELECT team_id, bit_or(capability_flags) AS team_capabilities FROM team_members GROUP BY team_id;
Union of all capability flags across each team
sql
SELECT bit_or(event_type_mask) AS seen_event_types FROM audit_log WHERE session_id = $1;
Bitmask of all distinct event types seen in the session
Anti-Pattern: Using multiple OR-joined equality checks instead of bitmask union

Writing `WHERE perms = 1 OR perms = 2 OR perms = 4` instead of leveraging bitmask aggregation becomes unmaintainable as the flag set grows and cannot summarize across multiple roles.

✓ Instead: Store capabilities as bitmasks and use `bit_or(permission_mask)` to compute the effective union in a single aggregate scan.

`bit_or` gives the union of permission bitmasks — every bit set in any row. Perfect for computing effective user permissions across multiple roles.

example
SELECT bit_or(permission_mask) AS effective_perms FROM user_roles WHERE user_id = $1;
All permissions the user has via any role

stddev

PG 8.0+ · Returns: double precision or numeric

Returns the sample standard deviation of all non-NULL input values (alias for stddev_samp).

Audience: Developer, Data Eng, DBA

Signature

stddev ( expression ) → double precision or numeric

Parameters

expression (numeric or float): Values to compute the standard deviation over

Examples

sql
SELECT stddev(score) FROM test_results;
12.34 (varies)
sql
SELECT avg(score) AS mean, stddev(score) AS sd FROM scores;
Mean and standard deviation
sql
SELECT product_id, avg(price) AS avg_price, stddev(price) AS price_spread FROM price_history GROUP BY product_id;
Price volatility per product
sql
SELECT stddev(latency_ms) AS jitter FROM network_samples WHERE host = 'db-primary' AND sampled_at > now() - interval '1 hour';
Network jitter (latency variability) over the last hour
Anti-Pattern: Using stddev on a very small sample and treating it as population stddev

`stddev` (alias for `stddev_samp`) applies Bessel's correction (divides by N-1), which is appropriate for samples but inflates the value for tiny groups. On a group of 2 rows, the result can be misleading.

✓ Instead: Use `stddev_pop` when working with complete populations (all data, not a sample). Always check `count(*) > 30` before drawing statistical conclusions from `stddev`.

Use `avg(x) ± 2 * stddev(x)` to define a normal range. Values outside this range are statistical outliers. Combine with a CTE or subquery to label outliers inline.

example
SELECT id, value, CASE WHEN abs(value - avg_val) > 2 * sd THEN 'outlier' ELSE 'normal' END FROM (SELECT *, avg(value) OVER () AS avg_val, stddev(value) OVER () AS sd FROM measurements) t;
Each row labelled as outlier or normal

variance

PG 8.0+ · Returns: double precision or numeric

Returns the sample variance of all non-NULL input values (alias for var_samp).

Audience: Developer, Data Eng, DBA

Signature

variance ( expression ) → double precision or numeric

Parameters

expression (numeric or float): Values to compute variance over

Examples

sql
SELECT variance(score) FROM test_results;
152.3 (varies)
sql
SELECT variance(response_time) AS variability FROM requests;
Variance of response times
sql
SELECT sensor_id, avg(reading) AS mean_val, variance(reading) AS instability FROM sensor_data GROUP BY sensor_id;
Sensor stability report
sql
SELECT sqrt(variance(x) + variance(y)) AS combined_stddev FROM measurements;
Combined standard deviation of two independent variables
Anti-Pattern: Comparing variance values across different units or scales

Variance is scale-dependent: multiplying every value by 2 multiplies the variance by 4. Comparing `variance(price_usd)` to `variance(quantity)` is meaningless because the units differ.

✓ Instead: Standardize values (z-score normalize) before comparing spread across different columns, or compare coefficients of variation (`stddev / avg`) instead.
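The coefficient-of-variation comparison can be sketched like this (hypothetical `measurements` columns; `NULLIF` guards against a zero mean):

```sql
-- stddev / avg is unit-free, so the spreads of differently-scaled
-- columns become directly comparable.
SELECT
    stddev(price)    / NULLIF(avg(price), 0)    AS cv_price,
    stddev(quantity) / NULLIF(avg(quantity), 0) AS cv_quantity
FROM measurements;
```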

`variance(x)` = `stddev(x)^2`. Use variance when you need to add contributions from multiple independent sources (variances add, standard deviations don't). For human-readable spread, use stddev.

example
SELECT sqrt(variance(x) + variance(y)) AS combined_stddev FROM measurements;
Combined standard deviation from two independent variables

percentile_cont

PG 9.4+ · Returns: double precision or interval

Computes a continuous percentile value that interpolates between adjacent values if needed.

Audience: Developer, Data Eng, DBA

Signature

percentile_cont ( fraction ) WITHIN GROUP ( ORDER BY sort_expression ) → double precision or interval

Parameters

fraction (double precision): Percentile as a fraction between 0 and 1 (e.g., 0.5 = median)
sort_expression (sortable): Values to compute the percentile over

Examples

sql
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY salary) AS median FROM employees;
62000 (varies)
sql
SELECT percentile_cont(0.95) WITHIN GROUP (ORDER BY response_ms) AS p95 FROM requests;
p95 response time
sql
SELECT percentile_cont(ARRAY[0.25, 0.5, 0.75]) WITHIN GROUP (ORDER BY value) AS quartiles FROM data;
{Q1, median, Q3}
sql
SELECT department, percentile_cont(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary FROM employees GROUP BY department;
Median salary per department
Anti-Pattern: Using avg() as a proxy for median on skewed distributions

Averages are heavily influenced by outliers. In skewed distributions (e.g., income, response times), the average can be much higher than what a typical value looks like, misleading stakeholders.

✓ Instead: Use `percentile_cont(0.5) WITHIN GROUP (ORDER BY col)` for the true median, and report p95/p99 for tail latency rather than average.

Pass an array to `percentile_cont` to compute multiple percentiles in a single scan: `percentile_cont(ARRAY[0.5, 0.95, 0.99]) WITHIN GROUP (ORDER BY x)` returns an array with p50, p95, p99.

example
SELECT percentile_cont(ARRAY[0.5, 0.95, 0.99]) WITHIN GROUP (ORDER BY duration_ms) AS percentiles FROM api_calls;
[median, p95, p99] in one pass

percentile_disc

PG 9.4+ · Returns: same as sort expression

Computes a discrete percentile — returns the first value whose cumulative distribution is >= fraction, always an actual input value.

Audience: Developer, Data Eng, DBA

Signature

percentile_disc ( fraction ) WITHIN GROUP ( ORDER BY sort_expression ) → same type as sort expression

Parameters

fraction (double precision): Percentile as a fraction between 0 and 1
sort_expression (sortable): Values to rank

Examples

sql
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY salary) AS median FROM employees;
60000 (actual value in dataset)
sql
SELECT percentile_disc(1.0) WITHIN GROUP (ORDER BY score) AS max_score FROM results;
Maximum score
sql
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY status_code) AS median_status FROM requests;
Actual status code at the 50th percentile
sql
SELECT region, percentile_disc(0.75) WITHIN GROUP (ORDER BY order_amount) AS p75_order FROM orders GROUP BY region;
75th percentile order value per region, always an actual order amount
Anti-Pattern: Using percentile_disc expecting an interpolated result for continuous data

`percentile_disc` always returns an actual row value and never interpolates. For continuous numeric data (e.g., response times), this can give a noticeably different result from the true mathematical percentile.

✓ Instead: Use `percentile_cont` for continuous numeric or interval data where interpolation between values is statistically correct.

Use `percentile_disc` when the result must be an actual data value — e.g., finding the median product category rank or the typical text response. `percentile_cont` may interpolate between values, `percentile_disc` always picks a real row.

example
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY status_code) AS median_status FROM requests;
Actual status code at the median

mode

PG 9.4+ · Returns: same as sort expression

Returns the most frequent input value (the mode). When several values are equally frequent, one of them is chosen (in practice, the first in the aggregate's sort order).

Audience: Developer, Data Eng

Signature

mode ( ) WITHIN GROUP ( ORDER BY sort_expression ) → same type as sort expression

Parameters

sort_expression (sortable): Values to find the mode of

Examples

sql
SELECT mode() WITHIN GROUP (ORDER BY status) FROM orders;
completed (most common status)
sql
SELECT mode() WITHIN GROUP (ORDER BY country_code) FROM customers;
US (most common country)
sql
SELECT customer_id, mode() WITHIN GROUP (ORDER BY product_id) AS fav_product FROM orders GROUP BY customer_id;
Most frequently ordered product per customer
sql
SELECT mode() WITHIN GROUP (ORDER BY error_code) AS most_common_error FROM application_logs WHERE level = 'ERROR' AND logged_at > now() - interval '24 hours';
Most frequent error code in the last 24 hours
Anti-Pattern: Assuming mode() handles ties by frequency rank

When two or more values share the highest frequency, `mode()` picks one of them (in practice, the first in the aggregate's sort order); it applies no secondary frequency criterion. This can be surprising.

✓ Instead: If tie-breaking matters for your use case, use `GROUP BY value ORDER BY count(*) DESC, value ASC LIMIT 1` in a subquery, which makes the tie-breaking rule explicit.
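The explicit tie-breaking subquery might look like this (using the `orders` table from the examples above):

```sql
-- Most frequent status; on a frequency tie, the alphabetically
-- smallest status wins, and the rule is visible in the query.
SELECT status
FROM orders
GROUP BY status
ORDER BY count(*) DESC, status ASC
LIMIT 1;
```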

`mode()` returns the most frequent value in the group directly, without needing a nested subquery or `GROUP BY value ORDER BY count(*) DESC LIMIT 1`.

example
SELECT customer_id, mode() WITHIN GROUP (ORDER BY product_id) AS fav_product FROM orders GROUP BY customer_id;
Most purchased product per customer

corr

PG 8.0+ · Returns: double precision

Returns the correlation coefficient between two sets of numbers (Pearson's r).

Audience: Developer, Data Eng, DBA

Signature

corr ( Y double precision, X double precision ) → double precision

Parameters

Y (double precision): Dependent variable values
X (double precision): Independent variable values

Examples

sql
SELECT corr(revenue, ad_spend) FROM campaigns;
0.87 (strong positive correlation)
sql
SELECT corr(score, study_hours) FROM students;
Correlation between study time and scores
sql
SELECT corr(churn_rate, avg_response_time) FROM monthly_metrics;
How strongly latency correlates with churn
sql
SELECT product_id, corr(daily_sales, avg_temp) AS sales_weather_corr FROM daily_product_weather GROUP BY product_id HAVING count(*) > 30;
Correlation between weather and sales per product (minimum sample filter)
Anti-Pattern: Treating a correlation as meaningful without sufficient sample size

`corr()` returns a value between -1 and 1 but gives no indication of whether the sample is large enough for the correlation to be statistically significant. A corr of 0.95 from 3 data points is essentially noise.

✓ Instead: Always pair `corr()` with a `HAVING count(*) > N` guard (typically N ≥ 30) and consider computing a p-value or confidence interval via external tooling for critical decisions.

Use `corr(y, x)` to measure linear correlation (-1 to 1) without leaving SQL. A value near 1 means strong positive correlation, near -1 strong negative, near 0 no linear relationship.

example
SELECT corr(churn_rate, avg_response_time) AS latency_churn_correlation FROM monthly_metrics;
How strongly latency correlates with churn

regr_slope

PG 8.0+ · Returns: double precision

Returns the slope of the least-squares fit linear equation for the data points (Y, X).

Audience: Developer, Data Eng, DBA

Signature

regr_slope ( Y double precision, X double precision ) → double precision

Parameters

Y (double precision): Dependent variable
X (double precision): Independent variable

Examples

sql
SELECT regr_slope(revenue, ad_spend) FROM campaigns;
3.24 (revenue per dollar of ad spend)
sql
SELECT regr_slope(y, x), regr_intercept(y, x) FROM data_points;
Slope and intercept of best-fit line
sql
SELECT regr_slope(revenue, month_num) AS monthly_growth, regr_intercept(revenue, month_num) AS base FROM monthly_revenue;
Linear growth rate and base value
sql
SELECT service, regr_slope(error_count, request_count) AS error_rate_trend FROM daily_service_metrics GROUP BY service HAVING count(*) >= 14;
Error rate trend slope per service over observed days
Anti-Pattern: Applying linear regression to non-linear relationships

`regr_slope` fits a straight line. If the relationship between Y and X is exponential, logarithmic, or cyclical, the slope will be misleading and forecasts will be inaccurate.

✓ Instead: Always plot or inspect the data first. For exponential growth, apply `ln()` to the Y values before regression. Use `regr_r2` to validate how well the linear model actually fits.
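The log-transform idea can be sketched against the `monthly_revenue` table used in the examples; it assumes all revenue values are positive:

```sql
-- Fit ln(revenue) = slope * month_num + intercept: for exponential
-- growth the slope approximates the continuous monthly growth rate,
-- and regr_r2 reports how well the linearized model fits.
SELECT
    regr_slope(ln(revenue), month_num) AS growth_rate,
    regr_r2(ln(revenue), month_num)    AS r_squared
FROM monthly_revenue;
```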

Compute trend line coefficients directly in SQL: `regr_slope(y, x) * new_x + regr_intercept(y, x)` gives a forecasted value. No need for Python/R for simple linear projections.

example
SELECT regr_slope(revenue, month_num) AS monthly_growth, regr_intercept(revenue, month_num) AS base FROM monthly_revenue;
Linear growth rate and base value

range_agg

PG 14+ · Returns: anymultirange

Computes the union of all non-NULL range values, returning a multirange.

Audience: Developer, Data Eng

Signature

range_agg ( expression anyrange ) → anymultirange

Parameters

expression (anyrange): Range values to union together

Examples

sql
SELECT range_agg(during) FROM reservations WHERE room_id = 1;
{[2025-01-01,2025-01-05),[2025-01-10,2025-01-15)}
sql
SELECT room_id, range_agg(during) AS occupied FROM reservations GROUP BY room_id;
Occupied periods per room
sql
SELECT range_agg(valid_during) AS total_coverage FROM subscription_periods WHERE user_id = $1;
All subscription periods merged into a multirange
sql
SELECT multirange(tsrange('2025-01-01', '2025-02-01')) - range_agg(booked_at) AS available_slots FROM bookings WHERE resource_id = 42;
Free time slots in January as a multirange
Anti-Pattern: Manually merging overlapping ranges in application code instead of using range_agg

A common pattern before PG 14 was to fetch all ranges into the application layer and merge overlaps using custom loop logic. This is slow for large datasets and moves work out of the database.

✓ Instead: Use `range_agg` (PG 14+) to compute the union of all ranges server-side in a single aggregate pass, then subtract from the full interval to find gaps.

Use `range_agg` to merge overlapping bookings/reservations into a single multirange, then subtract from the full range to find available slots — all in SQL.

example
SELECT multirange(int8range(0, 86400)) - range_agg(during) AS free_slots FROM bookings WHERE day = current_date;
Free time slots as a multirange