📊

PostgreSQL Statistical Aggregate Functions

Complete reference for PostgreSQL statistical aggregate functions covering correlation, linear regression, variance, standard deviation, percentile, and mode. Every function includes syntax, use cases for data analysis, and examples using real datasets. Updated for PostgreSQL 16.

19 functions

What are PostgreSQL Statistical Aggregate Functions?

PostgreSQL statistical functions are aggregate functions for performing statistical analysis directly in SQL without pulling data into a separate tool. They include var_pop() and var_samp() for variance, stddev() for standard deviation, corr() for Pearson correlation, regr_slope() and regr_intercept() for linear regression, percentile_cont() for interpolated percentiles, and mode() for the most frequent value.

stddev_pop

PG 8.0+double precision

Returns the population standard deviation of all non-null input values.

Data EngDeveloperDBA

Signature

stddev_pop ( numeric ) → double precision

Parameters

ParameterTypeDescription
expressionnumericNumeric values to compute population standard deviation over

Examples

sql
SELECT stddev_pop(score) FROM test_results;
Population std dev of all scores
sql
SELECT department, stddev_pop(salary) FROM employees GROUP BY department;
Salary spread per department
Anti-Pattern

Use `stddev_pop` when your data IS the entire population. Use `stddev_samp` (or `stddev`) when your data is a sample. For small samples, the difference is significant.

example
SELECT stddev_pop(score) AS pop_sd, stddev_samp(score) AS sample_sd FROM scores;
Compare population vs sample standard deviation

stddev / stddev_samp

PG 8.0+double precision

Returns the sample standard deviation of all non-null input values. `stddev` is an alias for `stddev_samp`.

Data EngDeveloperDBA

Signatures

stddev_samp ( numeric ) → double precision
stddev ( numeric ) → double precision

Parameters

ParameterTypeDescription
expressionnumericNumeric values to compute sample standard deviation over

Examples

sql
SELECT stddev(price) FROM products;
Sample standard deviation of product prices
sql
SELECT category, round(stddev(price)::numeric, 2) FROM products GROUP BY category;
Price spread per category
Anti-Pattern

Sample standard deviation requires at least 2 values. With only 1 row in a group, `stddev` returns NULL. Use `COALESCE(stddev(x), 0)` if you need 0 instead of NULL for single-row groups.

example
SELECT group_id, COALESCE(stddev(value), 0) AS spread FROM measurements GROUP BY group_id;
0 for single-measurement groups, actual spread otherwise

var_pop

PG 8.0+double precision

Returns the population variance of all non-null input values (square of population standard deviation).

Data EngDeveloperDBA

Signature

var_pop ( numeric ) → double precision

Parameters

ParameterTypeDescription
expressionnumericNumeric values to compute population variance over

Examples

sql
SELECT var_pop(response_time_ms) FROM api_logs;
Variance of response times across all requests
Anti-Pattern

`var_pop(x) = stddev_pop(x)^2`. Use variance when composing statistics (variances add under independence) and convert to std dev for human-readable output.

example
SELECT sqrt(var_pop(latency)) AS stddev_check, stddev_pop(latency) AS direct FROM perf_logs;
Both return the same value

var_samp / variance

PG 8.0+double precision

Returns the sample variance of all non-null input values. `variance` is an alias for `var_samp`.

Data EngDeveloperDBA

Signatures

var_samp ( numeric ) → double precision
variance ( numeric ) → double precision

Parameters

ParameterTypeDescription
expressionnumericNumeric values to compute sample variance over

Examples

sql
SELECT variance(amount) FROM transactions WHERE month = '2025-01';
Sample variance of transaction amounts in January
Anti-Pattern

The coefficient of variation (CV = stddev/mean) tells you relative variability. Compute it with `stddev(x) / nullif(avg(x), 0)` — useful for comparing variability across different scales.

example
SELECT stddev(revenue) / nullif(avg(revenue), 0) AS cv FROM monthly_sales;
Relative variability (lower = more consistent)

corr

PG 8.0+double precision

Returns the Pearson correlation coefficient between two sets of values. Returns a value from -1 (perfect negative correlation) to 1 (perfect positive correlation).

Data EngDeveloperDBA

Signature

corr ( Y double precision, X double precision ) → double precision

Parameters

ParameterTypeDescription
Ydouble precisionDependent variable values
Xdouble precisionIndependent variable values

Examples

sql
SELECT corr(revenue, ad_spend) FROM campaigns;
Correlation between ad spend and revenue
sql
SELECT corr(price, quantity_sold) FROM sales_data;
Price-demand correlation (expected negative)
Anti-Pattern

Use `corr()` to quickly scan for relationships between columns. A correlation near 0 means no linear relationship. Values above 0.7 or below -0.7 usually warrant further investigation.

example
SELECT corr(load_avg, response_time) AS load_vs_latency, corr(connection_count, response_time) AS conn_vs_latency FROM server_metrics;
Scan multiple potential causes of latency in one query

regr_slope

PG 8.0+double precision

Returns the slope of the least-squares fit linear equation through the (X, Y) data points.

Data EngDeveloperDBA

Signature

regr_slope ( Y double precision, X double precision ) → double precision

Parameters

ParameterTypeDescription
Ydouble precisionDependent variable
Xdouble precisionIndependent variable

Examples

sql
SELECT regr_slope(revenue, months_active) FROM customers;
Average revenue increase per additional month of activity
sql
SELECT regr_slope(cpu_pct, connection_count) FROM server_stats;
CPU increase per additional connection
Anti-Pattern

Use `regr_slope(y, x) * future_x + regr_intercept(y, x)` to forecast values in-database. This is a lightweight alternative to exporting data to a separate analytics tool.

example
SELECT regr_slope(revenue, week_num) * 52 + regr_intercept(revenue, week_num) AS projected_annual FROM weekly_sales;
Full-year revenue projection from trend line

regr_intercept

PG 8.0+double precision

Returns the y-intercept of the least-squares fit linear equation through the (X, Y) data points.

Data EngDeveloperDBA

Signature

regr_intercept ( Y double precision, X double precision ) → double precision

Parameters

ParameterTypeDescription
Ydouble precisionDependent variable
Xdouble precisionIndependent variable

Examples

sql
SELECT regr_intercept(sale_price, sq_footage) FROM properties;
Base price when square footage is 0 (theoretical minimum)
Anti-Pattern

Retrieve slope, intercept, and R² together: `SELECT regr_slope(y,x), regr_intercept(y,x), regr_r2(y,x)`. The R² value tells you how well the line fits (0 = no fit, 1 = perfect fit).

example
SELECT regr_slope(price, sqft) AS slope, regr_intercept(price, sqft) AS base, regr_r2(price, sqft) AS r_squared FROM homes;
Complete linear model parameters in one pass

regr_r2

PG 8.0+double precision

Returns the square of the Pearson correlation coefficient (coefficient of determination). Indicates how well a linear model fits the data.

Data EngDeveloperDBA

Signature

regr_r2 ( Y double precision, X double precision ) → double precision

Parameters

ParameterTypeDescription
Ydouble precisionDependent variable
Xdouble precisionIndependent variable

Examples

sql
SELECT regr_r2(revenue, ad_spend) FROM campaigns;
Fraction of revenue variance explained by ad spend
Anti-Pattern

An R² of 0.8 means 80% of the variance in Y is explained by X. Note: a high R² does not guarantee the relationship is causal or that the linear model is appropriate.

example
SELECT feature, regr_r2(churn::int, feature_value) AS predictive_power FROM feature_correlations ORDER BY predictive_power DESC;
Rank features by how well they linearly predict churn

regr_count

PG 8.0+bigint

Returns the number of rows in which both Y and X are not null. Used as the N for other regression statistics.

Data EngDeveloperDBA

Signature

regr_count ( Y double precision, X double precision ) → bigint

Parameters

ParameterTypeDescription
Ydouble precisionDependent variable
Xdouble precisionIndependent variable

Examples

sql
SELECT regr_count(revenue, ad_spend) FROM campaigns;
Number of rows with both values non-null
Anti-Pattern

Always check `regr_count` alongside regression results. A slope computed from 3 data points is unreliable. Filter groups with too few data points using HAVING.

example
SELECT segment, regr_slope(ltv, age), regr_r2(ltv, age) FROM customers GROUP BY segment HAVING regr_count(ltv, age) >= 30;
Regression only for segments with sufficient data

regr_avgx / regr_avgy

PG 8.0+double precision

Returns the average of the independent (X) or dependent (Y) variable over rows where both values are non-null.

Data EngDeveloperDBA

Signatures

regr_avgx ( Y double precision, X double precision ) → double precision
regr_avgy ( Y double precision, X double precision ) → double precision

Parameters

ParameterTypeDescription
Ydouble precisionDependent variable
Xdouble precisionIndependent variable

Examples

sql
SELECT regr_avgx(revenue, ad_spend), regr_avgy(revenue, ad_spend) FROM campaigns;
Mean ad spend and mean revenue over paired rows
Anti-Pattern

`regr_avgx(Y, X)` is not the same as `avg(X)` — it only includes rows where both Y and X are non-null. Use it when you need means computed over the same paired dataset used for regression.

example
SELECT avg(ad_spend) AS all_avg, regr_avgx(revenue, ad_spend) AS paired_avg FROM campaigns;
May differ if some rows have NULL revenue

regr_sxx / regr_syy / regr_sxy

PG 8.0+double precision

Returns the sum of squares of X deviations, Y deviations, or the cross-product of X and Y deviations. Building blocks for manual regression analysis.

Data EngDeveloperDBA

Signatures

regr_sxx ( Y double precision, X double precision ) → double precision
regr_syy ( Y double precision, X double precision ) → double precision
regr_sxy ( Y double precision, X double precision ) → double precision

Parameters

ParameterTypeDescription
Ydouble precisionDependent variable
Xdouble precisionIndependent variable

Examples

sql
SELECT regr_sxy(revenue, ad_spend) / regr_sxx(revenue, ad_spend) AS manual_slope FROM campaigns;
Same as regr_slope() — manual computation
Anti-Pattern

`regr_slope = regr_sxy / regr_sxx` and `corr = regr_sxy / sqrt(regr_sxx * regr_syy)`. Having these building blocks allows you to compute weighted or conditional regression variants in SQL.

example
SELECT regr_sxy(y, x) / nullif(sqrt(regr_sxx(y, x) * regr_syy(y, x)), 0) AS manual_corr FROM data;
Equivalent to corr(y, x)

covar_pop / covar_samp

PG 8.0+double precision

Returns the population or sample covariance of the (X, Y) pairs. Covariance measures how two variables change together.

Data EngDeveloperDBA

Signatures

covar_pop ( Y double precision, X double precision ) → double precision
covar_samp ( Y double precision, X double precision ) → double precision

Parameters

ParameterTypeDescription
Ydouble precisionFirst variable
Xdouble precisionSecond variable

Examples

sql
SELECT covar_samp(revenue, ad_spend) FROM campaigns;
Sample covariance between revenue and spend
sql
SELECT covar_pop(height_cm, weight_kg) FROM patients;
Population covariance between height and weight
Anti-Pattern

`corr(y, x) = covar_pop(y, x) / (stddev_pop(y) * stddev_pop(x))`. Covariance alone is hard to interpret because its magnitude depends on the scales of Y and X. Prefer `corr()` for interpretable results.

example
SELECT covar_pop(revenue, spend) / nullif(stddev_pop(revenue) * stddev_pop(spend), 0) AS manual_corr, corr(revenue, spend) AS builtin_corr FROM campaigns;
Same value — two ways to compute correlation

percentile_cont (statistics)

PG 9.4+same as sort type

Computes a continuous percentile: the value at the given fraction of the sorted set, interpolating between adjacent values if necessary. The multi-fraction form returns an array.

Data EngDeveloperDBA

Signatures

percentile_cont ( fraction ) WITHIN GROUP ( ORDER BY sort_expression ) → same as sort_expression
percentile_cont ( fractions anyarray ) WITHIN GROUP ( ORDER BY sort_expression ) → anyarray

Parameters

ParameterTypeDescription
fractiondouble precisionFraction between 0 and 1 (e.g., 0.5 = median)
sort_expressionnumeric or comparable typeValues to compute percentile over

Examples

sql
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary FROM employees;
Median salary (interpolated)
sql
SELECT percentile_cont(ARRAY[0.25, 0.5, 0.75]) WITHIN GROUP (ORDER BY response_ms) AS quartiles FROM api_logs;
Q1, median, Q3 response times
Anti-Pattern

Pass an array of fractions to get multiple percentiles in one pass: `percentile_cont(ARRAY[0.1,0.25,0.5,0.75,0.9,0.99]) WITHIN GROUP (ORDER BY latency)` gives a complete distribution summary without scanning the table multiple times.

example
SELECT percentile_cont(ARRAY[0.5, 0.95, 0.99]) WITHIN GROUP (ORDER BY response_ms) AS p50_p95_p99 FROM api_requests;
{120, 450, 980} — median, p95, p99 latencies

percentile_disc (statistics)

PG 9.4+same as sort type

Computes a discrete percentile: the first value in sorted order whose cumulative distribution function is >= the given fraction. Returns an actual value from the dataset (no interpolation).

Data EngDeveloperDBA

Signature

percentile_disc ( fraction ) WITHIN GROUP ( ORDER BY sort_expression ) → same as sort_expression

Parameters

ParameterTypeDescription
fractiondouble precisionFraction between 0 and 1
sort_expressionany sortable typeValues to compute percentile over

Examples

sql
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY score) FROM test_results;
The actual middle value (not interpolated)
sql
SELECT percentile_disc(0.9) WITHIN GROUP (ORDER BY page_load_ms) FROM performance_log;
The p90 page load time — an actual observed value
Anti-Pattern

`percentile_disc` works on any sortable type (text, date, enum). `percentile_cont` requires numeric types (it interpolates). Use `percentile_disc` when you need an actual value from the dataset.

example
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY created_at) AS median_signup_date FROM users;
The date that splits users into two equal halves

mode (statistics)

PG 9.4+same as sort type

Returns the most frequent value in the sorted group. If multiple values are equally frequent, returns the smallest.

Data EngDeveloperDBA

Signature

mode () WITHIN GROUP ( ORDER BY sort_expression ) → same as sort_expression

Parameters

ParameterTypeDescription
sort_expressionany sortable typeValues to find mode of

Examples

sql
SELECT mode() WITHIN GROUP (ORDER BY product_id) FROM order_items;
The most-ordered product
sql
SELECT mode() WITHIN GROUP (ORDER BY error_code) FROM error_log WHERE date = current_date;
Most common error code today
Anti-Pattern

Instead of `SELECT val FROM t GROUP BY val ORDER BY count(*) DESC LIMIT 1`, use `mode() WITHIN GROUP (ORDER BY val)` — it's one aggregation step and composes cleanly in GROUP BY queries.

example
SELECT user_id, mode() WITHIN GROUP (ORDER BY category) AS favorite_category FROM purchases GROUP BY user_id;
Each user's most-purchased category

rank (hypothetical)

PG 9.4+bigint

Computes the rank that the hypothetical row formed from args would have in the sorted group. Ties receive the same rank and leave gaps.

Data EngDeveloperDBA

Signature

rank ( args ) WITHIN GROUP ( ORDER BY sort_expression ) → bigint

Parameters

ParameterTypeDescription
argsanyHypothetical values to rank
sort_expressionany sortable typeSorted group to rank against

Examples

sql
SELECT rank(50000) WITHIN GROUP (ORDER BY salary) FROM employees;
Rank of $50,000 salary among all employees
sql
SELECT rank('2025-01-01') WITHIN GROUP (ORDER BY created_at) FROM events;
Rank of a specific date in the events timeline
Anti-Pattern

Use `cume_dist(value) WITHIN GROUP (ORDER BY col)` to answer 'what fraction of values are <= this?'. For '1 in N' framing, use `rank(value) WITHIN GROUP (ORDER BY col)` against the total count.

example
SELECT cume_dist(75) WITHIN GROUP (ORDER BY score) FROM test_results;
Fraction of scores <= 75 (e.g., 0.85 = 85th percentile)

dense_rank (hypothetical)

PG 9.4+bigint

Like rank(), but no gaps are left after tied values — ranks are consecutive integers.

Data EngDeveloperDBA

Signature

dense_rank ( args ) WITHIN GROUP ( ORDER BY sort_expression ) → bigint

Parameters

ParameterTypeDescription
argsanyHypothetical values
sort_expressionany sortable typeSorted group

Examples

sql
SELECT dense_rank(500) WITHIN GROUP (ORDER BY score DESC) FROM leaderboard;
Dense rank of score 500 on the leaderboard
Anti-Pattern

If three rows tie for rank 2, `rank()` assigns 2,2,2 then jumps to 5. `dense_rank()` assigns 2,2,2 then 3. Use `dense_rank` when the count of distinct rank positions matters more than the exact gap.

example
SELECT rank(90) WITHIN GROUP (ORDER BY score DESC) AS with_gaps, dense_rank(90) WITHIN GROUP (ORDER BY score DESC) AS no_gaps FROM scores;
Shows the difference when ties exist above score 90

percent_rank (hypothetical)

PG 9.4+double precision

Computes the relative rank (fraction) of the hypothetical row: (rank - 1) / (N - 1).

Data EngDeveloperDBA

Signature

percent_rank ( args ) WITHIN GROUP ( ORDER BY sort_expression ) → double precision

Parameters

ParameterTypeDescription
argsanyHypothetical values
sort_expressionany sortable typeSorted group

Examples

sql
SELECT percent_rank(75000) WITHIN GROUP (ORDER BY salary) FROM employees;
Fraction of salaries below $75,000
Anti-Pattern

A result of 0.75 means the hypothetical value ranks higher than 75% of the group. Unlike `cume_dist`, `percent_rank` uses (rank-1)/(N-1) so the lowest value always gets 0.

example
SELECT percent_rank(100) WITHIN GROUP (ORDER BY test_score) FROM students;
0.0 if 100 is the minimum score, 1.0 if it's the maximum

cume_dist (hypothetical)

PG 9.4+double precision

Computes the cumulative distribution of the hypothetical row: the fraction of rows with values <= the hypothetical value.

Data EngDeveloperDBA

Signature

cume_dist ( args ) WITHIN GROUP ( ORDER BY sort_expression ) → double precision

Parameters

ParameterTypeDescription
argsanyHypothetical value
sort_expressionany sortable typeDistribution to compare against

Examples

sql
SELECT cume_dist(500) WITHIN GROUP (ORDER BY order_value) FROM orders;
Fraction of orders with value <= 500
Anti-Pattern

Unlike `percent_rank`, `cume_dist` computes rank/N so the minimum value in the group gets 1/N (not 0). Use it when 'what fraction of data is at or below X' is the right question.

example
SELECT cume_dist(now() - interval '30 days') WITHIN GROUP (ORDER BY last_login) FROM users;
Fraction of users who last logged in 30+ days ago