corr
PG 8.0+→ double precisionReturns the Pearson correlation coefficient between two sets of values. Returns a value from -1 (perfect negative correlation) to 1 (perfect positive correlation).
Signature
corr ( Y double precision, X double precision ) → double precisionParameters
| Parameter | Type | Description |
|---|---|---|
| Y | double precision | Dependent variable values |
| X | double precision | Independent variable values |
Examples
SELECT corr(revenue, ad_spend) FROM campaigns;Correlation between ad spend and revenueSELECT corr(price, quantity_sold) FROM sales_data;Price-demand correlation (expected negative)SELECT corr(page_views, signups) AS pv_signup_corr FROM daily_metrics WHERE date >= CURRENT_DATE - 90;90-day correlation between page views and signupsSELECT corr(load_avg, response_time) AS load_vs_latency, corr(connection_count, response_time) AS conn_vs_latency FROM server_metrics;Scan multiple causes of latency in one query`corr()` returns NULL when either variable has zero standard deviation (all values identical) — failing to guard against this produces silent NULLs in dashboards; use `NULLIF(stddev_pop(x), 0)` checks before relying on the result. Treating a high Pearson correlation as evidence of a linear relationship without inspecting the data — a non-linear relationship or outliers can produce misleadingly high or low `corr()` values. Also: Passing arguments in the wrong order: `corr(X, Y)` vs `corr(Y, X)` — the result is the same numerically (Pearson is symmetric) but the convention `corr(Y, X)` matches the regression functions `regr_slope(Y, X)`; mixing conventions causes confusion. Also: Using `corr()` on time-series data without accounting for autocorrelation — two independently trending series will show high correlation even with no real relationship (spurious correlation).
✓ Instead: See description and related functions for the correct approach.
Use `corr()` to quickly scan for relationships between columns. A correlation near 0 means no linear relationship. Values above 0.7 or below -0.7 usually warrant further investigation.
SELECT corr(load_avg, response_time) AS load_vs_latency, corr(connection_count, response_time) AS conn_vs_latency FROM server_metrics;Scan multiple potential causes of latency in one queryCommon Gotchas
string_agg without ORDER BY produces non-deterministic results
string_agg(col, ',') returns results in arbitrary order unless you specify ORDER BY inside the aggregate.
COUNT(col) skips NULLs but COUNT(*) does not
COUNT(column_name) counts non-NULL values only. COUNT(*) counts all rows. These produce different results when the column has NULLs.
Stale statistics after bulk load — planner uses wrong row estimates
After INSERT of millions of rows, the query planner still uses the old row count until ANALYZE runs. This leads to terrible plans.