∑

SQLite Aggregate Functions

SQLite

Complete reference for SQLite aggregate functions covering count, sum, avg, max, min, group_concat, string_agg, median, percentile_cont, and percentile_disc. Every function includes syntax, GROUP BY examples, and comparisons with PostgreSQL aggregate functions. Updated for SQLite 3.52.

12 functions

Jump to function

avg count group_concat string_agg max min median percentile percentile_cont percentile_disc sum total

What are SQLite Aggregate Functions?

SQLite aggregate functions compute a single result from multiple input rows and are used with GROUP BY clauses to summarise data. group_concat() is SQLite's equivalent of PostgreSQL's string_agg() for concatenating grouped strings. SQLite 3.44.0 added string_agg() as an alias. median(), percentile_cont(), and percentile_disc() are available via loadable extensions or in some pre-compiled SQLite distributions.

avg

Available in all SQLite versions.→ REAL

Returns the average value of all non-NULL values of X within a group. String and BLOB values that do not look like numbers are interpreted as 0. The result is always a floating-point value when at least one non-NULL input exists. Returns NULL if there are no non-NULL inputs.

Signatures

avg(X)

avg(DISTINCT X)

Parameters

Parameter	Type	Description
X	numeric	The value to average. NULL values are ignored. Non-numeric strings and BLOBs are treated as 0.

Examples

Average price of all products

sql

SELECT avg(price) FROM products;

→24.99

Average price per category

sql

SELECT category, avg(price) FROM products GROUP BY category;

→Electronics: 349.50, Books: 19.99

Average test score for a specific student

sql

SELECT avg(score) FROM test_results WHERE student_id = 42;

→87.5

Average of distinct salaries only

sql

SELECT avg(DISTINCT salary) FROM employees WHERE dept = 'Engineering';

→95000.0

Average API response time over the last 7 days

sql

SELECT avg(response_ms) FROM api_logs WHERE endpoint = '/api/search' AND created_at >= date('now','-7 days');

→142.3

⚠Anti-Pattern— Mixing NULL and 0 changes the denominator silently

When a column contains a mix of NULL and 0, avg() excludes NULLs from the count of rows used as the denominator, but includes rows where the value is 0. This means avg(score) over [90, 0, NULL] returns 45.0 (two rows), not 30.0 (three rows). Developers often expect NULLs to be treated as zero, leading to an inflated average. The silent coercion of non-numeric strings to 0 compounds this: a stray empty string lowers the average without any error.

✓ Instead: Use COALESCE before aggregating when zero and NULL should be treated identically: avg(COALESCE(score, 0)).

If no rows match your WHERE clause, avg() returns NULL, not 0. Use COALESCE(avg(X), 0) when a zero default is more useful than NULL — for example when displaying metrics in a dashboard where NULL would break downstream arithmetic.

example

SELECT dept, round(avg(salary), 2) AS avg_salary FROM employees GROUP BY dept HAVING count(*) >= 5;

→Average salary per department with at least 5 employees

◆SQLite Note

In both SQLite and PostgreSQL, avg() ignores NULLs and returns NULL when all inputs are NULL (empty group). The key SQLite-specific behavior is that non-numeric strings are silently coerced to 0 rather than raising an error, which PostgreSQL would reject with a type error. PostgreSQL's avg() preserves exact numeric precision for integer inputs (returning NUMERIC), whereas SQLite always returns REAL (IEEE 754 float).

⚙ Availability: Built-in, all SQLite versions

Common Gotchas

⚠

string_agg without ORDER BY produces non-deterministic results

string_agg(col, ',') returns results in arbitrary order unless you specify ORDER BY inside the aggregate.

⚠

COUNT(col) skips NULLs but COUNT(*) does not

COUNT(column_name) counts non-NULL values only. COUNT(*) counts all rows. These produce different results when the column has NULLs.

count

Available in all SQLite versions.→ INTEGER

count(*) returns the total number of rows in the group, including rows with NULL values. count(X) returns the number of times X is not NULL in the group. count(DISTINCT X) counts distinct non-NULL values of X.

Signatures

count(*)

count(X)

count(DISTINCT X)

Parameters

Parameter	Type	Description
X	any	The expression to count non-NULL occurrences of. Use * to count all rows regardless of NULLs.

Examples

Total number of pending orders

sql

SELECT count(*) FROM orders WHERE status = 'pending';

→142

Number of users who have an email address (NULLs excluded)

sql

SELECT count(email) FROM users;

→980

Number of distinct countries represented

sql

SELECT count(DISTINCT country) FROM customers;

→47

Top 10 most frequently purchased products

sql

SELECT product_id, count(*) AS purchase_count FROM order_items GROUP BY product_id ORDER BY purchase_count DESC LIMIT 10;

→product_id: 88, purchase_count: 542

Split count using FILTER clauses

sql

SELECT count(*) FILTER (WHERE completed_at IS NOT NULL) AS done, count(*) FILTER (WHERE completed_at IS NULL) AS pending FROM tasks;

→done: 312, pending: 58

⚠Anti-Pattern— Using count(column) when the goal is to count all rows

A common mistake is writing count(id) or count(some_column) expecting it to equal count(*), then being surprised when the result is lower. If that column contains any NULLs — even for a column that should never be NULL but is due to a data quality issue — the count will be silently understated. This is especially deceptive on LEFT JOIN results where the right-side columns are NULL for unmatched rows.

✓ Instead: Use count(*) to count rows unconditionally, and reserve count(column) only when you specifically want to count non-NULL occurrences of that column.

count(*) counts every row in the group, even those where every column is NULL. count(column) only counts rows where that specific column is non-NULL. Use count(*) to tally rows, and count(optional_column) to tally how many rows filled in an optional field. count(DISTINCT col) is useful for cardinality estimates (e.g., 'how many unique visitors?').

example

SELECT date(created_at) AS day, count(*) AS orders, count(DISTINCT customer_id) AS unique_customers FROM orders GROUP BY day;

→Daily order count alongside distinct customer count in one pass

◆SQLite Note

Behavior is identical between SQLite and PostgreSQL. Both count(*) count all rows, count(X) ignores NULLs, and count(DISTINCT X) counts distinct non-NULL values. One minor difference: in PostgreSQL you can use count(*) as a window function with OVER(), and SQLite supports the same via its window function syntax.

⚙ Availability: Built-in, all SQLite versions

Common Gotchas

⚠

string_agg without ORDER BY produces non-deterministic results

string_agg(col, ',') returns results in arbitrary order unless you specify ORDER BY inside the aggregate.

⚠

COUNT(col) skips NULLs but COUNT(*) does not

COUNT(column_name) counts non-NULL values only. COUNT(*) counts all rows. These produce different results when the column has NULLs.

group_concat

group_concat() available since SQLite 3.5.4 (2008). ORDER BY inside the aggregate and the string_agg() alias added in SQLite 3.44.0 (2023-11-01).→ TEXT

Returns a string that is the concatenation of all non-NULL values of X within a group, separated by Y (default: comma). NULL values of X are ignored. The order of concatenation is arbitrary unless an ORDER BY clause is provided. string_agg(X,Y) is an alias for group_concat(X,Y).

Signatures

group_concat(X)

group_concat(X, Y)

group_concat(X ORDER BY ordering-term)

group_concat(X, Y ORDER BY ordering-term)

group_concat(DISTINCT X)

Parameters

Parameter	Type	Description
X	any	The value to concatenate. NULL values are silently skipped.
Y	TEXT	Optional separator string placed between values. Defaults to ',' if omitted.

Examples

Comma-separated tag names for a post

sql

SELECT group_concat(name) FROM tags WHERE post_id = 5;

→'sqlite,database,performance'

Custom pipe separator

sql

SELECT group_concat(name, ' | ') FROM tags WHERE post_id = 5;

→'sqlite | database | performance'

Skills sorted alphabetically per user (requires SQLite 3.44.0+)

sql

SELECT user_id, group_concat(skill ORDER BY skill ASC) AS skills FROM user_skills GROUP BY user_id;

→user_id: 1, skills: 'Go,Python,Rust'

Distinct categories only

sql

SELECT group_concat(DISTINCT category) FROM products;

→'Books,Electronics,Clothing'

Employee roster per department

sql

SELECT dept_id, group_concat(last_name, ', ') FROM employees GROUP BY dept_id ORDER BY dept_id;

→dept_id: 3, 'Smith, Jones, Patel'

⚠Anti-Pattern— Expecting stable concatenation order without ORDER BY

Developers often write group_concat(tag) and rely on the output order matching insertion order or alphabetical order, then build application logic (CSV exports, display lists, diff comparisons) that breaks whenever SQLite changes its internal scan order. This order is an implementation detail and is not guaranteed to be stable across queries, vacuums, or SQLite versions. The ORDER BY clause inside the aggregate was only added in SQLite 3.44.0, so older deployments have no portable way to guarantee order.

✓ Instead: Always specify group_concat(col ORDER BY col ASC) on SQLite 3.44.0+ to get deterministic output, or sort the result in application code if you must support older versions.

Without ORDER BY, the concatenation order is undefined and may change between queries or SQLite versions. Since SQLite 3.44.0 you can write group_concat(col ORDER BY col ASC) to guarantee consistent, sorted output. This is especially important when generating reproducible CSV exports or building delimited lists that will be compared across runs.

example

SELECT post_id, group_concat(tag ORDER BY tag ASC) AS sorted_tags FROM post_tags GROUP BY post_id;

→'database,performance,sqlite' — deterministic alphabetical order

◆SQLite Note

In PostgreSQL the primary name is string_agg(X, Y) and group_concat does not exist. In SQLite the primary name is group_concat and string_agg is an alias added in version 3.44.0. Additionally, PostgreSQL's string_agg requires a separator argument (Y is mandatory), while SQLite's group_concat defaults the separator to a comma when Y is omitted. The ORDER BY clause inside the aggregate is supported in both dialects, but was only added to SQLite in version 3.44.0.

⚙ Availability: Built-in, all SQLite versions; ORDER BY inside aggregate requires SQLite 3.44.0+

Common Gotchas

⚠

string_agg without ORDER BY produces non-deterministic results

string_agg(col, ',') returns results in arbitrary order unless you specify ORDER BY inside the aggregate.

⚠

COUNT(col) skips NULLs but COUNT(*) does not

COUNT(column_name) counts non-NULL values only. COUNT(*) counts all rows. These produce different results when the column has NULLs.

⚠

UPDATE ... FROM with multiple matching rows produces non-deterministic results

When an UPDATE ... FROM join matches multiple rows from the FROM table, PostgreSQL uses one arbitrarily — no error.

string_agg

string_agg() added as an alias for group_concat() in SQLite 3.44.0 (2023-11-01). Use group_concat() for compatibility with older SQLite versions.→ TEXT

An alias for group_concat(X, Y). Returns a string that is the concatenation of all non-NULL values of X separated by Y. The separator Y is required. The order of concatenation is arbitrary unless an ORDER BY clause is specified. Compatible with PostgreSQL and SQL Server naming.

Signatures

string_agg(X, Y)

string_agg(X, Y ORDER BY ordering-term)

Parameters

Parameter	Type	Description
X	any	The value to concatenate. NULL values are silently skipped.
Y	TEXT	Separator string placed between concatenated values. Required (unlike group_concat where it defaults to comma).

Examples

Comma-space separated category names

sql

SELECT string_agg(name, ', ') FROM categories;

→'Books, Electronics, Clothing'

Space-separated tags per post

sql

SELECT post_id, string_agg(tag, ' ') FROM post_tags GROUP BY post_id;

→post_id: 7, 'sqlite performance tuning'

Semicolon-delimited email list for bulk sending

sql

SELECT string_agg(email, ';') FROM newsletter_subscribers WHERE active = 1;

→'a@x.com;b@y.com;c@z.com'

Employees listed in hire-date order per department

sql

SELECT dept, string_agg(name, ', ' ORDER BY hire_date ASC) FROM employees GROUP BY dept;

→dept: 'Eng', 'Alice, Bob, Carol'

Build a comma-delimited ID list for an IN clause

sql

SELECT string_agg(CAST(id AS TEXT), ',') FROM flagged_users WHERE reviewed = 0;

→'12,34,99,102'

⚠Anti-Pattern— Using string_agg on SQLite versions older than 3.44.0

string_agg is not a built-in function before SQLite 3.44.0, so calling it on an older runtime produces an error: 'no such function: string_agg'. This is easy to miss because many Linux distributions and embedded deployments ship SQLite versions from 2018–2022 that predate the alias. Code tested on a modern dev machine silently breaks in production when the SQLite version differs.

✓ Instead: Check your runtime SQLite version with SELECT sqlite_version(); and use group_concat(X, Y) for compatibility with SQLite older than 3.44.0.

If you write SQL that must run on both SQLite (3.44.0+) and PostgreSQL, prefer string_agg(X, Y) over group_concat(X, Y). The separator argument is required in string_agg(), matching PostgreSQL's requirement, which avoids the subtle difference where group_concat() silently defaults to a comma. Always supply an explicit separator to avoid surprises.

example

SELECT dept, string_agg(name, ', ' ORDER BY hire_date ASC) AS team_roster FROM employees GROUP BY dept;

→Comma-separated roster of each department in seniority order

◆SQLite Note

string_agg(X,Y) is an alias for group_concat(X,Y) added in SQLite 3.44.0 (2023-11-01) specifically for PostgreSQL and SQL Server compatibility. In PostgreSQL, string_agg is the canonical name and group_concat does not exist. The behavior is functionally identical once the alias was added, but note that SQLite's group_concat allows omitting the separator (defaulting to comma), while PostgreSQL's string_agg requires the separator — string_agg in SQLite also requires Y.

⚙ Availability: Requires SQLite 3.44.0 or later.

Common Gotchas

⚠

string_agg without ORDER BY produces non-deterministic results

string_agg(col, ',') returns results in arbitrary order unless you specify ORDER BY inside the aggregate.

⚠

COUNT(col) skips NULLs but COUNT(*) does not

COUNT(column_name) counts non-NULL values only. COUNT(*) counts all rows. These produce different results when the column has NULLs.

⚠

UPDATE ... FROM with multiple matching rows produces non-deterministic results

When an UPDATE ... FROM join matches multiple rows from the FROM table, PostgreSQL uses one arbitrarily — no error.

max

Available in all SQLite versions.→ same as X

As an aggregate function, returns the maximum value of all non-NULL values of X in the group. The maximum is determined by SQLite's type ordering rules (the value that would sort last in an ORDER BY on the same column). Returns NULL if and only if there are no non-NULL values in the group. max() also has a scalar form that accepts two or more arguments.

Signatures

max(X)

max(DISTINCT X)

Parameters

Parameter	Type	Description
X	any	The value to find the maximum of. NULL values are ignored. Type affinity rules determine comparison order.

Examples

Highest salary across all employees

sql

SELECT max(salary) FROM employees;

→185000

Highest salary per department

sql

SELECT dept_id, max(salary) FROM employees GROUP BY dept_id;

→dept_id: 2, max_salary: 142000

Most recent order date for a customer

sql

SELECT max(created_at) FROM orders WHERE customer_id = 7;

→'2024-11-28 14:32:01'

Peak inventory level per product

sql

SELECT product_id, max(quantity) FROM inventory_snapshots GROUP BY product_id;

→product_id: 55, max_qty: 2400

Latest migration applied (lexicographic max of version strings)

sql

SELECT max(version) FROM schema_migrations;

→'20241105_add_indexes'

⚠Anti-Pattern— Using max() expecting to retrieve the full row of the maximum value

A frequent mistake is writing SELECT id, name, max(score) FROM results expecting the id and name to correspond to the row with the highest score. In standard SQL this is undefined behavior — SQLite may return any id and name from the group, not necessarily from the max row. This produces silently wrong data that is easy to miss in testing when the max row happens to be returned by chance.

✓ Instead: Use a subquery or window function to fetch the full row: SELECT * FROM results WHERE score = (SELECT max(score) FROM results), or use ROW_NUMBER() OVER (ORDER BY score DESC) and filter where rn = 1.

When your WHERE clause matches zero rows, max() returns NULL rather than raising an error. Use COALESCE(max(col), fallback) to substitute a safe default. Also remember that max() in SQLite follows type affinity: comparing TEXT '9' against INTEGER 10 may yield surprising results because '9' > 10 as text. Ensure your column has the correct affinity or CAST explicitly.

example

SELECT customer_id, max(order_date) AS last_order, count(*) AS total_orders FROM orders GROUP BY customer_id HAVING max(order_date) < date('now','-180 days');

→Customers who haven't ordered in 6 months — potential churn

◆SQLite Note

Aggregate max() behaves the same in SQLite and PostgreSQL: ignores NULLs, returns NULL on empty input. The scalar multi-argument form max(a, b, c) exists in SQLite as a core scalar function. PostgreSQL does not have a built-in scalar max() accepting multiple arguments — it uses the GREATEST() function instead. For ordering, SQLite uses its type affinity rules (NULL < INTEGER/REAL < TEXT < BLOB) while PostgreSQL uses strict type-based comparison with explicit casting required across types.

⚙ Availability: Built-in, all SQLite versions

Common Gotchas

⚠

NULL kills comparisons silently — nothing equals NULL, not even NULL

NULL = NULL is not TRUE. It is NULL. Queries filtering for NULL with = will silently return zero rows.

⚠

string_agg without ORDER BY produces non-deterministic results

string_agg(col, ',') returns results in arbitrary order unless you specify ORDER BY inside the aggregate.

⚠

COUNT(col) skips NULLs but COUNT(*) does not

COUNT(column_name) counts non-NULL values only. COUNT(*) counts all rows. These produce different results when the column has NULLs.

min

Available in all SQLite versions.→ same as X

As an aggregate function, returns the minimum non-NULL value of all values in the group. The minimum is the first non-NULL value that would appear in an ORDER BY on the same column. Returns NULL if and only if there are no non-NULL values in the group. min() also has a scalar form that accepts two or more arguments.

Signatures

min(X)

min(DISTINCT X)

Parameters

Parameter	Type	Description
X	any	The value to find the minimum of. NULL values are ignored. Type affinity rules determine comparison order.

Examples

Cheapest electronics product

sql

SELECT min(price) FROM products WHERE category = 'Electronics';

→9.99

First purchase date per customer

sql

SELECT customer_id, min(order_date) AS first_order FROM orders GROUP BY customer_id;

→customer_id: 101, first_order: '2022-03-14'

Fastest recorded response time for an endpoint

sql

SELECT min(response_ms) FROM api_logs WHERE endpoint = '/api/users';

→12

Salary range per department

sql

SELECT dept_id, min(salary), max(salary) FROM employees GROUP BY dept_id;

→dept_id: 3, min: 55000, max: 142000

Minimum stock across active products, only if some are critically low

sql

SELECT min(stock_quantity) FROM products WHERE active = 1 HAVING min(stock_quantity) < 10;

→3

⚠Anti-Pattern— Selecting other columns alongside min() expecting them to match the minimum row

Writing SELECT customer_id, min(order_date), order_total FROM orders GROUP BY customer_id assumes order_total will come from the row with the earliest order_date. It will not — SQLite picks order_total from an arbitrary row in the group, not the row where the min was found. This is a classic aggregation pitfall that produces quietly incorrect reports.

✓ Instead: Retrieve the full row by joining back on the minimum value: SELECT o.* FROM orders o JOIN (SELECT customer_id, min(order_date) AS first_date FROM orders GROUP BY customer_id) m ON o.customer_id = m.customer_id AND o.order_date = m.first_date.

A common pattern is finding each user's or customer's first event. SELECT user_id, min(event_time) FROM events GROUP BY user_id is simpler and often faster than a correlated subquery or window function approach in SQLite, because SQLite can use an index on (user_id, event_time) to satisfy the group-wise min efficiently.

example

SELECT product_id, min(sale_price) AS floor_price, max(sale_price) AS ceiling_price FROM price_history GROUP BY product_id;

→Price range per product across all historical records

◆SQLite Note

Aggregate min() behaves identically in SQLite and PostgreSQL: NULLs are ignored, and NULL is returned when all values are NULL or the group is empty. As with max(), SQLite provides a scalar multi-argument min(a, b, c) for finding the smallest of a set of values; PostgreSQL uses the LEAST() function for the same purpose. Type affinity differences apply the same way as for max().

⚙ Availability: Built-in, all SQLite versions

Common Gotchas

⚠

NULL kills comparisons silently — nothing equals NULL, not even NULL

NULL = NULL is not TRUE. It is NULL. Queries filtering for NULL with = will silently return zero rows.

⚠

string_agg without ORDER BY produces non-deterministic results

string_agg(col, ',') returns results in arbitrary order unless you specify ORDER BY inside the aggregate.

⚠

COUNT(col) skips NULLs but COUNT(*) does not

COUNT(column_name) counts non-NULL values only. COUNT(*) counts all rows. These produce different results when the column has NULLs.

median

median() added in SQLite 3.51.0 (2025-11-04) when compiled with -DSQLITE_ENABLE_PERCENTILE. Available as a loadable extension (ext/misc/percentile.c) in earlier versions.→ REAL

Returns the median (middle value) of all non-NULL values of X within a group. Equivalent to percentile_cont(X, 0.5). When the group has an even number of values, the result is the arithmetic average of the two middle values (continuous interpolation). NULL values are ignored. Non-numeric non-NULL values raise an error.

Signature

median(X)

Parameters

Parameter	Type	Description
X	numeric	The numeric value to find the median of. NULL values are ignored. Non-NULL non-numeric values raise an error.

Examples

Median salary across the company

sql

SELECT median(salary) FROM employees;

→72500.0

Median salary per department

sql

SELECT dept_id, median(salary) FROM employees GROUP BY dept_id;

→dept_id: 2, median_salary: 85000.0

Median API response time today (less affected by outliers than avg)

sql

SELECT median(response_ms) FROM api_logs WHERE DATE(created_at) = DATE('now');

→134.5

Median age of German customers

sql

SELECT median(age) FROM customers WHERE country = 'DE';

→34.0

Products where mean and median diverge significantly, indicating price skew

sql

SELECT product_id, avg(price) AS mean_price, median(price) AS median_price FROM order_items GROUP BY product_id HAVING abs(avg(price) - median(price)) > 10;

→product_id: 42, mean: 89.0, median: 24.99

⚠Anti-Pattern— Using avg() as a stand-in for median() when outliers are present

avg() computes the arithmetic mean, which is pulled toward extreme values. In a dataset like [10, 12, 14, 16, 950], avg() returns 200.4 while median() returns 14.0 — a very different picture of the typical value. Developers reach for avg() out of habit or because median() was unavailable before SQLite 3.51.0, then draw incorrect conclusions from skewed distributions such as salaries, response times, or purchase amounts.

✓ Instead: Use median(X) on SQLite 3.51.0+ for a robust central tendency measure; on older versions, approximate with percentile_cont(X, 0.5) via the loadable extension.

The arithmetic mean is heavily influenced by extreme values (outliers). A single unusually large salary or response time can skew avg() considerably. median() always reflects the true middle of the dataset. Comparing avg() and median() side by side is a quick way to detect skew: if they differ significantly, your distribution is not symmetric and the median is usually the more representative central tendency.

example

SELECT dept, median(salary) AS median_salary, avg(salary) AS mean_salary FROM employees GROUP BY dept;

→Compare median vs mean salary per department to detect skew

◆SQLite Note

PostgreSQL does not have a built-in median() aggregate function. The standard PostgreSQL approach is percentile_cont(0.5) WITHIN GROUP (ORDER BY X), which uses the SQL standard ordered-set aggregate syntax. SQLite's median(X) is simpler syntactically (standard aggregate syntax) and is equivalent to its own percentile_cont(X, 0.5). The SQLite form is more concise but is not SQL-standard and will not work in PostgreSQL.

⚙ Availability: Requires SQLite 3.51.0 or later (compiled with -DSQLITE_ENABLE_PERCENTILE), or available as a loadable extension in prior versions.

Common Gotchas

⚠

string_agg without ORDER BY produces non-deterministic results

string_agg(col, ',') returns results in arbitrary order unless you specify ORDER BY inside the aggregate.

⚠

COUNT(col) skips NULLs but COUNT(*) does not

COUNT(column_name) counts non-NULL values only. COUNT(*) counts all rows. These produce different results when the column has NULLs.

⚠

Stale statistics after bulk load — planner uses wrong row estimates

After INSERT of millions of rows, the query planner still uses the old row count until ANALYZE runs. This leads to terrible plans.

percentile

percentile() added in SQLite 3.51.0 (2025-11-04) when compiled with -DSQLITE_ENABLE_PERCENTILE. Previously available as a loadable extension (ext/misc/percentile.c).→ REAL

Computes the P-th percentile of all non-NULL values of Y within a group. P must be a constant between 0.0 and 100.0 (inclusive) and must be the same for all rows in the aggregate. Equivalent to percentile_cont(Y, P/100.0). Returns a value that is greater than or equal to P percent of the inputs. Uses continuous interpolation between adjacent values when the exact percentile falls between two data points.

Signature

percentile(Y, P)

Parameters

Parameter	Type	Description
Y	numeric	The numeric value to compute the percentile over. NULL values are ignored. Non-NULL non-numeric values raise an error.
P	REAL	The percentile to compute, in the range 0.0 to 100.0 inclusive. Must be a constant and identical for all rows of the aggregate.

Examples

90th percentile salary (P in 0–100 range)

sql

SELECT percentile(salary, 90) FROM employees;

→135000.0

P95 latency for the search endpoint

sql

SELECT percentile(response_ms, 95) FROM api_logs WHERE endpoint = '/api/search';

→412.0

75th percentile salary per department

sql

SELECT dept_id, percentile(salary, 75) AS p75_salary FROM employees GROUP BY dept_id;

→dept_id: 1, p75_salary: 110000.0

Median and 99th percentile of order totals in the last 30 days

sql

SELECT percentile(order_total, 50) AS median_order, percentile(order_total, 99) AS p99_order FROM orders WHERE created_at >= date('now','-30 days');

→median: 49.99, p99: 899.0

First and third quartiles of exam scores

sql

SELECT percentile(score, 25) AS q1, percentile(score, 75) AS q3 FROM exam_results WHERE exam_id = 10;

→q1: 62.0, q3: 88.0

⚠Anti-Pattern— Passing a 0.0–1.0 fraction to percentile() instead of a 0–100 value

Because percentile_cont uses a 0.0–1.0 scale, developers who switch to percentile() often write percentile(salary, 0.95) intending the 95th percentile. SQLite accepts this without error but computes the 0.95th percentile — effectively the minimum of the dataset — returning a result that looks plausible but is completely wrong. There is no type error to alert you.

✓ Instead: Use the 0–100 scale that percentile() expects: percentile(salary, 95), not percentile(salary, 0.95).

SQLite offers both forms. percentile(salary, 95) and percentile_cont(salary, 0.95) return the same result — choose whichever reads more naturally in your context. The 0–100 scale of percentile() matches how people commonly communicate percentiles ('the 95th percentile'), making it more readable in application code and reports.

example

SELECT percentile(response_ms, 95) AS p95, percentile(response_ms, 99) AS p99 FROM api_logs WHERE endpoint = '/search';

→P95 and P99 latency thresholds for SLA monitoring

◆SQLite Note

PostgreSQL uses the SQL standard ordered-set aggregate syntax: percentile_cont(0.95) WITHIN GROUP (ORDER BY salary). SQLite's percentile(salary, 95) is a simpler custom aggregate that does not follow the SQL standard ordered-set syntax. The P parameter in SQLite's percentile() is on a 0–100 scale, whereas PostgreSQL's percentile_cont uses 0.0–1.0. Both use continuous interpolation between data points. Code cannot be directly ported between the two without syntax changes.

⚙ Availability: Requires SQLite 3.51.0 or later (compiled with -DSQLITE_ENABLE_PERCENTILE), or available as a loadable extension in prior versions.

Common Gotchas

⚠

string_agg without ORDER BY produces non-deterministic results

string_agg(col, ',') returns results in arbitrary order unless you specify ORDER BY inside the aggregate.

⚠

COUNT(col) skips NULLs but COUNT(*) does not

COUNT(column_name) counts non-NULL values only. COUNT(*) counts all rows. These produce different results when the column has NULLs.

⚠

Stale statistics after bulk load — planner uses wrong row estimates

After INSERT of millions of rows, the query planner still uses the old row count until ANALYZE runs. This leads to terrible plans.

percentile_cont

percentile_cont() added in SQLite 3.51.0 (2025-11-04) when compiled with -DSQLITE_ENABLE_PERCENTILE. Previously available as a loadable extension (ext/misc/percentile.c).→ REAL

Computes the P-th percentile of all non-NULL values of Y within a group using continuous interpolation. P must be a constant between 0.0 and 1.0 (inclusive) and must be the same value for all rows in the aggregate. When the exact percentile falls between two data points, returns the weighted average of those two adjacent values. Equivalent to percentile(Y, P*100).

Signature

percentile_cont(Y, P)

Parameters

Parameter	Type	Description
Y	numeric	The numeric value to compute the percentile over. NULL values are ignored. Non-NULL non-numeric values raise an error.
P	REAL	The fraction of data points to fall below the result, in the range 0.0 to 1.0 inclusive. Must be constant and identical for all rows of the aggregate.

Examples

Median salary (P=0.5 is the 50th percentile)

sql

SELECT percentile_cont(salary, 0.5) FROM employees;

→72500.0

P99 response time today

sql

SELECT percentile_cont(response_ms, 0.99) FROM api_logs WHERE DATE(logged_at) = DATE('now');

→820.0

Interquartile range per department

sql

SELECT dept_id, percentile_cont(salary, 0.25) AS q1, percentile_cont(salary, 0.75) AS q3 FROM employees GROUP BY dept_id;

→dept_id: 1, q1: 65000.0, q3: 115000.0

90th percentile CPU load over the last hour for capacity planning

sql

SELECT percentile_cont(load_average, 0.9) FROM server_metrics WHERE server_id = 7 AND recorded_at >= datetime('now','-1 hour');

→0.87

Median price per product category, sorted highest to lowest

sql

SELECT item_category, percentile_cont(unit_price, 0.5) AS median_price FROM catalog_items GROUP BY item_category ORDER BY median_price DESC;

→category: 'Laptops', median_price: 849.0

⚠Anti-Pattern— Confusing percentile_cont (interpolated) with percentile_disc (actual value) for discrete domains

percentile_cont interpolates between data points, so on a 1–5 integer rating scale it can return 3.5 — a value that never appears in the data. Developers expecting a real observed value are surprised when the result cannot be matched back to any row. This is particularly misleading for integer identifiers, whole-number scores, or timestamps where fractional values are meaningless.

✓ Instead: Use percentile_disc(Y, P) when the result must be a value that actually exists in your dataset, such as integer ratings, IDs, or timestamps.

percentile_cont(Y, 0.5) on values [1, 2, 3, 4] returns 2.5 (the interpolated midpoint), not 2 or 3. If you need the result to be an actual value from your dataset (for example, to return a real order ID or timestamp), use percentile_disc() instead, which always returns the smaller of the two boundary values.

example

SELECT percentile_cont(load_avg, 0.95) AS p95_load FROM server_metrics WHERE server_id = 3 AND recorded_at >= datetime('now','-1 hour');

→95th percentile CPU load over the last hour

◆SQLite Note

PostgreSQL's percentile_cont uses the SQL standard ordered-set aggregate syntax: percentile_cont(0.5) WITHIN GROUP (ORDER BY salary). SQLite's percentile_cont(salary, 0.5) reverses the argument order (data column first, fraction second) and does not use the WITHIN GROUP syntax. Both produce identical numeric results for the same inputs. The P range is 0.0–1.0 in both SQLite's percentile_cont and PostgreSQL's version.

⚙ Availability: Requires SQLite 3.51.0 or later (compiled with -DSQLITE_ENABLE_PERCENTILE), or available as a loadable extension in prior versions.

Common Gotchas

⚠

string_agg without ORDER BY produces non-deterministic results

string_agg(col, ',') returns results in arbitrary order unless you specify ORDER BY inside the aggregate.

⚠

COUNT(col) skips NULLs but COUNT(*) does not

COUNT(column_name) counts non-NULL values only. COUNT(*) counts all rows. These produce different results when the column has NULLs.

⚠

Stale statistics after bulk load — planner uses wrong row estimates

After INSERT of millions of rows, the query planner still uses the old row count until ANALYZE runs. This leads to terrible plans.

percentile_disc

percentile_disc() added in SQLite 3.51.0 (2025-11-04) when compiled with -DSQLITE_ENABLE_PERCENTILE. Previously available as a loadable extension (ext/misc/percentile.c).→ same as Y

Computes the discrete P-th percentile of all non-NULL values of Y within a group. Like percentile_cont() but always returns an actual value from the input dataset rather than interpolating between values. When the exact percentile falls between two data points, returns the smaller of the two. P must be between 0.0 and 1.0 inclusive and must be constant across all rows.

Signature

percentile_disc(Y, P)

Parameters

Parameter	Type	Description
Y	numeric	The numeric value to compute the percentile over. NULL values are ignored. Non-NULL non-numeric values raise an error.
P	REAL	The fraction of data points to fall below the result, in the range 0.0 to 1.0 inclusive. Must be constant and identical for all rows of the aggregate.

Examples

Median score returned as an actual score from the dataset (not interpolated)

sql

SELECT percentile_disc(score, 0.5) FROM exam_results WHERE exam_id = 3;

→78

P95 latency as a real observed value (useful for SLA thresholds)

sql

SELECT percentile_disc(response_ms, 0.95) FROM api_logs;

→389

Median salary per department — always an actual salary in the data

sql

SELECT dept_id, percentile_disc(salary, 0.5) AS median_salary FROM employees GROUP BY dept_id;

→dept_id: 2, median_salary: 84000

Third-quartile order value for completed 2024 orders

sql

SELECT percentile_disc(order_total, 0.75) FROM orders WHERE status = 'completed' AND strftime('%Y', created_at) = '2024';

→124.99

Comparing continuous vs discrete P90 — disc always lands on a real play duration

sql

SELECT percentile_cont(duration_sec, 0.9) AS p90_cont, percentile_disc(duration_sec, 0.9) AS p90_disc FROM video_plays;

→p90_cont: 3847.5, p90_disc: 3820

⚠Anti-Pattern— Using percentile_disc(Y, 0.5) instead of the clearer median() for the 50th percentile

percentile_disc(score, 0.5) and median(score) both compute the 50th percentile, but percentile_disc returns the lower of the two middle values for even-sized groups while median() interpolates between them, giving different results. Using percentile_disc(0.5) when you mean 'middle value' also obscures intent and can confuse reviewers who expect median() for that semantic. On datasets with an even number of rows the two functions produce different numbers without any warning.

✓ Instead: Use median() when you want the standard interpolated 50th percentile; use percentile_disc(Y, 0.5) only when you explicitly need the lower boundary (discrete) behavior for even-sized groups.

percentile_disc() guarantees the result is a value that actually exists in your dataset. This matters when the percentile will be used to look up a specific record (e.g., 'find the order whose total is at the 90th percentile'), or when the domain is discrete and interpolation is meaningless (e.g., whole-number ratings from 1–5). For continuous measures like latency or salary where fractional values are meaningful, prefer percentile_cont().

example

SELECT percentile_disc(order_total, 0.9) AS p90_order FROM orders WHERE status = 'completed';

→The actual order total at the 90th percentile — always a real value from the data

◆SQLite Note

PostgreSQL's percentile_disc uses ordered-set aggregate syntax: percentile_disc(0.5) WITHIN GROUP (ORDER BY salary). SQLite's percentile_disc(salary, 0.5) places the data column first. Both return the smaller of the two boundary values when the percentile falls between data points. The return type of percentile_disc matches the input type (preserving INTEGER if input is INTEGER), whereas percentile_cont always returns REAL due to interpolation.

⚙ Availability: Requires SQLite 3.51.0 or later (compiled with -DSQLITE_ENABLE_PERCENTILE), or available as a loadable extension in prior versions.

Common Gotchas

⚠

string_agg without ORDER BY produces non-deterministic results

string_agg(col, ',') returns results in arbitrary order unless you specify ORDER BY inside the aggregate.

⚠

COUNT(col) skips NULLs but COUNT(*) does not

COUNT(column_name) counts non-NULL values only. COUNT(*) counts all rows. These produce different results when the column has NULLs.

⚠

Stale statistics after bulk load — planner uses wrong row estimates

After INSERT of millions of rows, the query planner still uses the old row count until ANALYZE runs. This leads to terrible plans.

sum

Available in all SQLite versions.→ INTEGER or REAL

Returns the sum of all non-NULL values of X in the group. Returns NULL if there are no non-NULL input rows (not 0 — use total() for a zero default). Returns an INTEGER if all non-NULL inputs are integers; returns REAL if any input is non-integer. Raises an 'integer overflow' exception if all inputs are integers and overflow occurs during computation.

Signatures

sum(X)

sum(DISTINCT X)

Parameters

Parameter	Type	Description
X	numeric	The value to sum. NULL values are ignored. Non-numeric strings are treated as 0.

Examples

Total items in order 1001

sql

SELECT sum(quantity) FROM order_items WHERE order_id = 1001;

→7

Total credit transactions for an account

sql

SELECT sum(amount) FROM transactions WHERE account_id = 55 AND type = 'credit';

→4320.50

Top 5 best-selling products by quantity

sql

SELECT product_id, sum(quantity) AS total_sold FROM order_items GROUP BY product_id ORDER BY total_sold DESC LIMIT 5;

→product_id: 12, total_sold: 8402

Sum of distinct discount percentages in active promotions

sql

SELECT sum(DISTINCT discount_pct) FROM promotions WHERE active = 1;

→65

COALESCE converts NULL to 0 when no rows match — total() would do this automatically

sql

SELECT COALESCE(sum(revenue), 0) AS total_revenue FROM sales WHERE region = 'APAC' AND strftime('%Y-%m', sale_date) = '2024-02';

→0

⚠Anti-Pattern— Expecting sum() to return 0 when no rows match

When a WHERE clause matches no rows, sum() returns NULL, not 0. This NULL then silently propagates through downstream arithmetic — a balance calculation like sum(credits) - sum(debits) yields NULL instead of 0 when there are no debits, corrupting totals that are displayed in UIs or written to ledgers. The bug often hides in testing because test data always has at least one matching row.

✓ Instead: Use total(X) which always returns 0.0 for empty sets, or wrap with COALESCE(sum(X), 0) when a zero result is the correct semantic for an empty group.

The most common source of unexpected NULLs in aggregation is sum() over a filtered result set that matches no rows. This is SQL-standard behavior. If a zero result is more useful than NULL in your application (e.g., computing a running balance or displaying a metric), either switch to total(X) which always returns 0.0 for empty sets, or wrap with COALESCE(sum(X), 0). Note that sum() preserves integer type while total() always returns REAL.

example

SELECT account_id, sum(amount) FILTER (WHERE type='credit') AS credits, sum(amount) FILTER (WHERE type='debit') AS debits FROM ledger GROUP BY account_id;

→Running credit and debit totals per account in one query

◆SQLite Note

In PostgreSQL, sum() also returns NULL on an empty set and follows SQL standard behavior. Unlike PostgreSQL, SQLite's sum() silently coerces non-numeric strings to 0 instead of raising a type error. PostgreSQL preserves exact precision for INTEGER and NUMERIC inputs (using NUMERIC arithmetic), whereas SQLite switches to IEEE 754 REAL as soon as any input is non-integer, which can introduce floating-point imprecision. PostgreSQL does not have a total() equivalent — COALESCE(sum(x), 0) is the idiomatic pattern there.

⚙ Availability: Built-in, all SQLite versions

Common Gotchas

⚠

string_agg without ORDER BY produces non-deterministic results

string_agg(col, ',') returns results in arbitrary order unless you specify ORDER BY inside the aggregate.

⚠

COUNT(col) skips NULLs but COUNT(*) does not

COUNT(column_name) counts non-NULL values only. COUNT(*) counts all rows. These produce different results when the column has NULLs.

total

total() is a long-standing SQLite extension, available in all modern SQLite versions. It is not part of standard SQL and does not exist in PostgreSQL or most other databases.→ REAL

Returns the sum of all non-NULL values of X in the group. Unlike sum(), total() always returns 0.0 (never NULL) when there are no non-NULL input rows. The result of total() is always a floating-point (REAL) value, even if all inputs are integers. total() never raises an integer overflow exception.

Signature

total(X)

Parameters

Parameter	Type	Description
X	numeric	The value to sum. NULL values are ignored. Non-numeric strings are treated as 0.

Examples

Returns 0.0 even if no items have discounts (NULLs or empty set), unlike sum() which would return NULL

sql

SELECT total(discount_amount) FROM order_items WHERE order_id = 500;

→0.0

Total units sold per product — always a float

sql

SELECT product_id, total(quantity) AS units_sold FROM order_items GROUP BY product_id;

→product_id: 7, units_sold: 1502.0

Shopping cart total — 0.0 for an empty cart, never NULL

sql

SELECT total(line_total) AS cart_value FROM cart_items WHERE session_id = 'abc123';

→149.97

Loyalty point balance — arithmetic safe because total() never returns NULL

sql

SELECT user_id, total(points_earned) - total(points_spent) AS balance FROM loyalty_events GROUP BY user_id;

→user_id: 9, balance: 350.0

Total storage used in MB — total() ensures 0.0 for users with no uploads

sql

SELECT total(file_size_bytes) / (1024.0 * 1024.0) AS total_mb FROM uploaded_files WHERE user_id = 12;

→234.7

⚠Anti-Pattern— Using total() for financial calculations that require exact integer arithmetic

total() always returns REAL (IEEE 754 double), which introduces floating-point rounding errors on large integer sums. For example, total() over millions of integer cent values may return 100000000.00000001 instead of 100000000. Financial applications that store amounts as integer cents and expect exact results will silently accumulate rounding error when using total(), and the discrepancy can grow as the dataset grows.

✓ Instead: Use sum() when inputs are integers and exact integer arithmetic is required; sum() preserves the INTEGER type and raises an overflow error rather than silently losing precision.

total() solves the common annoyance of sum() returning NULL on empty input. It is particularly useful in expressions where NULL would propagate and ruin downstream arithmetic (e.g., computing a balance as total(credits) - total(debits)). Note that total() always returns REAL, so if you need an integer result you'll need to CAST. Also note that total() is not standard SQL and does not exist in PostgreSQL — use COALESCE(sum(x), 0) for portability.

example

SELECT user_id, total(points_earned) - total(points_spent) AS balance FROM loyalty_events GROUP BY user_id;

→Net loyalty point balance — never NULL even for users with no spend events

◆SQLite Note

total() is a SQLite-specific function with no equivalent in PostgreSQL. It was added to work around the SQL-standard behavior of sum() returning NULL on empty input. In PostgreSQL, the standard pattern is COALESCE(sum(x), 0). Additionally, total() always returns REAL in SQLite, which means it avoids integer overflow exceptions that sum() can raise. PostgreSQL's sum() over integer columns returns exact BIGINT or NUMERIC, preserving precision that total()'s REAL return type cannot guarantee for very large integers.

⚙ Availability: Built-in, all SQLite versions

Common Gotchas

⚠

string_agg without ORDER BY produces non-deterministic results

string_agg(col, ',') returns results in arbitrary order unless you specify ORDER BY inside the aggregate.

⚠

COUNT(col) skips NULLs but COUNT(*) does not

COUNT(column_name) counts non-NULL values only. COUNT(*) counts all rows. These produce different results when the column has NULLs.

Related SQLite Categories

Window Functions →Core Functions →JSON Functions →