approx_percentile
aggregate function
Applies to: Databricks SQL Databricks Runtime
Returns the approximate percentile of the expr
within the group.
Syntax
approx_percentile ( [ALL | DISTINCT] expr, percentile [, accuracy] ) [ FILTER ( WHERE cond ) ]
This function can also be invoked as a window function using the OVER
clause.
Arguments
expr
: A numeric expression.percentile
: A numeric literal between 0 and 1 or a literal array of numeric values, each between 0 and 1.accuracy
: AnINTEGER
literal greater than 0. If accuracy is omitted it is set to 10000.cond
: An optional boolean expression filtering the rows used for aggregation.
Returns
The aggregate function returns the expression that is the smallest value in the ordered group (sorted from least to greatest)
such that no more than percentile
of expr
values is less than the value or equal to that value.
If percentile
is an array, approx_percentile
returns the approximate percentile array of expr
at percentile
.
The accuracy
parameter controls approximation accuracy at the cost of memory.
A higher value of accuracy yields better accuracy, 1.0/accuracy
is the relative error of the approximation.
This function is a synonym for percentile_approx aggregate function.
If DISTINCT
is specified the function operates only on a unique set of expr
values.
Examples
> SELECT approx_percentile(col, array(0.5, 0.4, 0.1), 100) FROM VALUES (0), (1), (2), (10) AS tab(col);
[1,1,0]
> SELECT approx_percentile(col, 0.5, 100) FROM VALUES (0), (6), (6), (7), (9), (10) AS tab(col);
6
> SELECT approx_percentile(DISTINCT col, 0.5, 100) FROM VALUES (0), (6), (6), (7), (9), (10) AS tab(col);
7