approx_count_distinct aggregate function
Applies to: Databricks SQL, Databricks Runtime
Returns the estimated number of distinct values in expr within the group.
The implementation uses the dense version of the HyperLogLog++ (HLL++) algorithm, a state-of-the-art cardinality estimation algorithm.
By default, the estimate has a maximum relative standard deviation of 5%. You can change this bound with the optional relativeSD parameter, described below.
Syntax
approx_count_distinct(expr[, relativeSD]) [FILTER ( WHERE cond ) ]
This function can also be invoked as a window function using the OVER clause.
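The following sketch illustrates the three forms the syntax allows: the plain aggregate, the aggregate with an explicit relativeSD and a FILTER clause, and the window form with OVER. The inline table tab, its columns col1 and grp, and the relativeSD value 0.01 are hypothetical and chosen only for illustration.

```sql
-- Plain aggregate over hypothetical inline data, using the default relativeSD.
SELECT approx_count_distinct(col1) AS approx_distinct
  FROM VALUES (1), (1), (2), (2), (3) AS tab(col1);

-- Explicit relativeSD (0.01 is an illustrative value) combined with FILTER,
-- so only rows where col1 > 1 contribute to the estimate.
SELECT approx_count_distinct(col1, 0.01) FILTER (WHERE col1 > 1) AS approx_distinct
  FROM VALUES (1), (1), (2), (2), (3) AS tab(col1);

-- Window form: every row carries the estimate for its own partition.
SELECT grp,
       col1,
       approx_count_distinct(col1) OVER (PARTITION BY grp) AS approx_distinct
  FROM VALUES ('a', 1), ('a', 1), ('a', 2), ('b', 3) AS tab(grp, col1);
```

A smaller relativeSD asks for a tighter estimate, generally at the cost of more memory for the sketch.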