collect_set aggregate function

Applies to: Databricks SQL, Databricks Runtime

Returns an array consisting of all unique values in expr within the group.

Syntax

collect_set(expr) [FILTER ( WHERE cond ) ]

This function can also be invoked as a window function using the OVER clause.
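
As a minimal sketch of the window form, the query below attaches to every row the set of col2 values found in that row's partition. The inline data, the alias tab, the column names, and the result alias vals are made up for illustration.

```sql
-- Hypothetical inline data: every row is kept, and each one gains
-- the distinct col2 values of its col1 partition.
> SELECT col1,
         col2,
         collect_set(col2) OVER (PARTITION BY col1) AS vals
    FROM VALUES ('a', 1), ('a', 1), ('a', 2), ('b', 3) AS tab(col1, col2);
```

Each of the three 'a' rows should carry an array holding 1 and 2, in no guaranteed order, and the 'b' row an array holding 3. Unlike the plain aggregate, the window form does not collapse the rows of a group into one.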

Arguments

  • expr: An expression of any type except MAP.

  • cond: An optional boolean expression filtering the rows used for aggregation.

Returns

An ARRAY of the argument type.

The order of elements in the array is non-deterministic. NULL values are excluded.
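
Because the element order is not guaranteed, a query that needs a stable result can sort the array after aggregating. The sketch below wraps the aggregate in sort_array, which sorts in ascending order by default. The inline data is made up for illustration.

```sql
-- The NULL row is dropped by collect_set and the duplicate 3 is collapsed;
-- sort_array then fixes the element order.
> SELECT sort_array(collect_set(col)) FROM VALUES (3), (1), (NULL), (3), (2) AS tab(col);
 [1,2,3]
```

Sorting is only one way to get a reproducible result. It suits cases where the elements have a natural order and the original arrival order does not matter.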

Examples

> SELECT collect_set(col) FROM VALUES (1), (2), (NULL), (1) AS tab(col);
 [1,2]

> SELECT collect_set(col1) FILTER(WHERE col2 = 10)
    FROM VALUES (1, 10), (2, 10), (NULL, 10), (1, 10), (3, 12) AS tab(col1, col2);
 [1,2]
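
The examples above aggregate over the whole input as a single group. A minimal sketch with GROUP BY, using made-up inline data and a hypothetical alias vals, shows one array being built per group. sort_array and ORDER BY are added only to make the output reproducible.

```sql
-- One array per distinct col1; the NULL in group 'b' is excluded.
> SELECT col1, sort_array(collect_set(col2)) AS vals
    FROM VALUES ('a', 1), ('a', 1), ('a', 2), ('b', 3), ('b', NULL) AS tab(col1, col2)
   GROUP BY col1
   ORDER BY col1;
 a [1,2]
 b [3]
```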