Set operators

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime

Combines two subqueries into a single one. Databricks SQL supports three types of set operators:

  • EXCEPT

  • INTERSECT

  • UNION

Syntax

subquery1 { { UNION [ ALL | DISTINCT ] |
              INTERSECT [ ALL | DISTINCT ] |
              EXCEPT [ ALL | DISTINCT ] } subquery2 } [...] }
  • subquery1, subquery2

    Any two subquery clauses as specified in SELECT. Both subqueries must have the same number of columns and share a least common type for each respective column.

  • UNION [ALL | DISTINCT]

    Returns the result of subquery1 plus the rows of subquery2`.

    If ALL is specified duplicate rows are preserved.

    If DISTINCT is specified the result does not contain any duplicate rows. This is the default.

  • INTERSECT [ALL | DISTINCT]

    Returns the set of rows which are in both subqueries.

    If ALL is specified a row that appears multiple times in the subquery1 as well as in subquery will be returned multiple times.

    If DISTINCT is specified the result does not contain duplicate rows. This is the default.

  • EXCEPT [ALL | DISTINCT ]

    Returns the rows in subquery1 which are not in subquery2.

    If ALL is specified, each row in subquery2 will remove exactly one of possibly multiple matches from subquery1.

    If DISTINCT is specified, duplicate rows are removed from subquery1 before applying the operation, so all matches are removed and the result will have no duplicate rows (matched or unmatched). This is the default.

    You can specify MINUS as a syntax alternative for EXCEPT.

When chaining set operations INTERSECT has a higher precedence than UNION and EXCEPT.

The type of each result column is the least common type of the respective columns in subquery1 and subquery2.

If the result column is a STRING its collation is derived by applying the collation precedence rules to the columns in the subqueries.

Examples

-- Use number1 and number2 tables to demonstrate set operators in this page.
> CREATE TEMPORARY VIEW number1(c) AS VALUES (3), (1), (2), (2), (3), (4);

> CREATE TEMPORARY VIEW number2(c) AS VALUES (5), (1), (1), (2);

> SELECT c FROM number1 EXCEPT SELECT c FROM number2;
  3
  4

> SELECT c FROM number1 MINUS SELECT c FROM number2;
  3
  4

> SELECT c FROM number1 EXCEPT ALL (SELECT c FROM number2);
  3
  3
  4

> SELECT c FROM number1 MINUS ALL (SELECT c FROM number2);
  3
  3
  4

> (SELECT c FROM number1) INTERSECT (SELECT c FROM number2);
  1
  2

> (SELECT c FROM number1) INTERSECT DISTINCT (SELECT c FROM number2);
  1
  2

> (SELECT c FROM number1) INTERSECT ALL (SELECT c FROM number2);
  1
  2
  2

> (SELECT c FROM number1) UNION (SELECT c FROM number2);
  1
  3
  5
  4
  2

> (SELECT c FROM number1) UNION DISTINCT (SELECT c FROM number2);
  1
  3
  5
  4
  2

> SELECT c FROM number1 UNION ALL (SELECT c FROM number2);
  3
  1
  2
  2
  3
  4
  5
  1
  1
  2