SQL Pipeline Syntax

Applies to: check marked yes Databricks Runtime 16.2 and later

Databricks supports SQL pipeline syntax which allows composing queries from combinations of chained operators.

  • Any query can have zero or more pipe operators as a suffix, delineated by the pipe character |>.

  • Each piped operation starts with one or more SQL keywords followed by its own grammar.

  • Operators can apply in any order, any number of times.

  • Typically FROM relation_name is used to start a pipeline, but any query can start a pipeline.

Syntax

{ FROM | TABLE } relation_name { |> piped_operation } [ ...]

Parameters

Example

This is query 13 from the TPC-H benchmark written in ANSI SQL:

> SELECT c_count, COUNT(*) AS custdist
    FROM
    (SELECT c_custkey, COUNT(o_orderkey) c_count
      FROM customer
      LEFT OUTER JOIN orders ON c_custkey = o_custkey
                             AND o_comment NOT LIKE '%unusual%packages%'
     GROUP BY c_custkey
  ) AS c_orders
  GROUP BY c_count
  ORDER BY custdist DESC, c_count DESC;

To write the same logic using SQL pipe operators, you can express it like this:

> FROM customer
  |> LEFT OUTER JOIN orders ON c_custkey = o_custkey
                            AND o_comment NOT LIKE '%unusual%packages%'
  |> AGGREGATE COUNT(o_orderkey) c_count
     GROUP BY c_custkey
  |> AGGREGATE COUNT(*) AS custdist
     GROUP BY c_count
  |> ORDER BY custdist DESC, c_count DESC;