table reference

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime

A table reference is an intermediate result table within SQL. It can be derived from other operators, such as functions, joins or a subquery, reference a base table directly, or be constructed as an inline table.

Syntax

table_reference
  { table_name [ TABLESAMPLE clause ] [ table_alias ] |
    { STREAM table_name [ table_alias ] | STREAM ( table_name ) [ table_alias ] } |
    view_name [ table_alias ] |
    JOIN clause |
    PIVOT clause |
    UNPIVOT clause |
    [ STREAM ] table_valued_function [ table_alias ] |
    [ LATERAL ] table_valued_function [ table_alias ] |
    VALUES clause |
    [ LATERAL ] ( query ) [ TABLESAMPLE clause ] [ table_alias ] }

Parameters

  • table_name

    Identifies a table that may contain a temporal specification. See Work with Delta Lake table history for details.

    If the table cannot be found Databricks raises a TABLE_OR_VIEW_NOT_FOUND error.

    See Column, field, parameter, and variable resolution for more information on name resolution.

  • view_name

    Identifies a view or a common table expression (CTE). If the view cannot be found Databricks raises a TABLE_OR_VIEW_NOT_FOUND error.

    See Column, field, parameter, and variable resolution for more information on name resolution.

  • STREAM

    Returns a table or a table valued function as a streaming source. Tables cannot be provided with a temporal specification when used with the STREAM keyword. Streaming sources are most commonly used in the definitions of streaming tables.

  • JOIN

    Combines two or more relations using a join.

  • PIVOT

    Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 12.2 LTS and above.

    Used for data perspective; you can get the aggregated values based on specific column value.

    Prior to Databricks Runtime 12.0 PIVOT is limited to SELECT following the FROM clause.

  • UNPIVOT

    Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 12.2 LTS and above.

    Used for data perspective; you can split multiple column groups into rows.

  • [LATERAL] table_valued_function_invocation

    Invokes a table valued function. To refer to columns exposed by a preceding table_reference in the same FROM clause you must specify LATERAL.

  • VALUES

    Defines an inline table.

  • [LATERAL] ( query )

    Computes a table reference using a query. A query prefixed by LATERAL may reference columns exposed by a preceding table_reference in the same FROM clause. Such a construct is called a correlated or dependent query.

  • TABLESAMPLE

    Optionally reduce the size of the result set by only sampling a fraction of the rows.

  • table_alias

    Optionally specifies a label for the table_reference. If the table_alias includes column_identifiers their number must match the number of columns in the table_reference.

Select on Delta table

In addition to the standard SELECT options, Delta tables support the time travel options described in this section. For details, see Work with Delta Lake table history.

AS OF syntax

table_identifier TIMESTAMP AS OF timestamp_expression

table_identifier VERSION AS OF version
  • timestamp_expression can be any one of:

    • '2018-10-18T22:15:12.013Z', that is, a string that can be cast to a timestamp

    • cast('2018-10-18 13:36:32 CEST' as timestamp)

    • '2018-10-18', that is, a date string

    • current_timestamp() - interval 12 hours

    • date_sub(current_date(), 1)

    • Any other expression that is or can be cast to a timestamp

  • version is a long value that can be obtained from the output of DESCRIBE HISTORY table_spec.

Neither timestamp_expression nor version can be subqueries.

Example

> SELECT * FROM events TIMESTAMP AS OF '2018-10-18T22:15:12.013Z'

> SELECT * FROM events VERSION AS OF 123

@ syntax

Use the @ syntax to specify the timestamp or version. The timestamp must be in yyyyMMddHHmmssSSS format. You can specify a version after @ by prepending a v to the version. For example, to query version 123 for the table events, specify events@v123.

Example

> SELECT * FROM events@20190101000000000

> SELECT * FROM events@v123

Examples

-- select all referencable columns from all tables
> SELECT * FROM VALUES(1, 2) AS t1(c1, c2), VALUES(3, 4) AS t2(c3, c4);
  1   2   3   4

-- select all referencable columns from one table
> SELECT t2.* FROM VALUES(1, 2) AS t1(c1, c2), VALUES(3, 4) AS t2(c3, c4);
  3   4

-- select all referencable columns from all tables except t2.c4
> SELECT * EXCEPT(c4) FROM VALUES(1, 2) AS t1(c1, c2), VALUES(3, 4) AS t2(c3, c4);
  1   2   3

-- select all referencable columns from a table, except a nested field.
> SELECT * EXCEPT(c2.b) FROM VALUES(1, named_struct('a', 2, 'b', 3)) AS t(c1, c2);
  1  { "a" : 2 }

-- Removing all fields results in an empty struct
> SELECT * EXCEPT(c2.b, c2.a) FROM VALUES(1, named_struct('a', 2, 'b', 3)) AS t(c1, c2);
  1  { }

-- Overlapping names result in an error
> SELECT * EXCEPT(c2, c2.a) FROM VALUES(1, named_struct('a', 2, 'b', 3)) AS t(c1, c2);
  Error: EXCEPT_OVERLAPPING_COLUMNS