SELECT
Applies to: Databricks SQL
Databricks Runtime
Composes a result set from one or more table references.
The SELECT
clause can be part of a query which also includes common table expressions (CTE), set operations, and various other clauses.
Syntax
SELECT [ hints ] [ ALL | DISTINCT ] { named_expression | star_clause } [, ...]
FROM table_reference [, ...]
[ LATERAL VIEW clause ]
[ WHERE clause ]
[ GROUP BY clause ]
[ HAVING clause]
[ QUALIFY clause ]
named_expression
expression [ column_alias ]
star_clause
[ { table_name | view_name } . ] * [ except_clause ]
except_clause
EXCEPT ( { column_name | field_name } [, ...] )
Parameters
-
Hints help the Databricks SQL optimizer make better planning decisions. Databricks SQL supports hints that influence selection of join strategies and repartitioning of the data.
ALL
Select all matching rows from the table references. Enabled by default.
DISTINCT
Select all matching rows from the table references after removing duplicates in results.
named_expression
An expression with an optional assigned name.
-
A combination of one or more values, operators, and SQL functions that evaluates to a value.
-
An optional column identifier naming the expression result. If no
column_alias
is provided Databricks SQL derives one.
-
star_clause
A shorthand to name all the referencable columns in the
FROM
clause. The list of columns is ordered by the order oftable_reference
s and the order of columns within eachtable_reference
.The _metadata column is not included this list. You must reference it explicitly.
-
If present limits the columns to be named to those in the specified referencable table.
-
If specified limits the columns to be expanded to those in the specified referencable view.
-
except_clause
Applies to:
Databricks SQL
Databricks Runtime 11.0 and above
Optionally prunes columns or fields from the referencable set of columns identified in the
select_star
clause.-
A column that is part of the set of columns that you can reference.
-
A reference to a field in a column of the set of columns that you can reference. If you exclude all fields from a
STRUCT
, the result is an emptySTRUCT
.
Each name must reference a column included in the set of columns that you can reference or their fields. Otherwise, Databricks SQL raises a UNRESOLVED_COLUMN error. If names overlap or are not unique, Databricks SQL raises an EXCEPT_OVERLAPPING_COLUMNS error.
-
-
A source of input for the
SELECT
. This input reference can be turned into a streaming reference by using theSTREAM
keyword prior to the reference. -
Used in conjunction with generator functions such as
EXPLODE
, which generates a virtual table containing one or more rows.LATERAL VIEW
applies the rows to each original output row.In Databricks SQL, and starting with Databricks Runtime 12.2 this clause is deprecated. You should invoke a table valued generator function as a table_reference.
-
Filters the result of the
FROM
clause based on the supplied predicates. -
The expressions that are used to group the rows. This is used in conjunction with aggregate functions (
MIN
,MAX
,COUNT
,SUM
,AVG
) to group rows based on the grouping expressions and aggregate values in each group. When aFILTER
clause is attached to an aggregate function, only the matching rows are passed to that function. -
The predicates by which the rows produced by
GROUP BY
are filtered. TheHAVING
clause is used to filter rows after the grouping is performed. If you specifyHAVING
withoutGROUP BY
, it indicates aGROUP BY
without grouping expressions (global aggregate). -
The predicates that are used to filter the results of window functions. To use
QUALIFY
, at least one window function is required to be present in the SELECT list or the QUALIFY clause.
Select on Delta table
In addition to the standard SELECT
options, Delta tables support the time travel options described in this section. For details, see Work with Delta Lake table history.
AS OF
syntax
table_identifier TIMESTAMP AS OF timestamp_expression
table_identifier VERSION AS OF version
timestamp_expression
can be any one of:'2018-10-18T22:15:12.013Z'
, that is, a string that can be cast to a timestampcast('2018-10-18 13:36:32 CEST' as timestamp)
'2018-10-18'
, that is, a date stringcurrent_timestamp() - interval 12 hours
date_sub(current_date(), 1)
Any other expression that is or can be cast to a timestamp
version
is a long value that can be obtained from the output ofDESCRIBE HISTORY table_spec
.
Neither timestamp_expression
nor version
can be subqueries.
Examples
-- select all referencable columns from all tables
> SELECT * FROM VALUES(1, 2) AS t1(c1, c2), VALUES(3, 4) AS t2(c3, c4);
1 2 3 4
-- select all referencable columns from one table
> SELECT t2.* FROM VALUES(1, 2) AS t1(c1, c2), VALUES(3, 4) AS t2(c3, c4);
3 4
-- select all referencable columns from all tables except t2.c4
> SELECT * EXCEPT(c4) FROM VALUES(1, 2) AS t1(c1, c2), VALUES(3, 4) AS t2(c3, c4);
1 2 3
-- select all referencable columns from a table, except a nested field.
> SELECT * EXCEPT(c2.b) FROM VALUES(1, named_struct('a', 2, 'b', 3)) AS t(c1, c2);
1 { "a" : 2 }
-- Removing all fields results in an empty struct
> SELECT * EXCEPT(c2.b, c2.a) FROM VALUES(1, named_struct('a', 2, 'b', 3)) AS t(c1, c2);
1 { }
-- Overlapping names result in an error
> SELECT * EXCEPT(c2, c2.a) FROM VALUES(1, named_struct('a', 2, 'b', 3)) AS t(c1, c2);
Error: EXCEPT_OVERLAPPING_COLUMNS