TABLESAMPLE clause
Applies to: Databricks SQL Databricks Runtime
The TABLESAMPLE
statement is used to sample the relation.
Syntax
TABLESAMPLE ( { percentage PERCENT ) |
num_rows ROWS |
BUCKET fraction OUT OF total } )
[ REPEATABLE ( seed ) ]
Parameters
percentage PERCENT
An INTEGER or DECIMAL constant
percentage
between 0 and 100 specifying which percentage of the table’s rows to sample.num_rows ROWS
A constant positive INTEGER expression
num_rows
specifying an absolute number of rows out of all rows to sample.BUCKET fraction OUT OF total
An INTEGER constant
fraction
specifying the portion out of the INTEGER constanttotal
to sample.REPEATABLE ( seed )
Applies to: Databricks SQL Databricks Runtime 11.3 LTS and above
An optional positive INTEGER constant
seed
, used to always produce the same set of rows. Use this clause when you want to reissue the query multiple times, and you expect the same set of sampled rows.
Note
TABLESAMPLE
returns the approximate number of rows or fraction requested.Always use
TABLESAMPLE (percent PERCENT)
if randomness is important.TABLESAMPLE (num_rows ROWS)
is not a simple random sample but instead is implemented usingLIMIT
.
Examples
> CREATE TEMPORARY VIEW test(id, name) AS
VALUES ( 1, 'Lisa'),
( 2, 'Mary'),
( 3, 'Evan'),
( 4, 'Fred'),
( 5, 'Alex'),
( 6, 'Mark'),
( 7, 'Lily'),
( 8, 'Lucy'),
( 9, 'Eric'),
(10, 'Adam');
> SELECT * FROM test;
5 Alex
8 Lucy
2 Mary
4 Fred
1 Lisa
9 Eric
10 Adam
6 Mark
7 Lily
3 Evan
> SELECT * FROM test TABLESAMPLE (30 PERCENT) REPEATABLE (123);
1 Lisa
2 Mary
3 Evan
5 Alex
8 Lucy
> SELECT * FROM test TABLESAMPLE (5 ROWS);
5 Alex
8 Lucy
2 Mary
4 Fred
1 Lisa
> SELECT * FROM test TABLESAMPLE (BUCKET 4 OUT OF 10);
8 Lucy
2 Mary
9 Eric
6 Mark