read_statestore table-valued function

Applies to: check marked yes Databricks Runtime 14.3 and above

Preview

This feature is in Public Preview.

A table valued function for reading records from the state store of streaming queries. The returned relation only supports running as a batch query.

Syntax

read_statestore ( path [, option_key => option_value ] [ ... ] )

Arguments

read_statestore requires named parameter invocation for its optional parameters.

  • path: A STRING literal. The path of the streaming query checkpoint location.

  • option_key: The name of the option to configure. You need to use backticks (`) for options that contain dots (.).

  • option_value: A constant expression to set the option to. Accepts literals and scalar functions.

All option_values are case-insensitive.

option_key

Type

Default

Description

batchId

BIGINT

latest batch id

Represents the target batch to read from. This option is used when users want to perform time-travel. The batch should be committed but not yet cleaned up.

OperatorId

BIGINT

0

Represents the target operator to read from. This option is used when the query is using multiple stateful operators.

storeName

STRING

'DEFAULT'

Represents the target side to read from. This option is used when users want to read the state from stream-stream join.

joinSide

STRING

'None'

Represents the target side to read from. This option is used when users want to read the state from stream-stream join. One of: 'Left', 'Right', 'None'.

Returns

The function returns a result set with the following columns.

Note

The nested columns for key and value heavily depend on the input schema of the stateful operator and the type of operator.

Name

Data type

Nullable

Standard

Description

id

STRUCT

No

Key row of the stateful operator stored in the state checkpoint.

value

STRUCT

Yes

Value row of the stateful operator stored in the state checkpoint.

partition_id

INTEGER

No

The partition that contains the record.

shardId

STRING

No

A unique identifier for the shard where the data was read from.

sequenceNumber

BIGINT

No

The unique identifier of the record within its shard.

approximateArrivalTimestamp

TIMESTAMP

No

The approximate time that the record was inserted into the stream.

Examples

- Read from state
> SELECT * FROM read_statestore('/checkpoint/path');

 Read from state with storeName option
> SELECT * FROM read_statestore(
      '/checkpoint/path',
      operatorId => 0,
      batchId => 2,
      storeName => 'default'
  );

 Read from state with joinSide option

> SELECT * FROM read_statestore(
      '/checkpoint/path',
      joinSide => 'left'
  );