Serverless compute release notes

Preview

This feature is in Private Preview.

This article explains the features and behaviors that are currently available and upcoming on serverless compute for notebooks and jobs.

Databricks periodically releases updates to serverless compute, automatically upgrading the serverless compute runtime to support enhancements and upgrades to the platform. All users get the same updates, rolled out over a short period of time.

Upcoming behavioral changes

This section highlights behavioral changes coming in the next serverless compute version. When the changes are pushed to production, they will be added to the release notes.

September 2024

Schema binding change for views

When the data types in a view’s underlying query change from those used when the view was first created, Databricks will no longer throw errors for references to the view when no safe cast can be performed. Instead, the view will compensate by using regular casting rules where possible.

This change allows Databricks to tolerate table schema changes more readily.

Disallow undocumented ! syntax toleration for NOT outside boolean logic

Databricks will no longer tolerate the use of ! as a synonym for NOT outside of boolean logic. The following forms must be replaced:

  • CREATE ... IF ! EXISTS with CREATE ... IF NOT EXISTS

  • IS ! NULL with IS NOT NULL

  • ! NULL column or field property with NOT NULL column or field property

  • ! IN with NOT IN

  • ! BETWEEN with NOT BETWEEN

This change reduces confusion, aligns with the SQL standard, and makes SQL more portable.

The boolean prefix operator ! (e.g. !is_mgr or !(true AND false)) is unaffected by this change.
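A regex pass can locate and rewrite the affected forms in existing SQL files before the change lands. The sketch below is a minimal helper under stated assumptions: replace_legacy_bang is a hypothetical name, not a Databricks API. It works on plain SQL text and does not understand string literals, comments, or quoted identifiers, so review its output before applying it.

```python
import re

# Hypothetical migration helper, not a Databricks API.
# Matches "!" used as a synonym for NOT directly before EXISTS, NULL, IN, or BETWEEN.
# The lookahead needs a whole keyword (note the trailing \b), so identifiers such as
# in_stock are not matched, and "!=" or the boolean prefix "!is_mgr" never match.
_LEGACY_BANG = re.compile(r"!\s*(?=(?:EXISTS|NULL|IN|BETWEEN)\b)", re.IGNORECASE)


def replace_legacy_bang(sql: str) -> str:
    """Rewrite legacy '!' keyword forms to 'NOT '.

    Plain regex pass: string literals, comments, and quoted identifiers
    are not recognized, so check the result by hand.
    """
    return _LEGACY_BANG.sub("NOT ", sql)


if __name__ == "__main__":
    print(replace_legacy_bang("CREATE TABLE IF ! EXISTS t (id INT ! NULL)"))
    print(replace_legacy_bang("SELECT * FROM t WHERE a ! IN (1, 2) AND !is_mgr"))
```

The helper leaves the boolean prefix operator alone, which the change does not affect. A boolean "! EXISTS (...)" predicate would also be rewritten, to the equivalent NOT EXISTS.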

Disallow undocumented and unprocessed portions of column definition syntax in views

Databricks supports CREATE VIEW with named columns and column comments. The specification of column types, NOT NULL constraints, or DEFAULT has been tolerated in the syntax without having any effect. Databricks will remove this syntax toleration.

Doing so reduces confusion, aligns with the SQL standard, and allows for future enhancements.

Release notes

This section includes release notes for serverless compute. Release notes are organized by year and week of year. Serverless compute always runs using the most recently released version listed here.

The JDK is upgraded from JDK 8 to JDK 17

August 15, 2024

Serverless compute for notebooks and workflows has migrated from Java Development Kit (JDK) 8 to JDK 17 on the server side. This upgrade includes the following behavioral changes:

Bug fixes

Correct parsing of regex patterns with negation in nested character grouping: Databricks now correctly parses regex patterns that use negation in nested character groups. For example, [^[abc]] is parsed as “any character that is NOT one of ‘abc’”.

Additionally, because Photon behavior was inconsistent with Spark for nested character classes, regex patterns containing nested character classes no longer use Photon and use Spark instead. A nested character class is any pattern containing square brackets within square brackets, such as [[a-c][1-3]].
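To find patterns that will now run on Spark rather than Photon, it can help to scan for an opening bracket inside an open character class. The helper below is a hypothetical heuristic that mirrors the definition above (square brackets within square brackets). It skips backslash-escaped characters, but it is a sketch only, not the logic Databricks uses to decide where a pattern runs.

```python
def has_nested_character_class(pattern: str) -> bool:
    """Return True if an unescaped '[' appears inside an open character class.

    Hypothetical heuristic based on the "square brackets within square
    brackets" definition; it does not fully parse Java regex syntax.
    """
    depth = 0  # how many character classes are currently open
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \[ or \]
            continue
        if ch == "[":
            if depth >= 1:
                return True  # a class opened inside another class
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
        i += 1
    return False


if __name__ == "__main__":
    for p in ["[^[abc]]", "[[a-c][1-3]]", "[a-c][1-3]"]:
        print(p, has_nested_character_class(p))
```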

Version 2024.30

July 23, 2024

This serverless compute release roughly corresponds to Databricks Runtime 15.1.

New features

Support for star (*) syntax in the WHERE clause: You can now use the star (*) syntax in the WHERE clause to reference all columns from the SELECT list.

For example, SELECT * FROM VALUES(1, 2) AS T(a1, a2) WHERE 1 IN(T.*).

Changes

Improved error recovery for JSON parsing: The JSON parser used for from_json() and JSON path expressions now recovers faster from malformed syntax, resulting in less data loss.

When encountering malformed JSON syntax in a struct field, an array value, a map key, or a map value, the JSON parser will now return NULL only for the unreadable field, key, or element. Subsequent fields, keys, or elements will be properly parsed. Prior to this change, the JSON parser abandoned parsing the array, struct, or map and returned NULL for the remaining content.
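The difference between the two behaviors can be modeled in plain Python for an array value. This is a conceptual sketch only. It assumes the array elements have already been split into separate raw strings, which the real parser does not do, and both function names are hypothetical.

```python
import json
from typing import Any, List, Optional


def parse_elements_recovering(raw_elements: List[str]) -> List[Optional[Any]]:
    """Model of the new behavior: only the unreadable element becomes None (NULL)."""
    parsed: List[Optional[Any]] = []
    for raw in raw_elements:
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError:
            parsed.append(None)  # NULL for this element only; keep parsing the rest
    return parsed


def parse_elements_abandoning(raw_elements: List[str]) -> List[Optional[Any]]:
    """Model of the previous behavior: parsing stops at the first malformed
    element, and the remaining content becomes None (NULL)."""
    parsed: List[Optional[Any]] = []
    for index, raw in enumerate(raw_elements):
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError:
            parsed.extend([None] * (len(raw_elements) - index))
            break
    return parsed


if __name__ == "__main__":
    elements = ["1", "{bad", "3"]
    print(parse_elements_recovering(elements))
    print(parse_elements_abandoning(elements))
```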

Version 2024.15

April 15, 2024

This is the initial serverless compute version. This version roughly corresponds to Databricks Runtime 14.3 with some modifications that remove support for some non-serverless and legacy features.

Supported Spark configuration parameters

To automate the configuration of Spark on serverless compute, Databricks has removed support for manually setting most Spark configurations. You can manually set only the following Spark configuration parameters:

  • spark.sql.legacy.timeParserPolicy (Default value is EXCEPTION)

  • spark.sql.session.timeZone (Default value is Etc/UTC)

  • spark.sql.shuffle.partitions (Default value is auto)

  • spark.sql.ansi.enabled (Default value is true)

Job runs on serverless compute will fail if you set a Spark configuration that is not in this list.
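Because a single unsupported key fails the whole job run, it can be worth checking a job's Spark configuration before submitting it. The sketch below is a hypothetical pre-flight check; unsupported_spark_confs is not a Databricks API. The allowlist and defaults are copied from the list above and may differ in later serverless versions.

```python
from typing import Dict, List, Mapping

# Allowlist and default values as listed for serverless compute version 2024.15.
SERVERLESS_SPARK_CONF_DEFAULTS: Dict[str, str] = {
    "spark.sql.legacy.timeParserPolicy": "EXCEPTION",
    "spark.sql.session.timeZone": "Etc/UTC",
    "spark.sql.shuffle.partitions": "auto",
    "spark.sql.ansi.enabled": "true",
}


def unsupported_spark_confs(confs: Mapping[str, str]) -> List[str]:
    """Return, sorted, the keys that are not on the serverless allowlist."""
    return sorted(key for key in confs if key not in SERVERLESS_SPARK_CONF_DEFAULTS)


if __name__ == "__main__":
    job_confs = {
        "spark.sql.session.timeZone": "America/New_York",
        "spark.executor.memory": "8g",
    }
    print(unsupported_spark_confs(job_confs))
```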

For more on configuring Spark properties, see Set Spark configuration properties on Databricks.

Caching API and SQL commands not supported

Usage of DataFrame and SQL cache APIs is not supported. Using any of these APIs or SQL commands will result in an exception.

Unsupported APIs:

Unsupported SQL commands:

Global temporary views not supported

The creation of global temporary views is not supported. Using either of these commands will result in an exception:

Instead, Databricks recommends using session temporary views or creating tables where cross-session data passing is required.

CREATE FUNCTION (External) not supported

The CREATE FUNCTION (External) command is not supported. Using this command results in an exception.

Instead, Databricks recommends using CREATE FUNCTION (SQL and Python) to create UDFs.

Hive SerDe tables not supported

Hive SerDe tables are not supported. Additionally, the corresponding LOAD DATA command, which loads data into a Hive SerDe table, is not supported. Using the command will result in an exception.

Support for data sources is limited to AVRO, BINARYFILE, CSV, DELTA, JSON, KAFKA, ORC, PARQUET, TEXT, and XML.

Hive variables not supported

Hive variables (for example ${env:var}, ${configName}, ${system:var}, and spark.sql.variable) or config variable references using the ${var} syntax are not supported. Using Hive variables will result in an exception.

Instead, use DECLARE VARIABLE, SET VARIABLE, and SQL session variable references and parameter markers (‘?’ or ‘:var’) to declare, modify, and reference session state. You can also use the IDENTIFIER clause to parameterize object names in many cases.
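When migrating existing scripts, a quick scan for the unsupported references shows which statements need rewriting. The helper below is a hypothetical sketch (find_hive_variable_refs is not a Databricks API). It flags ${...} substitutions and literal spark.sql.variable references, and it does not skip string literals or comments.

```python
import re
from typing import List

# ${...} substitutions (for example ${env:var}, ${system:var}, ${configName}, ${var})
# plus literal references to spark.sql.variable.
_HIVE_VAR_REF = re.compile(r"\$\{[^}]*\}|\bspark\.sql\.variable\b")


def find_hive_variable_refs(sql: str) -> List[str]:
    """Return every Hive-style variable reference found in the SQL text, in order."""
    return _HIVE_VAR_REF.findall(sql)


if __name__ == "__main__":
    print(find_hive_variable_refs("SELECT * FROM ${db}.t WHERE u = '${env:USER}'"))
```

Each match then needs a manual rewrite, for example to a session variable declared with DECLARE VARIABLE or to a ‘:var’ parameter marker.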

input_file functions are deprecated

The input_file_name(), input_file_block_length(), and input_file_block_start() functions have been deprecated. Using these functions is highly discouraged.

Instead, use the file metadata column (_metadata) to retrieve file metadata information.

Behavioral changes

Serverless compute version 2024.15 includes the following behavioral changes:

  • unhex(hexStr) bug fix: When using the unhex(hexStr) function, hexStr is always padded left to a whole byte. Previously the unhex function ignored the first half-byte. For example: unhex('ABC') now produces x'0ABC' instead of x'BC'.

  • Auto-generated column aliases are now stable: When the result of an expression is referenced without a user-specified column alias, this auto-generated alias will now be stable. The new algorithm may result in a change to the previously auto-generated names used in features like materialized views.

  • Table scans with CHAR type fields are now always padded: Delta tables, certain JDBC tables, and external data sources store CHAR data in non-padded form. When reading, Databricks will now pad the data with spaces to the declared length to ensure correct semantics.

  • Casts from BIGINT/DECIMAL to TIMESTAMP throw an exception for overflowed values: Databricks allows casting from BIGINT and DECIMAL to TIMESTAMP by treating the value as the number of seconds from the Unix epoch. Previously, Databricks would return overflowed values but now throws an exception in cases of overflow. Use try_cast to return NULL instead of an exception.

  • PySpark UDF execution has been improved to match the exact behavior of UDF execution on single user compute: The following changes have been made:

    • UDFs with a string return type no longer implicitly convert non-string values into strings. Previously, UDFs with a return type of str would apply a str(..) wrapper to the result regardless of the actual data type of the returned value.

    • UDFs with timestamp return types no longer implicitly apply a timezone conversion to timestamps.
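The unhex fix and the new overflow check for casts to TIMESTAMP can be modeled in plain Python. This is a simplified sketch with hypothetical function names. It covers BIGINT inputs only (not DECIMAL). It also assumes TIMESTAMP values are stored as a signed 64-bit count of microseconds since the Unix epoch, which is how Spark represents them internally.

```python
import string
from typing import Optional

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
MICROS_PER_SECOND = 1_000_000


def unhex(hex_str: str) -> Optional[bytes]:
    """Model of the fixed unhex(): odd-length input is left-padded to a whole byte."""
    if not all(c in string.hexdigits for c in hex_str):
        return None  # assumption: non-hex input yields NULL
    if len(hex_str) % 2:
        hex_str = "0" + hex_str  # 'ABC' -> '0ABC' instead of dropping the 'A'
    return bytes.fromhex(hex_str)


def seconds_to_timestamp_micros(seconds: int) -> int:
    """Model of CAST(bigint AS TIMESTAMP): raise on overflow instead of wrapping."""
    micros = seconds * MICROS_PER_SECOND
    if not INT64_MIN <= micros <= INT64_MAX:
        raise OverflowError(f"{seconds} seconds does not fit in a TIMESTAMP")
    return micros


def try_seconds_to_timestamp_micros(seconds: int) -> Optional[int]:
    """Model of try_cast: return None (NULL) instead of raising."""
    try:
        return seconds_to_timestamp_micros(seconds)
    except OverflowError:
        return None


if __name__ == "__main__":
    print(unhex("ABC"))
    print(try_seconds_to_timestamp_micros(2**62))
```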

System environment

Serverless compute includes the following system environment:

  • Operating System: Ubuntu 22.04.3 LTS

  • Python: 3.10.12

  • Delta Lake: 3.1.0

Installed Python libraries

The following Python libraries are installed on serverless compute by default. Additional dependencies can be installed using the Environment side panel. See Install notebook dependencies.

Library                     Version
anyio                       3.5.0
argon2-cffi                 21.3.0
argon2-cffi-bindings        21.2.0
asttokens                   2.0.5
astunparse                  1.6.3
attrs                       22.1.0
backcall                    0.2.0
beautifulsoup4              4.11.1
black                       22.6.0
bleach                      4.1.0
blinker                     1.4
boto3                       1.24.28
botocore                    1.27.96
cachetools                  5.3.2
certifi                     2022.12.7
cffi                        1.15.1
chardet                     4.0.0
charset-normalizer          2.0.4
click                       8.0.4
comm                        0.1.2
contourpy                   1.0.5
cryptography                39.0.1
cycler                      0.11.0
Cython                      0.29.32
databricks-connect          14.3.1
databricks-sdk              0.20.0
dbus-python                 1.2.18
debugpy                     1.6.7
decorator                   5.1.1
defusedxml                  0.7.1
distlib                     0.3.8
docstring-to-markdown       0.11
entrypoints                 0.4
executing                   0.8.3
facets-overview             1.1.1
fastjsonschema              2.19.1
filelock                    3.13.1
fonttools                   4.25.0
google-auth                 2.28.1
googleapis-common-protos    1.62.0
grpcio                      1.62.0
grpcio-status               1.62.0
httplib2                    0.20.2
idna                        3.4
importlib-metadata          4.6.4
ipyflow-core                0.0.198
ipykernel                   6.25.0
ipython                     8.14.0
ipython-genutils            0.2.0
ipywidgets                  7.7.2
jedi                        0.18.1
jeepney                     0.7.1
Jinja2                      3.1.2
jmespath                    0.10.0
joblib                      1.2.0
jsonschema                  4.17.3
jupyter-client              7.3.4
jupyter-server              1.23.4
jupyter_core                5.2.0
jupyterlab-pygments         0.1.2
jupyterlab-widgets          1.0.0
keyring                     23.5.0
kiwisolver                  1.4.4
launchpadlib                1.10.16
lazr.restfulclient          0.14.4
lazr.uri                    1.0.6
lxml                        4.9.1
MarkupSafe                  2.1.1
matplotlib                  3.7.0
matplotlib-inline           0.1.6
mccabe                      0.7.0
mistune                     0.8.4
more-itertools              8.10.0
mypy-extensions             0.4.3
nbclassic                   0.5.2
nbclient                    0.5.13
nbconvert                   6.5.4
nbformat                    5.7.0
nest-asyncio                1.5.6
nodeenv                     1.8.0
notebook                    6.5.2
notebook_shim               0.2.2
numpy                       1.23.5
oauthlib                    3.2.0
packaging                   23.2
pandas                      1.5.3
pandocfilters               1.5.0
parso                       0.8.3
pathspec                    0.10.3
patsy                       0.5.3
pexpect                     4.8.0
pickleshare                 0.7.5
Pillow                      9.4.0
pip                         22.3.1
platformdirs                2.5.2
plotly                      5.9.0
pluggy                      1.0.0
prometheus-client           0.14.1
prompt-toolkit              3.0.36
protobuf                    4.25.3
psutil                      5.9.0
psycopg2                    2.9.3
ptyprocess                  0.7.0
pure-eval                   0.2.2
py4j                        0.10.9.7
pyarrow                     8.0.0
pyarrow-hotfix              0.5
pyasn1                      0.5.1
pyasn1-modules              0.3.0
pyccolo                     0.0.52
pycparser                   2.21
pydantic                    1.10.6
pyflakes                    3.1.0
Pygments                    2.11.2
PyGObject                   3.42.1
PyJWT                       2.3.0
pyodbc                      4.0.32
pyparsing                   3.0.9
pyright                     1.1.294
pyrsistent                  0.18.0
python-dateutil             2.8.2
python-lsp-jsonrpc          1.1.1
python-lsp-server           1.8.0
pytoolconfig                1.2.5
pytz                        2022.7
pyzmq                       23.2.0
requests                    2.28.1
rope                        1.7.0
rsa                         4.9
s3transfer                  0.6.2
scikit-learn                1.1.1
scipy                       1.10.0
seaborn                     0.12.2
SecretStorage               3.3.1
Send2Trash                  1.8.0
setuptools                  65.6.3
six                         1.16.0
sniffio                     1.2.0
soupsieve                   2.3.2.post1
ssh-import-id               5.11
stack-data                  0.2.0
statsmodels                 0.13.5
tenacity                    8.1.0
terminado                   0.17.1
threadpoolctl               2.2.0
tinycss2                    1.2.1
tokenize-rt                 4.2.1
tomli                       2.0.1
tornado                     6.1
traitlets                   5.7.1
typing_extensions           4.4.0
ujson                       5.4.0
unattended-upgrades         0.1
urllib3                     1.26.14
virtualenv                  20.16.7
wadllib                     1.3.6
wcwidth                     0.2.5
webencodings                0.5.1
websocket-client            0.58.0
whatthepatch                1.0.2
wheel                       0.38.4
widgetsnbextension          3.6.1
yapf                        0.33.0
Zipp                        1.0.0
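To confirm which of these versions a notebook actually sees, the standard library's importlib.metadata can report installed package versions. The helper below is a hypothetical sketch (version_mismatches is not a Databricks API). The expected versions are whatever you pass in, for example pins copied from the table above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Mapping, Optional


def version_mismatches(expected: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: installed_version} for every package whose installed
    version differs from the expected one; None means it is not installed."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in expected.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    # A few pins copied from the table above.
    print(version_mismatches({"numpy": "1.23.5", "pandas": "1.5.3", "pyarrow": "8.0.0"}))
```

An empty result means every listed package matches its pin.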