Best practices for serverless compute

Preview

This article presents best practice recommendations for using serverless compute in your notebooks and jobs.

By following these recommendations, you will enhance the productivity, cost efficiency, and reliability of your workloads on Databricks.

Migrating workloads to serverless compute

To protect the isolation of user code, serverless compute utilizes Databricks secure shared access mode. Because of this, some workloads will require code changes to continue working on serverless compute. For a list of unsupported features, see Serverless compute limitations.

Certain workloads are easier to migrate than others. Workloads that meet the following requirements will be the easiest to migrate:

The data being accessed must be stored in Unity Catalog.
The workload should be compatible with shared access mode compute.
The workload should be compatible with Databricks Runtime 14.3 or above.

To test whether a workload will work on serverless compute, run it on a non-serverless compute resource with shared access mode and Databricks Runtime 14.3 or above. If the run is successful, the workload is ready for migration.
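As a rough illustration of the test setup described above, the following sketch checks whether a cluster specification matches both requirements before you use it for a trial run. It assumes the specification is a plain dictionary in the shape of a cluster definition, that shared access mode appears as the `data_security_mode` value `USER_ISOLATION`, and that the runtime is given as a `spark_version` string such as `14.3.x-scala2.12`. The helper names are hypothetical and are not part of any Databricks library.

```python
import re

# Assumption: shared access mode is represented by this value in a cluster spec.
SHARED_ACCESS_MODE = "USER_ISOLATION"
# Minimum Databricks Runtime named in this article.
MIN_RUNTIME = (14, 3)


def parse_runtime_version(spark_version):
    """Extract (major, minor) from a string such as '14.3.x-scala2.12'.

    Returns None if the string does not start with '<major>.<minor>'.
    """
    match = re.match(r"(\d+)\.(\d+)", spark_version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_valid_test_cluster(cluster_spec):
    """Return True if the spec uses shared access mode and runtime 14.3 or above."""
    version = parse_runtime_version(cluster_spec.get("spark_version", ""))
    return (
        cluster_spec.get("data_security_mode") == SHARED_ACCESS_MODE
        and version is not None
        and version >= MIN_RUNTIME
    )
```

A passing check only confirms that the test cluster is configured as recommended. The workload itself still has to run successfully on that cluster.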

Because of the significance of this change and the current list of limitations, many workloads will not migrate seamlessly. Instead of rewriting all existing code, Databricks recommends prioritizing serverless compute compatibility as you create new workloads.

Ingesting data from external systems

Because serverless compute does not support JAR file installation, you cannot use a JDBC or ODBC driver to ingest data from an external data source.

Alternative strategies you can use for ingestion include:

Auto Loader, which incrementally and efficiently processes new data files as they arrive in cloud storage. See What is Auto Loader?.
Data ingestion partner solutions. See Connect to ingestion partners using Partner Connect.
The add data UI to directly upload files. See Upload files to Databricks.
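For the Auto Loader option, a minimal sketch might look like the following. It assumes JSON source files, an existing `spark` session passed in by the caller, and that the stream should process all available files and then stop, using the `availableNow` trigger. The function name, paths, and table name are hypothetical placeholders.

```python
def ingest_new_files(spark, source_path, checkpoint_path, target_table):
    """Incrementally load new JSON files from cloud storage into a table.

    Assumptions: the files are JSON, and checkpoint_path is a writable
    location used for both schema tracking and the stream checkpoint.
    """
    stream = (
        spark.readStream.format("cloudFiles")  # Auto Loader source
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", checkpoint_path)
        .load(source_path)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)  # process what is available, then stop
        .toTable(target_table)
    )
```

In a notebook you might call it as `ingest_new_files(spark, "/Volumes/main/default/landing", "/Volumes/main/default/checkpoints/events", "main.default.events")`, where every path and name is a placeholder to replace with your own.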

Ingestion alternatives

When using serverless compute, you can also use the following features to query your data without moving it.

If you want to limit data duplication or guarantee that you are querying the freshest possible data, Databricks recommends using Delta Sharing. See What is Delta Sharing?.
If you want to do ad hoc reporting and proof-of-concept work, Databricks recommends trying Lakehouse Federation. Lakehouse Federation enables syncing entire databases to Databricks from external systems and is governed by Unity Catalog. See What is Lakehouse Federation?.

Try one or both of these features and see whether they satisfy your query performance requirements.

Monitor the cost of serverless compute

You can use the following features to monitor the cost of serverless compute:

Use system tables to create dashboards, set up alerts, and perform ad hoc queries. See Monitor the cost of serverless compute.
Set up budget alerts in your account. See Use budgets to monitor account spending.
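As one possible starting point for an ad hoc system table query, the sketch below builds a SQL statement that sums daily serverless usage by SKU. It assumes the billing system table is available as `system.billing.usage` with `usage_date`, `sku_name`, and `usage_quantity` columns, and that serverless SKUs can be identified by `SERVERLESS` in the SKU name. Verify these names against your workspace before relying on the results. The function name is hypothetical.

```python
def serverless_cost_query(days=30):
    """Build a SQL query that sums daily serverless usage by SKU.

    Assumptions: table and column names follow the billing system table
    schema, and serverless SKUs contain 'SERVERLESS' in their name.
    """
    if not isinstance(days, int) or days <= 0:
        raise ValueError("days must be a positive integer")
    return f"""
        SELECT usage_date, sku_name, SUM(usage_quantity) AS total_usage
        FROM system.billing.usage
        WHERE sku_name LIKE '%SERVERLESS%'
          AND usage_date >= date_sub(current_date(), {days})
        GROUP BY usage_date, sku_name
        ORDER BY usage_date DESC
    """
```

In a notebook, you could run the result with `spark.sql(serverless_cost_query(7))` and use the output as the source for a dashboard or an alert.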