Best practices for serverless compute

Preview

This feature is in Private Preview. For information on eligibility and enablement, see Enable serverless compute.

This article presents best practice recommendations for using serverless compute in your notebooks and jobs.

By following these recommendations, you will enhance the productivity, cost efficiency, and reliability of your workloads on Databricks.

Migrating workloads to serverless compute

To protect the isolation of user code, serverless compute uses Databricks secure shared access mode. Because of this, some workloads will require code changes to continue working on serverless compute. For a list of unsupported features, see Serverless compute limitations.

Certain workloads are easier to migrate than others. Workloads that meet all of the following requirements are the easiest to migrate:

  • The data being accessed is stored in Unity Catalog.

  • The workload is compatible with shared access mode compute.

  • The workload is compatible with Databricks Runtime 14.3 or above.

To test if a workload will work on serverless compute, run it on a non-serverless compute resource with Shared access mode and a Databricks Runtime of 14.3 or above. If the run is successful, the workload is ready for migration.
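For example, such a compatibility test cluster can be described with a cluster spec like the following. This is a sketch: the cluster name, node type, and worker count are placeholders, and `data_security_mode: "USER_ISOLATION"` is how the Clusters API expresses shared access mode.

```json
{
  "cluster_name": "serverless-compat-test",
  "spark_version": "14.3.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "num_workers": 2,
  "data_security_mode": "USER_ISOLATION",
  "autotermination_minutes": 30
}
```

Run your workload against a cluster created from this spec; failures here typically point to the same limitations you would hit on serverless compute.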

Because of the significance of this change and the current list of limitations, many workloads will not migrate seamlessly. Instead of recoding everything, Databricks recommends prioritizing serverless compute compatibility as you create new workloads.

Ingesting data from external systems

Because serverless compute does not support JAR file installation, you cannot use a JDBC or ODBC driver to ingest data from an external data source.
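For instance, instead of reading over JDBC, you could expose an external PostgreSQL database through Lakehouse Federation and query it directly with SQL. The sketch below assumes hypothetical connection, catalog, and table names, and assumes the referenced secret scope and keys already exist.

```sql
-- Create a Unity Catalog connection to the external database
-- (connection name, host, and secret scope/keys are placeholders).
CREATE CONNECTION pg_sales TYPE postgresql
OPTIONS (
  host 'db.example.com',
  port '5432',
  user secret('jdbc_scope', 'pg_user'),
  password secret('jdbc_scope', 'pg_password')
);

-- Mirror the external database as a foreign catalog governed by Unity Catalog.
CREATE FOREIGN CATALOG sales_federated
USING CONNECTION pg_sales
OPTIONS (database 'sales');

-- Query the external table in place; no data is copied into Databricks.
SELECT * FROM sales_federated.public.orders LIMIT 10;
```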

Ingestion alternatives

Instead of ingesting data over JDBC or ODBC, you can use the following features to query your data without moving it:

  • If you want to limit data duplication or guarantee that you are querying the freshest possible data, Databricks recommends using Delta Sharing. See What is Delta Sharing?.

  • If you want to do ad hoc reporting and proof-of-concept work, Databricks recommends trying Lakehouse Federation. Lakehouse Federation enables syncing entire databases to Databricks from external systems and is governed by Unity Catalog. See What is Lakehouse Federation?.

Try one or both of these features and see whether they satisfy your query performance requirements.
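As one way to test this, you can run a representative query directly against the shared or federated data. For example, with a Delta Sharing share mounted as a catalog (the catalog, schema, table, and column names below are hypothetical):

```sql
-- Benchmark a typical aggregation against a shared table (names are placeholders).
SELECT order_date, sum(amount) AS daily_total
FROM shared_sales.retail.orders
GROUP BY order_date
ORDER BY order_date DESC
LIMIT 7;
```

Comparing the runtime of queries like this against your latency requirements tells you whether querying in place is sufficient or whether you need to ingest the data.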

Monitor the cost of serverless compute

There are multiple features you can use to help you monitor the cost of serverless compute: