Run your Databricks job with serverless compute for workflows
Preview
Serverless compute for workflows is in Private Preview. For information on eligibility and enablement, see Enable serverless compute.
Important
Because serverless compute for workflows does not support controlling egress traffic, your jobs have full access to the internet.
Serverless compute for workflows allows you to run your Databricks job without configuring and deploying infrastructure. With serverless compute, you focus on implementing your data processing and analysis pipelines, and Databricks efficiently manages compute resources, including optimizing and scaling compute for your workloads. Autoscaling and Photon are automatically enabled for the compute resources that run your job.
Serverless compute for workflows auto-optimization selects appropriate resources for your workload, such as instance types, memory, and processing engines. Auto-optimization also retries failed tasks.
Databricks automatically upgrades the Databricks Runtime version to support enhancements and upgrades to the platform while ensuring the stability of your Databricks jobs. To see the current Databricks Runtime version used by serverless compute for workflows, see Serverless compute release notes.
Because cluster creation permission is not required, all workspace users can use serverless compute to run their workflows.
This article describes using the Databricks Jobs UI to create and run jobs that use serverless compute. You can also automate creating and running jobs that use serverless compute with the Jobs API, Databricks Asset Bundles, and the Databricks SDK for Python.
To learn about using the Jobs API to create and run jobs that use serverless compute, see Jobs in the REST API reference.
To learn about using Databricks Asset Bundles to create and run jobs that use serverless compute, see Develop a job on Databricks using Databricks Asset Bundles.
To learn about using the Databricks SDK for Python to create and run jobs that use serverless compute, see Databricks SDK for Python.
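As a rough illustration of the automation options above, the following sketch builds a request body for the Jobs API job-creation endpoint. It assumes that a task that sets none of the cluster fields (`new_cluster`, `existing_cluster_id`, `job_cluster_key`) runs on serverless compute. Confirm this against the REST API reference for your workspace. The job name, task key, and notebook path are hypothetical.

```python
import json

# Cluster-related task fields in the Jobs API. The assumption in this sketch
# is that a task carrying none of them is scheduled on serverless compute.
CLUSTER_FIELDS = ("new_cluster", "existing_cluster_id", "job_cluster_key")


def build_serverless_job(name, task_key, notebook_path):
    """Return a job-creation payload whose single notebook task names no cluster."""
    return {
        "name": name,
        "tasks": [
            {
                "task_key": task_key,
                "notebook_task": {"notebook_path": notebook_path},
            }
        ],
    }


def uses_serverless(task):
    """True if the task sets none of the cluster fields listed above."""
    return not any(field in task for field in CLUSTER_FIELDS)


# Hypothetical values, for illustration only.
payload = build_serverless_job(
    "nightly-report",
    "build_report",
    "/Workspace/Users/someone@example.com/report",
)
print(json.dumps(payload, indent=2))
```

You would send a payload like this as the body of a `POST` request to `/api/2.1/jobs/create`, authenticated for your workspace.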
Requirements
Your Databricks workspace must have Unity Catalog enabled.
Because serverless compute for workflows uses shared access mode, your workloads must support this access mode.
Your Databricks workspace must be in a supported region. See Which regions support serverless compute?.
Create a job using serverless compute
Note
Because serverless compute for workflows ensures that sufficient resources are provisioned to run your workloads, you might experience increased startup times when running a Databricks job that requires large amounts of memory or includes many tasks.
Serverless compute is supported with the notebook, Python script, dbt, and Python wheel task types. By default, serverless compute is selected as the compute type when you create a new job and add one of these supported task types.
Databricks recommends using serverless compute for all job tasks. You can also specify different compute types for tasks in a job, which might be required if a task type is not supported by serverless compute for workflows.
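If you generate job definitions programmatically, a small check can show which tasks need a different compute type. The following sketch assumes that the four supported task types correspond to the Jobs API keys `notebook_task`, `spark_python_task`, `dbt_task`, and `python_wheel_task`. The helper name and the sample tasks are hypothetical.

```python
# Jobs API task-type keys assumed to match the four supported task types:
# notebook, Python script, dbt, and Python wheel.
SERVERLESS_TASK_KEYS = (
    "notebook_task",
    "spark_python_task",
    "dbt_task",
    "python_wheel_task",
)


def split_by_serverless_support(tasks):
    """Partition task_key values into (supported, unsupported) lists."""
    supported, unsupported = [], []
    for task in tasks:
        if any(key in task for key in SERVERLESS_TASK_KEYS):
            supported.append(task["task_key"])
        else:
            unsupported.append(task["task_key"])
    return supported, unsupported


# Hypothetical job with one notebook task and one JAR task.
sample_tasks = [
    {"task_key": "ingest", "notebook_task": {"notebook_path": "/Workspace/ingest"}},
    {"task_key": "legacy_jar", "spark_jar_task": {"main_class_name": "com.example.Main"}},
]
supported, unsupported = split_by_serverless_support(sample_tasks)
print("serverless:", supported)
print("needs other compute:", unsupported)
```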
Configure an existing job to use serverless compute
You can switch an existing job to use serverless compute for supported task types when you edit the job. To switch to serverless compute, either:
In the Job details side panel click Swap under Compute, click New, enter or update any settings, and click Update.
Open the Compute drop-down menu and select Serverless.
Schedule a notebook using serverless compute
In addition to using the Jobs UI to create and schedule a job using serverless compute, you can create and run a job that uses serverless compute directly from a Databricks notebook. See Create and manage scheduled notebook jobs.
Set Spark configuration parameters
Because Databricks automates the configuration of Spark on serverless compute, you can set only specific Spark configuration parameters. For the list of allowable parameters, see Supported Spark configuration parameters.
You can set Spark configuration parameters at the session level only. To do this, set them in a notebook and add that notebook to a task in the job that uses the parameters. See Get and set Apache Spark configuration properties in a notebook.
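The following is a minimal sketch of session-level configuration in a notebook task. It relies on `spark.conf.set`, the standard PySpark call for session configuration. The allowlist shown here is a hypothetical placeholder, so replace it with the parameters listed in Supported Spark configuration parameters.

```python
# Hypothetical allowlist, for illustration only. The authoritative list is in
# "Supported Spark configuration parameters".
ALLOWED = {"spark.sql.session.timeZone", "spark.sql.shuffle.partitions"}


def apply_session_conf(spark, settings, allowed=ALLOWED):
    """Set each allowed parameter on the session and return the names skipped."""
    skipped = []
    for key, value in settings.items():
        if key in allowed:
            # Session-level setting; applies only to this Spark session.
            spark.conf.set(key, value)
        else:
            skipped.append(key)
    return skipped
```

In a Databricks notebook, where `spark` is already defined, you might call `apply_session_conf(spark, {"spark.sql.session.timeZone": "UTC"})` in the first cell and then check the returned list for any parameters that were not applied.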
Configure environments and dependencies
To learn how to install libraries and dependencies using serverless compute, see Install notebook dependencies.
Configure serverless compute auto-optimization to disallow retries
Serverless compute for workflows auto-optimization automatically optimizes the compute used to run your jobs and retries failed tasks. Auto-optimization is enabled by default, and Databricks recommends leaving it enabled to ensure critical workloads run successfully at least once. However, if you have workloads that must be executed at most once, for example, jobs that are not idempotent, you can turn off auto-optimization when adding or editing a task:
Next to Retries, click Add (or the edit icon if a retry policy already exists).
In the Retry Policy dialog, uncheck Enable serverless auto-optimization (may include additional retries).
Click Confirm.
If you’re adding a task, click Create task. If you’re editing a task, click Save task.
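If you define tasks through the Jobs API instead of the UI, the same intent can be sketched as a change to the task payload. This example assumes the task settings accept a boolean `disable_auto_optimization` field and an integer `max_retries` field. Verify both names in the REST API reference before relying on them. The helper name and the sample task are hypothetical.

```python
def make_at_most_once(task):
    """Return a copy of a task payload with retries and auto-optimization off."""
    updated = dict(task)  # shallow copy so the caller's dict is left unchanged
    updated["max_retries"] = 0  # assumed field: no user-configured retries
    updated["disable_auto_optimization"] = True  # assumed field: no automatic retries
    return updated


# Hypothetical non-idempotent task.
original = {
    "task_key": "send_invoices",
    "notebook_task": {"notebook_path": "/Workspace/billing/send_invoices"},
}
at_most_once = make_at_most_once(original)
print(at_most_once)
```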
Monitor the cost of jobs that use serverless compute for workflows
You can monitor the cost of jobs that use serverless compute for workflows by querying the billable usage system table. This table is updated to include user and workload attributes about serverless costs. See Billable usage system table reference.
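As a starting point, the sketch below builds a query that totals usage per job over a recent window. It assumes the billable usage table is available as `system.billing.usage` with the columns `usage_metadata.job_id`, `usage_quantity`, `sku_name`, and `usage_date`. The `sku_name` filter is also an assumption about how serverless job SKUs are named. Check the Billable usage system table reference before using it.

```python
def serverless_job_usage_query(days=30):
    """Build a SQL query that sums usage per job for the last `days` days."""
    days = int(days)  # coerce to int so only a number is interpolated
    return f"""
SELECT usage_metadata.job_id AS job_id,
       SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE usage_metadata.job_id IS NOT NULL
  AND sku_name LIKE '%JOBS_SERVERLESS%'  -- assumed SKU naming pattern
  AND usage_date >= date_sub(current_date(), {days})
GROUP BY usage_metadata.job_id
ORDER BY dbus DESC
""".strip()


query = serverless_job_usage_query(7)
print(query)
```

In a notebook, you could pass the resulting string to `spark.sql(query)` and display the result.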
View details for your Spark queries
Serverless compute for workflows has a new interface for viewing detailed runtime information for your Spark statements, such as metrics and query plans. To view query insights for Spark statements included in your jobs run on serverless compute:
Click Workflows in the sidebar.
In the Name column, click the job name you want to view insights for.
Click the specific run you want to view insights for.
In the Compute section of the Task run side panel, click Query history.
You are redirected to Query History, prefiltered by the task run ID of the task you were viewing.
For information on using query history, see Query history.
Limitations
For a list of serverless compute for workflows limitations, see Serverless compute limitations in the serverless compute release notes.