Configure and edit Databricks tasks

This article explains how to create, configure, and edit tasks using the Workflows workspace UI.

Databricks manages tasks as components of Databricks Jobs. A job has one or more tasks. You create a new job in the workspace UI by configuring the first task. To configure a new job, see Configure and edit Databricks Jobs.

Each task has an associated compute resource that runs the task logic. See Configure compute for jobs.

Databricks also provides other entry points and tools for task configuration.

Create or configure a task

To edit an existing task or add a new task with the workspace UI, select an existing job using the following steps:

  1. Click Workflows in the sidebar.

  2. In the Name column, click the job name.

  3. Click the Tasks tab. The task graph appears.

  4. To edit a task, click the task name. The task configuration appears below the task graph.

  5. To add a task, click Add task.

Types of tasks

Configuration options and instructions vary by task type.

Clone a task

Cloning a task copies all of its configuration, including upstream dependencies.

To clone a task, do the following:

  1. Select the task in the task graph.

  2. Click Clone task.

  3. Specify a Cloned task name and click Clone.
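
The effect of cloning can be pictured with a small sketch. It assumes a task is represented as a dictionary with `task_key` and `depends_on` fields, in the style of Jobs API task settings. The `clone_task` helper and all field values are hypothetical and are not part of any Databricks library.

```python
import copy


def clone_task(task: dict, cloned_task_name: str) -> dict:
    """Return a copy of `task` under a new name (hypothetical helper).

    Every setting is copied, including the upstream dependencies listed in
    `depends_on`; only the task key changes.
    """
    cloned = copy.deepcopy(task)  # deep copy so nested settings are not shared
    cloned["task_key"] = cloned_task_name
    return cloned


# Illustrative task settings; the values are made up for this sketch.
transform = {
    "task_key": "transform",
    "depends_on": [{"task_key": "ingest"}],
    "notebook_task": {"notebook_path": "/Workspace/Shared/transform"},
    "max_retries": 2,
}

transform_copy = clone_task(transform, "transform_copy")
```

Because the copy keeps the same `depends_on` entries, the cloned task sits at the same position in the task graph as the original until you edit its dependencies.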

Delete a task

To delete a task, do the following:

  1. Select the task in the task graph.

  2. Click the trash icon and select Delete task.

Copy a task path

Certain task types, for example, notebook tasks, allow you to copy the path to the task source code:

  1. Click the Tasks tab.

  2. Select the task containing the path to copy.

  3. Click the copy icon next to the task path to copy the path to the clipboard.

Advanced task settings

The following advanced settings control retries for failed tasks and timeout policies for unresponsive tasks.

Note

You can set notifications at the task or job level. See Add email and system notifications for job events.

Set a retry policy

The default setting for task retries depends on the job configuration. For most configurations, failed tasks are not retried by default.

Continuous jobs use an exponential backoff retry policy. See How are failures handled for continuous jobs?.

To configure a policy that determines when and how many times failed task runs are retried, click + Add next to Retries.

The retry interval is specified in milliseconds and is measured between the start of the failed run and the subsequent retry run.

Note

If you configure both Timeout and Retries, the timeout applies to each retry.
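
The timing rules above can be worked through with a short sketch. The field names in `task_settings` mirror Jobs API task settings as an assumption and should be checked against the API reference. Both helper functions are hypothetical, and the rule that a retry cannot begin before the failed run has ended is also an assumption of this sketch.

```python
from datetime import datetime, timedelta

# Illustrative retry and timeout settings for one task. The field names are
# assumed to match Jobs API task settings; the values are made up.
task_settings = {
    "task_key": "transform",
    "max_retries": 3,
    "min_retry_interval_millis": 60_000,  # 60 seconds, expressed in milliseconds
    "timeout_seconds": 600,  # applies to each attempt, including retries
}


def earliest_retry_start(failed_start: datetime, failed_end: datetime,
                         interval_millis: int) -> datetime:
    """Earliest start of the next attempt (hypothetical helper).

    The interval is counted from the start of the failed run. This sketch
    assumes a retry cannot begin before the failed run has ended, so the
    later of the two times wins.
    """
    after_interval = failed_start + timedelta(milliseconds=interval_millis)
    return max(after_interval, failed_end)


def max_total_attempt_seconds(settings: dict) -> int:
    """Upper bound on time spent inside attempts: the first run plus retries,
    each limited by the timeout. Waiting between attempts is not included."""
    attempts = 1 + settings["max_retries"]
    return attempts * settings["timeout_seconds"]


start = datetime(2024, 1, 1, 12, 0, 0)
# A run that fails after 10 seconds waits out the rest of the interval.
quick_failure = earliest_retry_start(
    start, start + timedelta(seconds=10),
    task_settings["min_retry_interval_millis"])
# A run that fails after 90 seconds has already exceeded the interval.
slow_failure = earliest_retry_start(
    start, start + timedelta(seconds=90),
    task_settings["min_retry_interval_millis"])
```

With these illustrative values, one initial run and three retries that each hit the 600-second timeout spend at most 2400 seconds inside attempts, which is why a timeout combined with retries can extend the total wall-clock time well beyond the timeout itself.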

Configure an expected completion time or a timeout for a task

You can configure optional duration thresholds for a task, including an expected and maximum completion time. To configure duration thresholds, click Duration threshold.

Enter a duration in the Warning field to configure the task’s expected completion time. If the task exceeds this threshold, an event is triggered. You can use this event to send a notification when a task is running slowly. See Configure notifications for slow running or late jobs.

To configure a maximum completion time for a task, enter the maximum duration in the Timeout field. If the task does not complete in this time, Databricks sets its status to “Timed Out”.
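
The relationship between the two thresholds can be sketched as a small function. The `classify_duration` helper and its return labels other than “Timed Out” are hypothetical. Treating a run as over a threshold only when its duration is strictly greater than the threshold is an assumption of this sketch, not documented behavior.

```python
from typing import Optional


def classify_duration(elapsed_seconds: float,
                      warning_seconds: Optional[float] = None,
                      timeout_seconds: Optional[float] = None) -> str:
    """Classify a task run against its optional duration thresholds.

    Hypothetical helper: both thresholds are optional, and the timeout is
    checked first because it is the stricter outcome.
    """
    if timeout_seconds is not None and elapsed_seconds > timeout_seconds:
        return "Timed Out"  # maximum completion time exceeded
    if warning_seconds is not None and elapsed_seconds > warning_seconds:
        return "Warning"  # expected completion time exceeded; event triggered
    return "Within thresholds"


# Illustrative thresholds: warn after 5 minutes, time out after 10 minutes.
on_time = classify_duration(120, warning_seconds=300, timeout_seconds=600)
slow = classify_duration(400, warning_seconds=300, timeout_seconds=600)
too_long = classify_duration(700, warning_seconds=300, timeout_seconds=600)
no_thresholds = classify_duration(700)
```

Setting the Warning value below the Timeout value leaves a window in which a slow task raises an event but is still allowed to finish.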