Control the flow of tasks within a Databricks job
Some jobs are simply a list of tasks that need to run. You can control the execution order of tasks by specifying dependencies between them, and you can configure tasks to run in sequence or in parallel.
However, you can also create branching flows that include conditional tasks, error handling, or cleanup. The following topics describe the functionality Databricks provides to control the flow of tasks within a job.
Retries
Retries specify how many times a task should be rerun if it fails with an error. Errors are often transient and can be resolved by a restart. Some features on Databricks, such as schema evolution with Structured Streaming, assume that you run jobs with retries to reset the environment and allow a workflow to proceed.
If you specify retries for a task, the task restarts up to the specified number of times if it encounters an error. Not all job configurations support task retries. See Set a retry policy.
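As a minimal sketch of what a per-task retry policy can look like, the following uses the Databricks SDK for Python. The job name, notebook path, and retry values are illustrative assumptions, and compute configuration is omitted for brevity:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

w.jobs.create(
    name="nightly-ingest",  # hypothetical job name
    tasks=[
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/Shared/ingest"),
            max_retries=3,                     # rerun up to 3 times on failure
            min_retry_interval_millis=60_000,  # wait at least 1 minute between attempts
            retry_on_timeout=False,            # do not retry when the task times out
        )
    ],
)
```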
When running in continuous trigger mode, Databricks automatically retries with exponential backoff. See How are failures handled for continuous jobs?.
Run if conditional tasks
You can use the Run if task type to specify conditionals for later tasks based on the outcome of other tasks. You add tasks to your job and specify their upstream dependencies; based on the status of those upstream tasks, you can configure whether one or more downstream tasks run (see the sketch after this list). Jobs support the following dependency conditions:
All succeeded
At least one succeeded
None failed
All done
At least one failed
All failed
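To sketch how one of these conditions looks in a job definition, the following uses the Databricks SDK for Python; the task keys and notebook paths are hypothetical, and compute configuration is again omitted. Here a downstream task is configured to run only when at least one upstream task failed:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

w.jobs.create(
    name="run-if-example",  # hypothetical job name
    tasks=[
        jobs.Task(
            task_key="process",
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/Shared/process"),
        ),
        # This task runs only if at least one of its upstream tasks failed.
        jobs.Task(
            task_key="alert_on_failure",
            depends_on=[jobs.TaskDependency(task_key="process")],
            run_if=jobs.RunIf.AT_LEAST_ONE_FAILED,
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/Shared/alert"),
        ),
    ],
)
```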
If/else conditional tasks
You can use the If/else task type to specify conditionals based on some value. See Add branching logic to a job with the If/else task.
Jobs support taskValues, which you define in your logic and which allow you to return the results of some computation or state from a task to the job's environment. You can define If/else conditions against taskValues, job parameters, or dynamic values.
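For example, a notebook task can publish a task value with the dbutils.jobs.taskValues utility, and a downstream task can read it. The sketch below assumes it runs inside Databricks notebook tasks, where dbutils is predefined; the task key, key name, and values are illustrative:

```python
# In an upstream notebook task (task_key "ingest"): publish a value.
# dbutils is predefined in Databricks notebooks; this will not run locally.
row_count = 42  # stand-in for the result of some computation
dbutils.jobs.taskValues.set(key="row_count", value=row_count)

# In a downstream notebook task: read the value set by "ingest".
# `default` is returned when the key is missing; `debugValue` is used
# when the notebook runs interactively, outside of a job.
count = dbutils.jobs.taskValues.get(
    taskKey="ingest", key="row_count", default=0, debugValue=0
)
```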
Databricks supports the following operators for conditionals (a sketch follows the list):
==
!=
>
>=
<
<=
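As a sketch of how one of these operators appears in an If/else task, the following builds a condition task with the Databricks SDK for Python. The task keys and the {{tasks.ingest.values.row_count}} dynamic value reference are illustrative assumptions; this task would sit in a job's task list alongside the upstream "ingest" task:

```python
from databricks.sdk.service import jobs

# An If/else (condition) task that evaluates to true when the upstream
# "ingest" task published a row_count greater than 0. Downstream tasks
# can then depend on this task's "true" or "false" outcome.
condition = jobs.Task(
    task_key="rows_present",
    depends_on=[jobs.TaskDependency(task_key="ingest")],
    condition_task=jobs.ConditionTask(
        op=jobs.ConditionTaskOp.GREATER_THAN,
        left="{{tasks.ingest.values.row_count}}",  # dynamic value reference
        right="0",
    ),
)
```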
For each tasks
Use the For each task to run another task in a loop, passing a different set of parameters to each iteration of the task.
To add a For each task to a job, you must define a For each task and a nested task. The nested task is the task to run for each iteration of the For each task and is one of the standard Databricks task types. Multiple methods are supported for passing parameters to the nested task.
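One such method is sketched below with the Databricks SDK for Python; the input values, concurrency, and parameter name are illustrative assumptions. The For each task runs the nested notebook task once per input value, passing the current value through the {{input}} reference:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

w.jobs.create(
    name="for-each-example",  # hypothetical job name
    tasks=[
        jobs.Task(
            task_key="process_regions",
            for_each_task=jobs.ForEachTask(
                inputs='["us-east", "us-west", "eu-central"]',  # JSON array of iteration values
                concurrency=2,  # run up to 2 iterations in parallel
                task=jobs.Task(  # the nested task, run once per input
                    task_key="process_one_region",
                    notebook_task=jobs.NotebookTask(
                        notebook_path="/Workspace/Shared/process_region",
                        base_parameters={"region": "{{input}}"},  # current iteration value
                    ),
                ),
            ),
        )
    ],
)
```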