Run a parameterized Databricks job task in a loop

This article discusses using the For each task with your Databricks jobs, including details on adding and configuring the task in the Jobs UI. Use the For each task to run a task in a loop, passing a different set of parameters to each iteration of the task.

Adding the For each task to a job requires defining two tasks: the For each task and a nested task. The nested task is the task to run for each iteration of the For each task and is one of the standard Databricks Jobs task types. You cannot add another For each task as the nested task.

For example, you could use the For each task to perform a common set of transformations on multiple tables, passing a table name from a list of table names to each iteration of the task.
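As a minimal sketch of that scenario, the list of table names could be prepared as a JSON-formatted array, which is one of the input forms the For each task accepts. The table names below are hypothetical placeholders, not names from any real workspace.

```python
import json

# Hypothetical table names, used only for illustration.
table_names = ["sales.orders", "sales.customers", "sales.returns"]

# Serialize the list as a JSON-formatted array. A string like this could be
# pasted into the Inputs text box of a For each task.
inputs_json = json.dumps(table_names)
print(inputs_json)
```

Each element of the array would then be passed to one iteration of the nested task.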

What parameter types can I use with the For each task?

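To pass parameters from a For each task, you can use any of the following as the task's inputs: a JSON-formatted array of values, a reference to task values set by a preceding task, or a job parameter.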

To learn how to use these different parameter types when you add or edit a For each task, see the next section Add the For each task to a job.

Add the For each task to a job

You can add a For each task when you create a job or edit a task in an existing job. To configure a For each task:

  1. In the Type drop-down menu, select For each.

  2. Enter a name for the task in the Task name field.

  3. In the Inputs text box, define the values for the For each task to iterate on. This can be one of the following:

    • A JSON-formatted array of values. The array can contain the following data types:

      • Key-value pairs

      • Strings, numbers, or Boolean types

      • Arbitrarily complex JSON objects

    • Task value references. To reference task values passed from a preceding task, use the {{tasks.<task_name>.values.<task_value_name>}} syntax to set the value in the Inputs text box. For example, if a task named generate_countries_list that precedes the For each task sets the following task value:

      dbutils.jobs.taskValues.set(key = "countries", value = countries_array)

      Then the For each task references the task value in the Inputs text box using the following syntax:

      {{tasks.generate_countries_list.values.countries}}

    • Job parameters. To reference a job parameter, use the following syntax in the Inputs text box: {{job.parameters.<name>}}. For example, {{job.parameters.countries}}.

  4. To optionally set the number of iterations that can run in parallel, enter a Concurrency value for the task. The default value is 1.

  5. To optionally receive notifications for task start, success, or failure, click + Add. See Add email and system notifications for job events.

  6. To complete the configuration of the For each task and add a nested task to run for each iteration, click Add a task to loop over.

  7. Select a task type and configuration options for the nested task. Nested tasks are standard task types and have the same configuration options. See Configure and edit Databricks tasks.

  8. To reference parameters passed from the For each task, click Parameters. Use the {{input}} reference to pass the array value for the current iteration, or use {{input.<key>}} to reference an individual field when you iterate over a list of objects.

  9. Click Create task.
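To make the {{input}} and {{input.<key>}} references from the steps above more concrete, the following sketch simulates how they might resolve for each iteration. It is plain Python written for illustration and is not the Databricks implementation. The function name resolve_input_refs, the parameter names, and the sample inputs are all hypothetical, and rendering non-string values as JSON is an assumption.

```python
import json
import re

# Matches {{input}} or {{input.<key>}}; the key, if present, is captured.
INPUT_REF = re.compile(r"\{\{input(?:\.(\w+))?\}\}")


def resolve_input_refs(template, item):
    """Replace {{input}} and {{input.<key>}} in a template for one iteration.

    Illustrative simulation only, not how Databricks resolves references.
    """
    def replace(match):
        key = match.group(1)
        value = item if key is None else item[key]
        # Assumption: strings are inserted as-is, other values as JSON text.
        return value if isinstance(value, str) else json.dumps(value)

    return INPUT_REF.sub(replace, template)


# Hypothetical inputs: a JSON array of objects, one object per iteration.
inputs = json.loads(
    '[{"table": "orders", "mode": "append"},'
    ' {"table": "customers", "mode": "overwrite"}]'
)

# Hypothetical nested-task parameters that reference fields of each object.
nested_params = {"table_name": "{{input.table}}", "write_mode": "{{input.mode}}"}

# Resolve the parameters once per iteration, as the loop would.
resolved_runs = [
    {name: resolve_input_refs(value, item) for name, value in nested_params.items()}
    for item in inputs
]
for run in resolved_runs:
    print(run)
```

When the inputs are an array of simple values rather than objects, {{input}} alone stands for the whole value of the current iteration.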

Switch between the For each task and the nested task

The For each task appears in the Jobs UI as a node with the nested task node inside the For each node. To switch between the For each task and the nested task, click the respective nodes.

Reference a For each task in downstream tasks

The For each task is the top-level task, and downstream tasks can specify it as a dependency. Downstream tasks cannot depend on or reference the nested task.
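The dependency rule can be sketched with a job definition expressed as a Python dictionary. The field names (for_each_task, inputs, concurrency, task, depends_on, base_parameters) are assumed to match the Databricks Jobs API and should be checked against the API reference. The task keys and notebook paths are hypothetical.

```python
# Hypothetical task key for the top-level For each task.
for_each_key = "process_countries"

tasks = [
    {
        "task_key": "generate_countries_list",
        "notebook_task": {"notebook_path": "/Workspace/Shared/generate_countries"},
    },
    {
        "task_key": for_each_key,
        "depends_on": [{"task_key": "generate_countries_list"}],
        "for_each_task": {
            # Assumed field names; inputs is given as a string reference.
            "inputs": "{{tasks.generate_countries_list.values.countries}}",
            "concurrency": 2,
            "task": {
                "task_key": "process_countries_iteration",
                "notebook_task": {
                    "notebook_path": "/Workspace/Shared/process_country",
                    "base_parameters": {"country": "{{input}}"},
                },
            },
        },
    },
    {
        # The downstream task depends on the For each task, not the nested task.
        "task_key": "summarize",
        "depends_on": [{"task_key": for_each_key}],
        "notebook_task": {"notebook_path": "/Workspace/Shared/summarize"},
    },
]

# Only top-level task keys are valid dependency targets.
top_level_keys = {task["task_key"] for task in tasks}


def invalid_dependencies(task_list):
    """Return (task, dependency) pairs that point at a non-top-level task."""
    return [
        (task["task_key"], dep["task_key"])
        for task in task_list
        for dep in task.get("depends_on", [])
        if dep["task_key"] not in top_level_keys
    ]


print(invalid_dependencies(tasks))
```

The helper invalid_dependencies is only a local sanity check for this sketch. It flags any dependency on the nested task key, because that key is not among the top-level tasks.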

Run and monitor a job with a For each task

Running a job with a For each task is identical to running any other job.

Viewing and managing job runs is also the same as for any other job, except that the task run history for a For each task is presented as a table of task iterations. See View task run history for a For each task.