Substitutions and variables in Databricks Asset Bundles

Databricks Asset Bundles supports substitutions and custom variables, which make your bundle configuration files more modular and reusable. Both substitutions and custom variables enable dynamic retrieval of values so that settings can be determined at the time a bundle is deployed and run.

Tip

You can also use dynamic value references for job parameter values to pass context about a job run to job tasks. See What is a dynamic value reference? and Parameterize jobs.

Substitutions

You can use substitutions to retrieve values of settings that change based on the context of the bundle deployment and run.

For example, when you run the databricks bundle validate --output json command, you might see output like this:

{
  "bundle": {
    "name": "hello-bundle",
    "target": "dev",
    "...": "..."
  },
  "workspace": {
    "...": "...",
    "current_user": {
      "...": "...",
      "userName": "someone@example.com",
      "...": "..."
    },
    "...": "..."
  },
  "...": {
    "...": "..."
  }
}

Substitutions can refer to the values of the bundle name, bundle target, and workspace userName fields. For example, you can use them to construct the workspace root_path in the bundle configuration file:

bundle:
  name: hello-bundle

workspace:
  root_path: /Users/${workspace.current_user.userName}/.bundle/${bundle.name}/my-envs/${bundle.target}

# ...

targets:
  dev:
    default: true

You can also create substitutions for named resources. For example, for a pipeline defined under the key my_pipeline in the resources mapping, ${resources.pipelines.my_pipeline.target} is the substitution for the value of that pipeline’s target setting.
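As a minimal sketch, the following configuration passes a pipeline’s target to a notebook task in a job. Only the ${resources.pipelines.my_pipeline.target} substitution comes from the text above; the names my-pipeline, my_schema, my_job, read_pipeline_output, pipeline_target, and ./read_output.py are hypothetical placeholders.

```yaml
resources:
  pipelines:
    my_pipeline:
      name: my-pipeline
      target: my_schema  # hypothetical target schema
      # ... other pipeline settings omitted

  jobs:
    my_job:
      name: my-job
      tasks:
        - task_key: read_pipeline_output
          # ... compute settings omitted for brevity
          notebook_task:
            notebook_path: ./read_output.py  # hypothetical notebook
            base_parameters:
              # Resolves to the value of my_pipeline's target setting (my_schema here)
              pipeline_target: ${resources.pipelines.my_pipeline.target}
```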

To determine valid substitutions, you can use the schema hierarchy documented in the REST API reference or the output of the bundle schema command.

Here are some commonly used substitutions:

  • ${bundle.name}

  • ${bundle.target} (use this substitution instead of ${bundle.environment})

  • ${workspace.host}

  • ${workspace.current_user.short_name}

  • ${workspace.current_user.userName}

  • ${workspace.file_path}

  • ${workspace.root_path}

  • ${resources.jobs.<job-name>.id}

  • ${resources.models.<model-name>.name}

  • ${resources.pipelines.<pipeline-name>.name}
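For illustration, the following sketch combines several of these substitutions. The job keys upstream_job and downstream_job and the task key trigger_upstream are hypothetical. The ${bundle.target} and ${workspace.current_user.short_name} substitutions make the deployed job names distinguishable per target and user, and ${resources.jobs.upstream_job.id} supplies the ID of a job defined in the same bundle, so you don’t have to hard-code an ID that only exists after deployment.

```yaml
resources:
  jobs:
    upstream_job:
      # Includes the target name and the deploying user's short name
      name: upstream-job-${bundle.target}-${workspace.current_user.short_name}
      # ... tasks omitted for brevity

    downstream_job:
      name: downstream-job-${bundle.target}
      tasks:
        - task_key: trigger_upstream
          run_job_task:
            # Replaced with the ID assigned to upstream_job when it is deployed
            job_id: ${resources.jobs.upstream_job.id}
```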

Custom variables

You can define both simple and complex custom variables in your bundle to enable dynamic retrieval of values needed for many scenarios. Custom variables are declared in your bundle configuration files within the variables mapping. See variables.

The following example configuration defines the variables my_cluster_id and my_notebook_path:

variables:
  my_cluster_id:
    description: The ID of an existing cluster.
    default: 1234-567890-abcde123
  my_notebook_path:
    description: The path to an existing notebook.
    default: ./hello.py

If you do not provide a default value for a variable as part of this declaration, you must set it when executing bundle commands, through an environment variable, or elsewhere within your bundle configuration files as described in Set a variable’s value.

To reference a custom variable within your bundle configuration, use the variable substitution ${var.<variable_name>}. For example, to reference the variables my_cluster_id and my_notebook_path:

resources:
  jobs:
    hello-job:
      name: hello-job
      tasks:
        - task_key: hello-task
          existing_cluster_id: ${var.my_cluster_id}
          notebook_task:
            notebook_path: ${var.my_notebook_path}

Set a variable’s value

If you have not provided a default value for a variable, or if you want to temporarily override a variable’s default value, provide the value using one of the following approaches:

  • Provide the variable’s value as part of a bundle command such as validate, deploy, or run. To do this, use the option --var="<key>=<value>", where <key> is the variable’s name and <value> is the variable’s value. For example, to provide the value 1234-567890-abcde123 for the variable my_cluster_id and the value ./hello.py for the variable my_notebook_path as part of the bundle validate command, run:

    databricks bundle validate --var="my_cluster_id=1234-567890-abcde123,my_notebook_path=./hello.py"
    
    # Or:
    databricks bundle validate --var="my_cluster_id=1234-567890-abcde123" --var="my_notebook_path=./hello.py"
    
  • Provide the variable’s value by setting an environment variable. The environment variable’s name must start with BUNDLE_VAR_, followed by the variable’s name. To set environment variables, see your operating system’s documentation. For example, to provide the value 1234-567890-abcde123 for the variable my_cluster_id and the value ./hello.py for the variable my_notebook_path, run the following command before you call a bundle command such as validate, deploy, or run:

    For Linux and macOS:

    export BUNDLE_VAR_my_cluster_id=1234-567890-abcde123 && export BUNDLE_VAR_my_notebook_path=./hello.py
    

    For Windows:

    set "BUNDLE_VAR_my_cluster_id=1234-567890-abcde123" && set "BUNDLE_VAR_my_notebook_path=./hello.py"
    

    Or, set the variables and run a bundle command such as validate, deploy, or run on a single line, for example on Linux and macOS:

    BUNDLE_VAR_my_cluster_id=1234-567890-abcde123 BUNDLE_VAR_my_notebook_path=./hello.py databricks bundle validate
    

    Or for Windows:

    set "BUNDLE_VAR_my_cluster_id=1234-567890-abcde123" && set "BUNDLE_VAR_my_notebook_path=./hello.py" && databricks bundle validate
    
  • Provide the variable’s value within your bundle configuration files. To do this, add a variables mapping to a target within the targets mapping, in this format:

    variables:
      <variable-name>: <value>
    

    For example, to provide values for the variables named my_cluster_id and my_notebook_path for two separate targets:

    targets:
      dev:
        variables:
          my_cluster_id: 1234-567890-abcde123
          my_notebook_path: ./hello.py
      prod:
        variables:
          my_cluster_id: 2345-678901-bcdef234
          my_notebook_path: ./hello.py
    

Note

Whichever approach you choose to provide variable values, use the same approach during both the deployment and run stages. Otherwise, you might get unexpected results between the time of a deployment and a job or pipeline run that is based on that existing deployment.

In the preceding examples, the Databricks CLI looks for values for the variables my_cluster_id and my_notebook_path in the following order. For each variable, it uses the first value it finds and skips the remaining locations:

  1. Within any --var options specified as part of the bundle command.

  2. Within any environment variables set that begin with BUNDLE_VAR_.

  3. Within the variables mapping of the current target, inside the targets mapping in your bundle configuration files.

  4. The default value in the variable’s definition, inside the top-level variables mapping in your bundle configuration files.
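As a minimal sketch of this order, suppose my_cluster_id is declared with a default and is also set in the dev target. The numbered comments refer to the lookup order above; the second cluster ID is a hypothetical value.

```yaml
variables:
  my_cluster_id:
    description: The ID of an existing cluster.
    # 4. Used only if none of the other locations provides a value
    default: 1234-567890-abcde123

targets:
  dev:
    default: true
    variables:
      # 3. Overrides the default whenever the dev target is used
      my_cluster_id: 2345-678901-bcdef234

# 2. An environment variable BUNDLE_VAR_my_cluster_id overrides both values above.
# 1. A --var="my_cluster_id=..." option on the bundle command overrides all of the above.
```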

Define a complex variable

A custom variable is assumed to be of type string unless you define it as a complex variable. To define a custom variable with a complex type for your bundle, set type to complex in your bundle configuration.

Note

The only valid value for the type setting is complex. In addition, bundle validation fails if type is set to complex and the default defined for the variable is a single value.

In the following example, cluster settings are defined within a custom complex variable named my_cluster:

variables:
  my_cluster:
    description: "My cluster definition"
    type: complex
    default:
      spark_version: "13.2.x-scala2.12"
      node_type_id: "Standard_DS3_v2"
      num_workers: 2
      spark_conf:
        spark.speculation: true
        spark.databricks.delta.retentionDurationCheck.enabled: false

resources:
  jobs:
    my_job:
      job_clusters:
        - job_cluster_key: my_cluster_key
          new_cluster: ${var.my_cluster}
      tasks:
      - task_key: hello_task
        job_cluster_key: my_cluster_key
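Complex variables are not limited to cluster definitions. As a further sketch, a map of job tags can be shared through a complex variable. The variable name my_job_tags and its tag keys and values are hypothetical; tags is the job setting that receives the map. Because the default is a map rather than a single value, it satisfies the validation rule for type: complex.

```yaml
variables:
  my_job_tags:
    description: Tags applied to jobs in this bundle
    type: complex
    default:
      team: data-engineering  # hypothetical tag values
      cost_center: "1234"

resources:
  jobs:
    my_job:
      name: my-job
      # The whole map is substituted in place of the reference
      tags: ${var.my_job_tags}
```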

Retrieve an object’s ID value

For the alert, cluster_policy, cluster, dashboard, instance_pool, job, metastore, pipeline, query, service_principal, and warehouse object types, you can define a lookup for your custom variable to retrieve a named object’s ID using this format:

variables:
  <variable-name>:
    lookup:
      <object-type>: "<object-name>"

If a lookup is defined for a variable, the ID of the object with the specified name is used as the value of the variable. This way, the variable always resolves to the current ID of the named object, and you do not need to hard-code that ID in your configuration.

Note

An error occurs if an object with the specified name does not exist, or if there is more than one object with the specified name.

For example, in the following configuration, ${var.my_cluster_id} is replaced by the ID of the cluster named 12.2 shared.

variables:
  my_cluster_id:
    description: An existing cluster
    lookup:
      cluster: "12.2 shared"

resources:
  jobs:
    my_job:
      name: "My Job"
      tasks:
        - task_key: TestTask
          existing_cluster_id: ${var.my_cluster_id}
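The same pattern applies to the other object types listed above. As a sketch, the following configuration looks up a SQL warehouse by name and passes its ID to a SQL task. The warehouse name Shared Warehouse, the job key my_sql_job, the task key RunQueryFile, and the file ./query.sql are hypothetical.

```yaml
variables:
  my_warehouse_id:
    description: An existing SQL warehouse
    lookup:
      # Fails if no warehouse, or more than one warehouse, has this name
      warehouse: "Shared Warehouse"

resources:
  jobs:
    my_sql_job:
      name: "My SQL Job"
      tasks:
        - task_key: RunQueryFile
          sql_task:
            warehouse_id: ${var.my_warehouse_id}
            file:
              path: ./query.sql  # hypothetical SQL file in the bundle
```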