Authentication for Databricks automation

In Databricks, authentication refers to verifying a Databricks identity (such as a user, service principal, or group). Databricks uses credentials (such as an access token or a username and password) to verify the identity.

After Databricks verifies the caller’s identity, it uses a process called authorization to determine whether that identity has sufficient permissions to perform the requested action on the resource at the given location. This article covers authentication only. For details about authorization and access permissions, see Access control.

When a tool makes an automation or API request, it includes credentials that authenticate an identity with Databricks. This article describes typical ways to create, store, and pass credentials and related information that Databricks needs to authenticate and authorize requests. To learn which credential types, related information, and storage mechanisms are supported by your tools, scripts, and apps, see your provider’s documentation.

Databricks personal access tokens

Databricks personal access tokens are one of the most widely supported types of credentials for resources and operations at the Databricks workspace level. Many storage mechanisms for credentials and related information, such as environment variables and configuration profiles, support Databricks personal access tokens. Although a Databricks workspace can have multiple personal access tokens, each personal access token works for only a single Databricks workspace.
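Tools typically pass a personal access token to the Databricks REST API in an HTTP Authorization header that uses the Bearer scheme. The following Python sketch only builds such a request with the standard library and does not send it. The workspace URL and token are the placeholder values used elsewhere in this article, and the /api/2.0/clusters/list endpoint is just an illustrative workspace-level call. Because each token works for only one workspace, the host and the token must belong to the same workspace.

```python
import urllib.request


def build_workspace_request(host: str, token: str, path: str) -> urllib.request.Request:
    """Build (but do not send) a GET request authenticated with a personal access token."""
    # Strip any trailing slash so that host + path forms a single, well-formed URL.
    url = host.rstrip("/") + path
    # The token travels in the Authorization header using the Bearer scheme.
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


req = build_workspace_request(
    "https://1234567890123456.7.gcp.databricks.com",
    "dapi12345678901234567890123456789012",
    "/api/2.0/clusters/list",
)
print(req.full_url)
print(req.get_header("Authorization"))
```

To actually send the request, pass it to urllib.request.urlopen.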

Note

Databricks supports Google ID tokens in addition to Databricks personal access tokens. To learn whether Google ID tokens are supported by your tools, scripts, and apps, see your provider’s documentation.

You use Databricks personal access tokens or Google ID workspace-level tokens as credentials when automating Databricks workspace-level functionality. You cannot use either of these to automate Databricks account-level functionality. Instead, you must use Google ID account-level tokens that belong to Databricks account admins, which are Google service accounts acting as account-level admin users. For more information, see Authentication with Google ID tokens and the Account API 2.0.
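The rule above can be expressed as a small guard that a script runs before it sends a request. This is a hypothetical helper, not part of any Databricks tool: the function name and the credential_kind labels are illustrative, and the sketch treats a host as account-level only when it is the account console URL shown in this article’s examples.

```python
from urllib.parse import urlparse

# Account console host taken from this article's examples.
ACCOUNT_CONSOLE_HOST = "accounts.gcp.databricks.com"

# Hypothetical labels for the credential types described above.
WORKSPACE_CREDENTIALS = {"personal_access_token", "google_id_workspace_token"}
ACCOUNT_CREDENTIALS = {"google_id_account_token"}


def check_credential(host_url: str, credential_kind: str) -> str:
    """Return 'account' or 'workspace', or raise if the credential type does not fit that level."""
    hostname = urlparse(host_url).hostname or ""
    level = "account" if hostname == ACCOUNT_CONSOLE_HOST else "workspace"
    allowed = ACCOUNT_CREDENTIALS if level == "account" else WORKSPACE_CREDENTIALS
    if credential_kind not in allowed:
        raise ValueError(f"{credential_kind!r} cannot be used for {level}-level operations")
    return level
```

For example, check_credential("https://accounts.gcp.databricks.com", "personal_access_token") raises ValueError, because personal access tokens cannot be used at the account level.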

To create a Databricks personal access token for a Databricks user, do the following:

  1. In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the drop-down menu.

  2. On the Access tokens tab, click Generate new token.

  3. (Optional) Enter a comment that helps you identify this token in the future, and change the token’s default lifetime of 90 days. To create a token that never expires (not recommended), leave the Lifetime (days) box empty.

  4. Click Generate.

  5. Copy the displayed token, and then click Done.

Important

Be sure to save the copied token in a secure location. If you lose the copied token, you cannot regenerate that exact same token; instead, you must repeat this procedure to create a new one. Databricks also recommends that you immediately delete the lost token from your workspace by clicking the X next to it on the Access tokens tab.
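If you already have a working credential for the workspace, you can also create a token programmatically through the Databricks Token API (POST /api/2.0/token/create), whose request body accepts an optional comment and a lifetime_seconds value. The following Python sketch only builds the request; the helper name is hypothetical, and you should confirm the field names against the Token API reference before relying on them. It converts the UI’s default lifetime of 90 days to seconds, and omits lifetime_seconds when no lifetime is given, mirroring an empty Lifetime (days) box.

```python
import json
import urllib.request
from typing import Optional

SECONDS_PER_DAY = 24 * 60 * 60


def build_create_token_request(host: str, existing_token: str, comment: str,
                               lifetime_days: Optional[int] = 90) -> urllib.request.Request:
    """Build (but do not send) a Token API request that creates a new personal access token."""
    body = {"comment": comment}
    # lifetime_days=None omits lifetime_seconds, like leaving the Lifetime (days) box empty.
    if lifetime_days is not None:
        body["lifetime_seconds"] = lifetime_days * SECONDS_PER_DAY
    return urllib.request.Request(
        host.rstrip("/") + "/api/2.0/token/create",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The existing credential authenticates the call that creates the new token.
            "Authorization": f"Bearer {existing_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_token_request(
    "https://1234567890123456.7.gcp.databricks.com",
    "dapi12345678901234567890123456789012",
    "automation token",
)
print(json.loads(req.data))  # {'comment': 'automation token', 'lifetime_seconds': 7776000}
```

The new token’s value is returned only in the API response, so store it as carefully as a token copied from the UI.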

Personal access tokens for service principals

To create a Databricks personal access token for a Databricks service principal instead of a Databricks user, see Manage access tokens for a service principal.

Managing personal access tokens

For information about enabling and disabling all Databricks personal access tokens for a workspace, controlling who can use tokens in a workspace, setting a maximum lifetime for tokens in a workspace, and other token management operations for a workspace, see Manage personal access tokens.

Environment variables

Databricks products, and some third-party products that work with Databricks, support some of the following environment variables. To learn which of these environment variables are supported by your tools, scripts, and apps, see your provider’s documentation. To create, change, and delete environment variables, see your operating system’s documentation.

Environment variable

DATABRICKS_ACCOUNT_ID

The ID of a Databricks account.

Applies to the Databricks Terraform provider only.

DATABRICKS_ADDRESS

The URL of a Databricks workspace.

For operations at the Databricks account level, the URL to the Databricks account console.

Examples: https://1234567890123456.7.gcp.databricks.com, https://accounts.gcp.databricks.com

Applies to Databricks Connect only.

DATABRICKS_API_TOKEN

The value of a Databricks personal access token.

Applies to Databricks Connect only.

DATABRICKS_CLUSTER_ID

The ID of a Databricks cluster.

Applies to Databricks Connect only.

DATABRICKS_CONFIG_FILE

The full path to a Databricks configuration profiles file.

Default: ~/.databrickscfg for Unix, Linux, and macOS; %USERPROFILE%\.databrickscfg for Windows

DATABRICKS_CONFIG_PROFILE

The name of a Databricks configuration profile.

Default: DEFAULT

DATABRICKS_DEBUG_HEADERS

Whether to output debug HTTP headers for requests made by the provider.

Default: false

Applies to the Databricks Terraform provider only.

DATABRICKS_DEBUG_TRUNCATE_BYTES

The length, in bytes, above which JSON fields in HTTP requests and responses are truncated.

Default: 96

Applies to the Databricks Terraform provider only.

DATABRICKS_DSN

The data source name (DSN) connection string to a Databricks compute resource.

Applies to the Databricks SQL Driver for Go only.

DATABRICKS_HOST

The URL to a Databricks workspace.

For operations at the Databricks account level, the URL to the Databricks account console.

Examples: https://1234567890123456.7.gcp.databricks.com, https://accounts.gcp.databricks.com

DATABRICKS_ORG_ID

The organization ID of a Databricks workspace.

Applies to Databricks Connect only.

DATABRICKS_PASSWORD

The password of a Databricks workspace user.

DATABRICKS_PORT

The port number to use to communicate with a Databricks cluster.

Applies to Databricks Connect only.

DATABRICKS_RATE_LIMIT

The maximum number of requests per second.

Default: 15

Applies to the Databricks Terraform provider only.

DATABRICKS_TOKEN

The value of a Databricks personal access token.

DATABRICKS_USERNAME

The username of a Databricks workspace user.

DBSQLCLI_ACCESS_TOKEN

The value of a Databricks personal access token.

Applies to the Databricks SQL CLI only.

DBSQLCLI_HOST_NAME

The value of the Server hostname field for a Databricks SQL warehouse.

Example: 1234567890123456.7.gcp.databricks.com

Applies to the Databricks SQL CLI only.

DBSQLCLI_HTTP_PATH

The value of the HTTP path field for a Databricks SQL warehouse.

Example: /sql/1.0/warehouses/1abc2d3456e7f890a

Applies to the Databricks SQL CLI only.

PERSONAL_ACCESS_TOKEN

The value of a Databricks personal access token.

Applies to the Apache Airflow integration with Databricks only.
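Tools that read these variables usually fall back to defaults when a variable is unset. The following Python sketch shows how a script might read DATABRICKS_HOST, DATABRICKS_TOKEN, DATABRICKS_CONFIG_FILE, and DATABRICKS_CONFIG_PROFILE, applying the defaults listed above. The function names are hypothetical, and the sketch does not decide which source wins when both environment variables and a configuration profile are present; each tool defines its own order.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def default_config_file(env: Mapping[str, str]) -> str:
    """Return the default configuration profiles file path for the current operating system."""
    if os.name == "nt":
        # Windows: %USERPROFILE%\.databrickscfg
        return str(Path(env.get("USERPROFILE", str(Path.home()))) / ".databrickscfg")
    # Unix, Linux, and macOS: ~/.databrickscfg
    return str(Path.home() / ".databrickscfg")


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read Databricks-related environment variables, applying the documented defaults."""
    env = os.environ if env is None else env
    return {
        "host": env.get("DATABRICKS_HOST"),    # None if unset
        "token": env.get("DATABRICKS_TOKEN"),  # None if unset
        "config_file": env.get("DATABRICKS_CONFIG_FILE") or default_config_file(env),
        "profile": env.get("DATABRICKS_CONFIG_PROFILE", "DEFAULT"),
    }


settings = resolve_settings({
    "DATABRICKS_HOST": "https://1234567890123456.7.gcp.databricks.com",
    "DATABRICKS_TOKEN": "dapi12345678901234567890123456789012",
})
print(settings["profile"], settings["config_file"])
```

Passing an explicit mapping, as in the usage above, keeps the function easy to test; calling resolve_settings() with no arguments reads the real process environment.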

Configuration profiles

A Databricks configuration profile contains settings and other information that Databricks needs to authenticate. Databricks configuration profiles are stored in Databricks configuration profiles files for your tools, scripts, and apps to use. To learn whether Databricks configuration profiles are supported by your tools, scripts, and apps, see your provider’s documentation.

  1. Use a text editor to create a file named .databrickscfg in your ~ (your user home) folder on Unix, Linux, or macOS, or your %USERPROFILE% (your user home) folder on Windows. Be sure to include the dot (.) at the beginning of the file name. Add the following contents to this file:

    [<DEFAULT>]
    host = <your-workspace-url>
    token = <your-personal-access-token>
    
  2. In the preceding contents, replace the following values, and then save the file:

    • <DEFAULT> with a unique name for the configuration profile, such as DEFAULT, DEV, PROD, or similar.

    • <your-workspace-url> with your workspace instance URL, for example https://1234567890123456.7.gcp.databricks.com.

    • <your-personal-access-token> with your Databricks personal access token.

    For example, the .databrickscfg file might look like this:

    [DEFAULT]
    host = https://1234567890123456.7.gcp.databricks.com
    token = dapi12345678901234567890123456789012
    

    Tip

    You can create additional configuration profiles by specifying different profile names within the same .databrickscfg file, for example:

    [DEFAULT]
    host = https://1234567890123456.7.gcp.databricks.com
    token = dapi12345678901234567890123456789012
    
    [DEV]
    host = https://2345678901234567.8.gcp.databricks.com
    token = dapi23456789012345678901234567890123
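Because a configuration profiles file uses INI-style sections, a Python script can read a named profile with the standard configparser module. In the following sketch, the file contents come from the example above and load_profile is a hypothetical helper. The sketch sets default_section to an unused name because configparser otherwise treats [DEFAULT] as a special section whose values are inherited by every other profile.

```python
import configparser


def load_profile(text: str, profile: str = "DEFAULT") -> dict:
    """Return the host and token stored under one profile of a .databrickscfg-style file."""
    # An unused default_section makes [DEFAULT] an ordinary profile instead of
    # configparser's special section that all other sections inherit from.
    parser = configparser.ConfigParser(default_section="__no_shared_defaults__")
    parser.read_string(text)
    if not parser.has_section(profile):
        raise KeyError(f"profile {profile!r} not found")
    section = parser[profile]
    return {"host": section.get("host"), "token": section.get("token")}


SAMPLE = """\
[DEFAULT]
host = https://1234567890123456.7.gcp.databricks.com
token = dapi12345678901234567890123456789012

[DEV]
host = https://2345678901234567.8.gcp.databricks.com
token = dapi23456789012345678901234567890123
"""

print(load_profile(SAMPLE, "DEV")["host"])
```

To read the real file instead of a string, call parser.read() with the file’s path, for example os.path.expanduser("~/.databrickscfg").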
    

ODBC DSNs

In ODBC, a data source name (DSN) is a symbolic name that tools, scripts, and apps use to request a connection to an ODBC data source. A DSN stores connection details such as the path to an ODBC driver, networking details, authentication credentials, and database details. To learn whether ODBC DSNs are supported by your tools, scripts, and apps, see your provider’s documentation.

To install and configure the Databricks ODBC Driver and create an ODBC DSN for Databricks, see ODBC driver.
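As an illustration of what a DSN holds, the following Python sketch writes an odbc.ini-style DSN section that authenticates with a personal access token. The key names (Host, Port, HTTPPath, SSL, ThriftTransport, AuthMech, UID, PWD) reflect our understanding of the Databricks ODBC Driver settings and are an assumption to verify on the ODBC driver page; the driver path, host, and HTTP path are placeholders, and build_odbc_dsn is a hypothetical helper.

```python
def build_odbc_dsn(name: str, driver_path: str, host: str, http_path: str, token: str) -> str:
    """Return an odbc.ini-style DSN section for token authentication (verify keys against the driver docs)."""
    settings = {
        "Driver": driver_path,     # path to the installed ODBC driver library (placeholder)
        "Host": host,              # server hostname of the compute resource
        "Port": "443",
        "HTTPPath": http_path,     # HTTP path of the compute resource
        "SSL": "1",
        "ThriftTransport": "2",    # assumed: HTTP transport
        "AuthMech": "3",           # assumed: user name and password authentication
        "UID": "token",            # assumed: literal user name paired with a personal access token
        "PWD": token,
    }
    lines = [f"[{name}]"] + [f"{key}={value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"


dsn = build_odbc_dsn(
    "Databricks",
    "<path-to-odbc-driver>",
    "1234567890123456.7.gcp.databricks.com",
    "/sql/1.0/warehouses/1abc2d3456e7f890a",
    "dapi12345678901234567890123456789012",
)
print(dsn)
```

Because the DSN stores the token in plain text, protect the file that contains it as you would the token itself.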

JDBC connection URLs

In JDBC, a connection URL is a symbolic URL that tools, scripts, and apps use to request a connection to a JDBC data source. A connection URL stores connection details such as networking details, authentication credentials, database details, and JDBC driver capabilities. To learn whether JDBC connection URLs are supported by your tools, scripts, and apps, see your provider’s documentation.

To install and configure the Databricks JDBC Driver and create a JDBC connection URL for Databricks, see JDBC driver.
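The following Python sketch assembles a JDBC connection URL that authenticates with a personal access token. The jdbc:databricks:// prefix and the httpPath, AuthMech, UID, and PWD properties follow our understanding of the Databricks JDBC Driver and are an assumption to confirm on the JDBC driver page, since older driver versions use a different URL format. The host and HTTP path reuse this article’s placeholder values, and build_jdbc_url is a hypothetical helper.

```python
def build_jdbc_url(host: str, http_path: str, token: str, port: int = 443) -> str:
    """Assemble a JDBC connection URL that authenticates with a personal access token."""
    properties = {
        "httpPath": http_path,
        "AuthMech": "3",  # assumed: user name and password authentication
        "UID": "token",   # assumed: literal user name paired with a personal access token
        "PWD": token,
    }
    # JDBC URL properties are separated by semicolons after host:port.
    props = ";".join(f"{key}={value}" for key, value in properties.items())
    return f"jdbc:databricks://{host}:{port};{props}"


url = build_jdbc_url(
    "1234567890123456.7.gcp.databricks.com",
    "/sql/1.0/warehouses/1abc2d3456e7f890a",
    "dapi12345678901234567890123456789012",
)
print(url)
```

The resulting URL contains the token in plain text, so avoid logging it or committing it to source control.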

Google Cloud CLI

The Google Cloud CLI enables you to authenticate with Databricks on Google Cloud from a terminal on Linux or macOS, or from PowerShell or Command Prompt on Windows. To learn whether the Google Cloud CLI is supported by your tools, scripts, and apps, see your provider’s documentation.

To use the Google Cloud CLI to authenticate with Databricks on Google Cloud, run the gcloud init command, and then follow the on-screen prompts:

gcloud init

For more detailed authentication options, see Initializing the gcloud CLI.
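After gcloud init completes, a script can ask the Google Cloud CLI for a Google ID token, for example for the account-level admin service account described earlier. The following Python sketch builds a gcloud auth print-identity-token command with the --impersonate-service-account, --include-email, and --audiences flags, and runs it only when gcloud is on the PATH. The service account email is a placeholder, the helper names are hypothetical, and which audience your tool expects is an assumption to check in Authentication with Google ID tokens.

```python
import shutil
import subprocess
from typing import List, Optional


def id_token_command(service_account: Optional[str] = None,
                     audience: Optional[str] = None) -> List[str]:
    """Build a gcloud command that prints a Google ID token for the active or impersonated account."""
    cmd = ["gcloud", "auth", "print-identity-token"]
    if service_account:
        # Mint the token for a service account instead of the signed-in user.
        cmd.append(f"--impersonate-service-account={service_account}")
        cmd.append("--include-email")
    if audience:
        cmd.append(f"--audiences={audience}")
    return cmd


def fetch_id_token(cmd: List[str]) -> str:
    """Run the command and return the token; requires an installed, initialized Google Cloud CLI."""
    if shutil.which("gcloud") is None:
        raise RuntimeError("gcloud is not on PATH; install the Google Cloud CLI and run gcloud init")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


cmd = id_token_command(
    "admin-sa@my-project.iam.gserviceaccount.com",  # hypothetical service account
    "https://accounts.gcp.databricks.com",
)
print(" ".join(cmd))
```

Google ID tokens are short-lived, so a long-running script would typically call fetch_id_token again rather than cache the token indefinitely.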