Authentication for Databricks automation - overview

In Databricks, authentication refers to verifying a Databricks identity (such as a user, service principal, or group) or a Google Cloud service account. Databricks uses credentials (such as an access token) to verify the identity.

After Databricks verifies the caller’s identity, it uses a process called authorization to determine whether the verified identity has sufficient access permissions to perform the specified action on the resource at the given location. This article covers only authentication; for details about authorization and access permissions, see Authentication and access control.

When a tool makes an automation or API request, it includes credentials that authenticate an identity with Databricks. This article describes typical ways to create, store, and pass credentials and related information that Databricks needs to authenticate and authorize requests. To learn which credential types, related information, and storage mechanisms are supported by your tools, SDKs, scripts, and apps, see Supported authentication types by Databricks tool or SDK or your provider’s documentation.

Common tasks for Databricks authentication

Use the following instructions to complete common tasks for Databricks authentication.

  • To create a Databricks user for authenticating at the Databricks account level, see Manage users in your account.

  • To create a Databricks user for authenticating with a specific Databricks workspace, see Manage users in your workspace.

  • To create a Databricks personal access token for a Databricks user (usable only for authenticating with its associated Databricks workspace), see Databricks personal access tokens for workspace users.

  • To create a Google Cloud service account, see Create service accounts in the Google Cloud documentation.

  • To add a Google Cloud service account to a Databricks account or to a specific workspace as a Databricks user, with the service account’s email address as the username, for authenticating at the Databricks account or workspace level, see Manage users in your account and Manage users in your workspace. (Google Cloud service accounts are separate from Databricks service principals.)

  • To create a Databricks configuration profile, see Databricks configuration profiles.

  • To create a Databricks group and add Databricks users to it for more robust authorization, see Manage account groups using the account console and Manage account groups using the workspace admin settings page.

Supported Databricks authentication types

Databricks provides several ways to authenticate Databricks users, as follows:

OAuth machine-to-machine (M2M) authentication

  • OAuth M2M authentication uses Databricks service principals for authentication.

  • OAuth M2M authentication uses short-lived (one hour) Databricks OAuth access tokens for authentication credentials.

  • Expired OAuth access tokens can be automatically refreshed by participating Databricks tools and SDKs. See Supported authentication types by Databricks tool or SDK and Databricks client unified authentication.

  • Databricks recommends that you use OAuth M2M authentication for unattended authentication scenarios. These scenarios include fully automated and CI/CD workflows, where you cannot use your web browser to authenticate with Databricks in real time.

  • Databricks recommends that you use Google Cloud credentials authentication or Google Cloud ID authentication instead of OAuth M2M authentication in cases where you must use Google Cloud service accounts and Google Cloud OAuth access tokens for authentication credentials. For example, you might need to authenticate with Databricks and other Google Cloud resources at the same time, which requires Google Cloud OAuth access tokens.

  • For additional technical details, see OAuth machine-to-machine (M2M) authentication.
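The token exchange that participating tools and SDKs perform on your behalf follows the standard OAuth 2.0 client-credentials flow. As an illustrative sketch (not something you normally write yourself), the following builds, but does not send, such a token request. The /oidc/v1/token endpoint path and the all-apis scope reflect Databricks’s documented OAuth M2M flow; all credential values are placeholders:

```python
import base64
import urllib.parse
import urllib.request

# Placeholder values: substitute your workspace URL and the service
# principal's OAuth client ID and secret.
workspace_url = "https://1234567890123456.7.gcp.databricks.com"
client_id = "my-service-principal-client-id"
client_secret = "my-service-principal-secret"

# Standard OAuth 2.0 client-credentials flow: POST form-encoded
# parameters to the workspace's token endpoint, authenticating with the
# client ID and secret via HTTP basic authentication. The returned
# access token is valid for about one hour.
body = urllib.parse.urlencode(
    {"grant_type": "client_credentials", "scope": "all-apis"}
).encode("utf-8")
basic = base64.b64encode(
    f"{client_id}:{client_secret}".encode("utf-8")
).decode("ascii")

request = urllib.request.Request(
    f"{workspace_url}/oidc/v1/token",
    data=body,
    headers={
        "Authorization": f"Basic {basic}",
        "Content-Type": "application/x-www-form-urlencoded",
    },
    method="POST",
)

# urllib.request.urlopen(request) would return JSON containing the
# short-lived access_token; the request is only constructed here.
print(request.full_url)
```

In practice, participating tools and SDKs perform this exchange and refresh expired tokens automatically, so you supply only the client ID and secret.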

OAuth user-to-machine (U2M) authentication

  • OAuth U2M authentication uses Databricks users for authentication.

  • OAuth U2M authentication uses short-lived (one hour) Databricks OAuth access tokens for authentication credentials.

  • Participating Databricks tools and SDKs can automatically refresh expired OAuth access tokens. See Supported authentication types by Databricks tool or SDK and Databricks client unified authentication.

  • OAuth U2M authentication is suitable for attended authentication scenarios. These scenarios include manual and rapid development workflows, where you use your web browser to authenticate with Databricks in real time, when prompted.

  • For additional technical details, see OAuth user-to-machine (U2M) authentication.

Google Cloud credentials authentication

  • Google Cloud credentials authentication uses Google Cloud service accounts, acting as Databricks users, for authentication.

  • Google Cloud credentials authentication uses short-lived (one hour) Google Cloud OAuth access tokens for authentication credentials. These tokens are managed internally within Google Cloud systems. You cannot access these tokens.

  • Databricks recommends that you use OAuth M2M authentication, if your target Databricks tool or SDK supports it, instead of Google Cloud credentials authentication. This is because OAuth M2M authentication can be easier to set up than Google Cloud credentials authentication.

  • For additional technical details, see Google Cloud credentials authentication.

Google Cloud ID authentication

  • Google Cloud ID authentication uses the Google Cloud CLI to authenticate Google Cloud service accounts, acting as Databricks users.

  • Google Cloud ID authentication uses short-lived (one hour) Google Cloud OAuth access tokens for authentication credentials. These tokens are managed internally within Google Cloud systems. You cannot access these tokens.

  • Databricks recommends that you use OAuth M2M authentication, if your target Databricks tool or SDK supports it, instead of Google Cloud ID authentication. This is because OAuth M2M authentication can be easier to set up than Google Cloud ID authentication.

  • For additional technical details, see Google Cloud ID authentication.

Databricks personal access token authentication

  • Databricks personal access token authentication uses Databricks users or service principals for authentication.

  • Databricks personal access token authentication uses short-lived or long-lived strings as authentication credentials. These access tokens can be set to expire in as little as one day, or they can be set to never expire.

  • Expired Databricks personal access tokens cannot be refreshed.

  • Databricks recommends OAuth M2M authentication, if your target Databricks tool or SDK supports it, instead of Databricks personal access token authentication. This is because OAuth M2M authentication uses Databricks OAuth access tokens, which are more secure than Databricks personal access tokens.

  • For additional technical details, see Databricks personal access token authentication.

Supported authentication types by Databricks tool or SDK

Databricks tools and SDKs that work with one or more supported Databricks authentication types include the following:

Databricks CLI

For specific Databricks CLI authentication documentation, including how to set up and use Databricks configuration profiles to switch among multiple related authentication settings, see:

For additional technical details about the Databricks CLI, see What is the Databricks CLI?.

Databricks Terraform provider

OAuth machine-to-machine (M2M) authentication and OAuth user-to-machine (U2M) authentication are not yet supported.

For specific Databricks Terraform provider authentication documentation, including how to store and use credentials through environment variables, Databricks configuration profiles, .tfvars files, or secret stores such as Hashicorp Vault, see Authentication.

For additional technical details about the Databricks Terraform provider, see Databricks Terraform provider.

Databricks Connect

For specific Databricks Connect authentication documentation, see:

For additional technical details about Databricks Connect, see What is Databricks Connect?.

Databricks extension for Visual Studio Code

For specific Databricks extension for Visual Studio Code authentication documentation, see Authentication setup for the Databricks extension for Visual Studio Code.

For additional technical details about the Databricks extension for Visual Studio Code, see What is the Databricks extension for Visual Studio Code?.

Databricks SDK for Python

For specific Databricks SDK for Python authentication documentation, see:

For additional technical details about the Databricks SDK for Python, see Databricks SDK for Python.

Databricks SDK for Java

For specific Databricks SDK for Java authentication documentation, see:

For additional technical details about the Databricks SDK for Java, see Databricks SDK for Java.

Databricks SDK for Go

For specific Databricks SDK for Go authentication documentation, see:

For additional technical details about the Databricks SDK for Go, see Databricks SDK for Go.

Other Databricks tools and SDKs

See the tool’s or SDK’s documentation:

Databricks account and workspace REST APIs

Databricks organizes its Databricks REST API into two categories of APIs: account APIs and workspace APIs. Each of these categories requires different sets of information to authenticate the target Databricks identity. Also, each supported Databricks authentication type requires additional information that uniquely identifies the target Databricks identity.

For instance, to authenticate a Databricks identity for calling Databricks account-level API operations, you must provide:

  • The target Databricks account console URL, which is typically https://accounts.gcp.databricks.com.

  • The target Databricks account ID. See Locate your account ID.

  • Information that uniquely identifies the target Databricks identity for the target Databricks authentication type. For the specific information to provide, see the section later in this article for that authentication type.

To authenticate a Databricks identity for calling Databricks workspace-level API operations, you must provide:

  • The target Databricks workspace URL, for example https://1234567890123456.7.gcp.databricks.com.

  • Information that uniquely identifies the target Databricks identity for the target Databricks authentication type. For the specific information to provide, see the section later in this article for that authentication type.
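To make the workspace-level requirements concrete, the following sketch builds (but does not send) an authenticated request to one example workspace-level endpoint, using a placeholder workspace URL and personal access token. In practice, a tool or SDK assembles this for you:

```python
import urllib.request

# Placeholder values: substitute your workspace URL and a credential
# for the target identity (here, a Databricks personal access token).
workspace_url = "https://1234567890123456.7.gcp.databricks.com"
token = "dapi123..."

# Workspace-level REST APIs are rooted at the workspace URL, and the
# credential is passed as a bearer token. The clusters endpoint below
# is just one example of a workspace-level operation.
request = urllib.request.Request(
    f"{workspace_url}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
)

# urllib.request.urlopen(request) would execute the call; the request
# is only constructed here.
print(request.full_url)
```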

Databricks client unified authentication

Databricks provides a consolidated and consistent architectural and programmatic approach to authentication, known as Databricks client unified authentication. This approach helps make setting up and automating authentication with Databricks more centralized and predictable. It enables you to configure Databricks authentication once and then use that configuration across multiple Databricks tools and SDKs without further authentication configuration changes.

Participating Databricks tools and SDKs include:

All participating tools and SDKs accept special environment variables and Databricks configuration profiles for authentication. The Databricks Terraform provider and the Databricks SDKs for Python, Java, and Go also accept direct configuration of authentication settings within code. For details, see Supported authentication types by Databricks tool or SDK or the tool’s or SDK’s documentation.

Default order of evaluation for client unified authentication methods and credentials

Whenever a participating tool or SDK needs to authenticate with Databricks, the tool or SDK tries the following types of authentication in the following order by default. When the tool or SDK succeeds with the type of authentication that it tries, the tool or SDK stops trying to authenticate with the remaining authentication types. To force an SDK to authenticate with a specific authentication type, set the Config API’s Databricks authentication type field.

  1. Databricks personal access token authentication

  2. OAuth machine-to-machine (M2M) authentication

  3. OAuth user-to-machine (U2M) authentication

  4. Google Cloud credentials authentication

  5. Google Cloud ID authentication

For each authentication type that the participating tool or SDK tries, the tool or SDK tries to find authentication credentials in the following locations, in the following order. When the tool or SDK succeeds in finding authentication credentials that can be used, the tool or SDK stops trying to find authentication credentials in the remaining locations.

  1. Credential-related Config API fields (for SDKs). To set Config fields, see Supported authentication types by Databricks tool or SDK or the SDK’s reference documentation.

  2. Credential-related environment variables. To set environment variables, see Supported authentication types by Databricks tool or SDK and your operating system’s documentation.

  3. Credential-related fields in the DEFAULT configuration profile within the .databrickscfg file. To set configuration profile fields, see Supported authentication types by Databricks tool or SDK and Databricks configuration profiles.

  4. Any related authentication credentials that are cached by the Google Cloud CLI. See Google Cloud CLI.
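The first three lookup locations can be sketched as follows. This is a deliberately simplified illustration of the precedence rules, not the SDKs’ actual implementation; an explicit argument stands in for a Config API field:

```python
import configparser
import os
import tempfile

def resolve_token(explicit_token=None, config_path=None):
    """Illustrative lookup order for one credential (a token):
    1. an explicit argument, standing in for a Config API field;
    2. the DATABRICKS_TOKEN environment variable;
    3. the token field of the DEFAULT profile in .databrickscfg."""
    if explicit_token:
        return explicit_token
    env_token = os.environ.get("DATABRICKS_TOKEN")
    if env_token:
        return env_token
    parser = configparser.ConfigParser()
    parser.read(config_path or os.path.expanduser("~/.databrickscfg"))
    return parser.get("DEFAULT", "token", fallback=None)

# Demonstration with a throwaway configuration file and no
# DATABRICKS_TOKEN in the environment.
with tempfile.NamedTemporaryFile("w", suffix=".databrickscfg",
                                 delete=False) as f:
    f.write("[DEFAULT]\n"
            "host  = https://1234567890123456.7.gcp.databricks.com\n"
            "token = dapi123...\n")
    path = f.name

os.environ.pop("DATABRICKS_TOKEN", None)
print(resolve_token(config_path=path))                # found in the file
print(resolve_token("dapi999...", config_path=path))  # explicit value wins
```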

To provide maximum portability for your code, Databricks recommends that you create a custom configuration profile within the .databrickscfg file, add the required fields for your target Databricks authentication type to the custom configuration profile, and then set the DATABRICKS_CONFIG_PROFILE environment variable to the name of the custom configuration profile. For more information, see Supported authentication types by Databricks tool or SDK.

Environment variables and fields for client unified authentication

The following tables list the names and descriptions of the supported environment variables and fields for Databricks client unified authentication.

General host, token, and account ID environment variables and fields

Databricks host

  • Description: (String) The Databricks host URL for either the Databricks workspace endpoint or the Databricks accounts endpoint.
  • Environment variable: DATABRICKS_HOST
  • .databrickscfg field, Terraform field: host
  • Config field: host (Python), setHost (Java), Host (Go)

Databricks token

  • Description: (String) The Databricks personal access token.
  • Environment variable: DATABRICKS_TOKEN
  • .databrickscfg field, Terraform field: token
  • Config field: token (Python), setToken (Java), Token (Go)

Databricks account ID

  • Description: (String) The Databricks account ID for the Databricks account endpoint. Only has effect when the Databricks host is also set to https://accounts.gcp.databricks.com.
  • Environment variable: DATABRICKS_ACCOUNT_ID
  • .databrickscfg field, Terraform field: account_id
  • Config field: account_id (Python), setAccountID (Java), AccountID (Go)

Google Cloud-specific environment variables and fields

Client ID

  • Description: (String) The Databricks service principal’s client ID.
  • Environment variable: DATABRICKS_CLIENT_ID
  • .databrickscfg field, Terraform field: client_id
  • Config field: client_id (Python), setClientId (Java), ClientId (Go)

Client secret

  • Description: (String) The Databricks service principal’s client secret.
  • Environment variable: DATABRICKS_CLIENT_SECRET
  • .databrickscfg field, Terraform field: client_secret
  • Config field: client_secret (Python), setClientSecret (Java), ClientSecret (Go)

Google Cloud service account

  • Description: (String) The Google Cloud service account’s email address.
  • Environment variable: DATABRICKS_GOOGLE_SERVICE_ACCOUNT
  • .databrickscfg field, Terraform field: google_service_account
  • Config field: GoogleServiceAccount (Go)

Google Cloud credentials

  • Description: (String) The local path to the Google Cloud service account key file, or the contents of the service account key file in JSON format.
  • Environment variable: GOOGLE_CREDENTIALS
  • .databrickscfg field, Terraform field: google_credentials
  • Config field: GoogleCredentials (Go)

.databrickscfg-specific environment variables and fields

Use these environment variables or fields to specify non-default settings for .databrickscfg. See also Databricks configuration profiles.

.databrickscfg file path

  • Description: (String) A non-default path to the .databrickscfg file.
  • Environment variable: DATABRICKS_CONFIG_FILE
  • Terraform field: config_file
  • Config field: config_file (Python), setConfigFile (Java), ConfigFile (Go)

.databrickscfg default profile

  • Description: (String) The default named profile to use, other than DEFAULT.
  • Environment variable: DATABRICKS_CONFIG_PROFILE
  • Terraform field: profile
  • Config field: profile (Python), setProfile (Java), Profile (Go)

Authentication type field

Use this field to force an SDK to use a specific type of Databricks authentication.

Databricks authentication type

  • Description: (String) When multiple authentication attributes are available in the environment, use the authentication type specified by this argument.
  • Terraform field: auth_type
  • Config field: auth_type (Python), setAuthType (Java), AuthType (Go)

Supported Databricks authentication type field values include:

Databricks configuration profiles

A Databricks configuration profile (sometimes referred to as a configuration profile, a config profile, or simply a profile) contains settings and other information that Databricks needs to authenticate. Configuration profiles are stored in a configuration profiles file for your tools, SDKs, scripts, and apps to use. To learn whether your tools, SDKs, scripts, and apps support Databricks configuration profiles, see your provider’s documentation. All participating tools and SDKs that implement Databricks client unified authentication support Databricks configuration profiles. For more information, see Supported authentication types by Databricks tool or SDK.

To create a Databricks configuration profiles file:

  1. Use your favorite text editor to create a file named .databrickscfg in your ~ (your user home) folder on Unix, Linux, or macOS, or your %USERPROFILE% (your user home) folder on Windows, if you do not already have one. Do not forget the dot (.) at the beginning of the file name. Add the following contents to this file:

    [<some-unique-name-for-this-configuration-profile>]
    <field-name> = <field-value>
    
  2. In the preceding contents, replace the following values, and then save the file:

    • <some-unique-name-for-this-configuration-profile> with a unique name for the configuration profile, such as DEFAULT, DEVELOPMENT, PRODUCTION, or similar. You can have multiple configuration profiles in the same .databrickscfg file, but each configuration profile must have a unique name within this file.

    • <field-name> and <field-value> with the name and a value for one of the required fields for the target Databricks authentication type. For the specific information to provide, see the section earlier in this article for that authentication type.

    • Add a <field-name> and <field-value> pair for each of the additional required fields for the target Databricks authentication type.

For example, for Databricks personal access token authentication, the .databrickscfg file might look like this:

[DEFAULT]
host  = https://1234567890123456.7.gcp.databricks.com
token = dapi123...

To create additional configuration profiles, specify different profile names within the same .databrickscfg file. For example, to specify separate Databricks workspaces, each with its own Databricks personal access token:

[DEFAULT]
host  = https://1234567890123456.7.gcp.databricks.com
token = dapi123...

[DEVELOPMENT]
host  = https://2345678901234567.8.gcp.databricks.com
token = dapi234...

You can also specify different profile names within the .databrickscfg file for Databricks accounts and for different Databricks authentication types, for example:

[DEFAULT]
host = https://1234567890123456.7.gcp.databricks.com
token = dapi123...

[DEVELOPMENT]
host                   = https://2345678901234567.8.gcp.databricks.com
google_service_account = someone@example.com
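Because .databrickscfg is an INI-style file, the profile-selection convention described above can be sketched with Python’s standard configparser. This is an illustration of the convention, not any tool’s actual implementation:

```python
import configparser
import os
import tempfile

# The same two-profile layout as the example above, written to a
# throwaway path for illustration.
contents = """[DEFAULT]
host  = https://1234567890123456.7.gcp.databricks.com
token = dapi123...

[DEVELOPMENT]
host                   = https://2345678901234567.8.gcp.databricks.com
google_service_account = someone@example.com
"""
with tempfile.NamedTemporaryFile("w", suffix=".databrickscfg",
                                 delete=False) as f:
    f.write(contents)
    path = f.name

# Profile selection as participating tools describe it: the
# DATABRICKS_CONFIG_PROFILE environment variable names the profile to
# use; otherwise the DEFAULT profile applies.
parser = configparser.ConfigParser()
parser.read(path)
profile = os.environ.get("DATABRICKS_CONFIG_PROFILE", "DEFAULT")
print(parser[profile]["host"])
```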

ODBC DSNs

In ODBC, a data source name (DSN) is a symbolic name that tools, SDKs, scripts, and apps use to request a connection to an ODBC data source. A DSN stores connection details such as the path to an ODBC driver, networking details, authentication credentials, and database details. To learn whether ODBC DSNs are supported by your tools, scripts, and apps, see your provider’s documentation.

To install and configure the Databricks ODBC Driver and create an ODBC DSN for Databricks, see Databricks ODBC Driver.

JDBC connection URLs

In JDBC, a connection URL is a symbolic URL that tools, SDKs, scripts, and apps use to request a connection to a JDBC data source. A connection URL stores connection details such as networking details, authentication credentials, database details, and JDBC driver capabilities. To learn whether JDBC connection URLs are supported by your tools, SDKs, scripts, and apps, see your provider’s documentation.

To install and configure the Databricks JDBC Driver and create a JDBC connection URL for Databricks, see Databricks JDBC Driver.

Google Cloud CLI

The Google Cloud CLI enables you to authenticate with Databricks on Google Cloud through your terminal for Linux or macOS, or through PowerShell or your Command Prompt for Windows. To learn whether the Google Cloud CLI is supported by your tools, SDKs, scripts, and apps, see Supported authentication types by Databricks tool or SDK or your provider’s documentation.

To use the Google Cloud CLI to authenticate with Databricks on Google Cloud manually, run the gcloud init command, and then follow the on-screen prompts:

gcloud init

For more detailed authentication options, see Initializing the gcloud CLI.

Note that tools and SDKs that implement the Databricks client unified authentication standard and that rely on the Google Cloud CLI run the Google Cloud CLI automatically on your behalf to create and manage Databricks authentication.