Authenticate access to Databricks with a user account using OAuth (OAuth U2M)

Databricks uses OAuth user-to-machine (U2M) authentication to enable CLI and API access to Databricks account and workspace resources on behalf of a user. After a user initially signs in and consents to the OAuth authentication request, the participating tool or SDK receives an OAuth token and uses it to perform token-based authentication on the user’s behalf from then on. The OAuth token is valid for one hour, after which the tool or SDK automatically attempts, in the background, to obtain a new token that is also valid for one hour.

Databricks supports two ways to authenticate access for a user account with OAuth:

  • Mostly automatically, using the Databricks unified client authentication support. Use this simplified approach if you are using specific Databricks SDKs (such as the Databricks Terraform SDK) and tools. Supported tools and SDKs are listed in Databricks unified client authentication.

  • Manually, by directly generating an OAuth code verifier/challenge pair and an authorization code, and using them to create the initial OAuth token you will provide in your configuration. Use this approach when you are not using a tool or SDK supported by Databricks unified client authentication. For more details, see Manually generate and use access tokens for OAuth user-to-machine (U2M) authentication.

U2M authentication with Databricks unified client authentication

Note

Before you start configuring your authentication, review the ACL permissions for a specific category of operations on workspace objects and determine whether your account has the access level you require. For more details, see Access control lists.

To perform OAuth U2M authentication with Databricks SDKs and tools that support unified client authentication, integrate the following within your code:

To use environment variables for a specific Databricks authentication type with a tool or SDK, see Authenticate access to Databricks resources or the tool’s or SDK’s documentation. See also Environment variables and fields for client unified authentication and the Default methods for client unified authentication.

For account-level operations, set the following environment variables:

  • DATABRICKS_HOST, set to the value of your Databricks account console URL, https://accounts.gcp.databricks.com.

  • DATABRICKS_ACCOUNT_ID

For workspace-level operations, set the following environment variables:

  • DATABRICKS_HOST, set to the value of your Databricks workspace URL, for example https://1234567890123456.7.gcp.databricks.com.
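As a rough illustration of how these variables distinguish the two cases, the following sketch inspects the environment and reports which level it appears to be configured for. The helper name describe_databricks_env is hypothetical and not part of any Databricks tool or SDK; the rule it applies (an account ID means account-level) is a simplifying assumption based only on the lists above.

```python
import os


def describe_databricks_env(env=None):
    """Return (level, host), where level is "account", "workspace", or None.

    Hypothetical helper for illustration only; not part of any Databricks SDK.
    """
    env = os.environ if env is None else env
    host = env.get("DATABRICKS_HOST")
    if not host:
        # Nothing is configured through environment variables.
        return None, None
    # Account-level operations set DATABRICKS_ACCOUNT_ID in addition to the host.
    if env.get("DATABRICKS_ACCOUNT_ID"):
        return "account", host
    # Workspace-level operations set only DATABRICKS_HOST.
    return "workspace", host


level, host = describe_databricks_env()
print(f"level={level} host={host}")
```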

Create or identify a Databricks configuration profile with the following fields in your .databrickscfg file. If you create the profile, replace the placeholders with the appropriate values. To use the profile with a tool or SDK, see Authenticate access to Databricks resources or the tool’s or SDK’s documentation. See also Environment variables and fields for client unified authentication and the Default methods for client unified authentication.

For account-level operations, set the following values in your .databrickscfg file. In this case, the Databricks account console URL is https://accounts.gcp.databricks.com:

[<some-unique-configuration-profile-name>]
host       = <account-console-url>
account_id = <account-id>

For workspace-level operations, set the following values in your .databrickscfg file. In this case, the host is the Databricks workspace URL, for example https://1234567890123456.7.gcp.databricks.com:

[<some-unique-configuration-profile-name>]
host = <workspace-url>
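Because the profile uses an INI-style layout, you can check what it contains with Python's standard configparser module. The following is a minimal sketch that parses sample text instead of your real .databrickscfg file; the profile names and the account ID value are made-up placeholders, and load_profile is a hypothetical helper.

```python
import configparser

# Sample profile text for illustration; the names and account ID are made up.
SAMPLE = """
[ACCOUNT-EXAMPLE]
host       = https://accounts.gcp.databricks.com
account_id = 00000000-0000-0000-0000-000000000000

[WORKSPACE-EXAMPLE]
host = https://1234567890123456.7.gcp.databricks.com
"""


def load_profile(text, name):
    """Return the fields of one named profile as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser[name])


print(load_profile(SAMPLE, "WORKSPACE-EXAMPLE"))
```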

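For the Databricks CLI, run the databricks auth login command with the following options:

  • For account-level operations: databricks auth login --host <account-console-url> --account-id <account-id>

  • For workspace-level operations: databricks auth login --host <workspace-url>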

After you run this command, follow the instructions in your web browser to log in to your Databricks account or workspace.

For more details, see OAuth U2M authentication with the Databricks CLI.

Note

OAuth U2M authentication is supported in the following Databricks Connect versions:

  • For Python, Databricks Connect for Databricks Runtime 13.1 and above.

  • For Scala, Databricks Connect for Databricks Runtime 13.3 LTS and above.

For Databricks Connect, you can do one of the following:

  • Set the values in your .databrickscfg file for Databricks workspace-level operations as specified in this article’s “Profile” section. Also set the cluster_id field in your profile to the ID of the cluster that you want to connect to.

  • Set the environment variables for Databricks workspace-level operations as specified in this article’s “Environment” section. Also set the DATABRICKS_CLUSTER_ID environment variable to the ID of the cluster that you want to connect to.

Values in your .databrickscfg file always take precedence over environment variables.

To initialize the Databricks Connect client with these environment variables or values in your .databrickscfg file, see one of the following:

For the Databricks extension for Visual Studio Code, do the following:

  1. In the Configuration pane, click Configure Databricks.

  2. In the Command Palette, for Databricks Host, enter your workspace URL, for example https://1234567890123456.7.gcp.databricks.com, and then press Enter.

  3. Select OAuth (user to machine).

  4. Complete the on-screen instructions within your web browser to finish authenticating with your Databricks account and allowing all-apis access.

For more details, see OAuth U2M authentication with the Databricks CLI.

Note

OAuth U2M authentication is not yet supported.

For both account-level and workspace-level operations, you must use the Databricks CLI to run the following command before you run your Python code. This command instructs the Databricks CLI to generate and cache the necessary OAuth token in the path .databricks/token-cache.json within your user’s home folder on your machine:

Configuring for Databricks account-level operations

databricks auth login --host <account-console-url> --account-id <account-id>

Replace the following placeholders:

  • Replace <account-console-url> with the value https://accounts.gcp.databricks.com. (Do not set this to the value of your Databricks workspace URL.)

  • Replace <account-id> with your Databricks account ID. See Locate your account ID.

Note

If you have an existing Databricks configuration profile with the host and account_id fields already set, you can substitute --host <account-console-url> --account-id <account-id> with --profile <profile-name>.

After you run the auth login command, you are prompted to save the account login URL and account ID as a Databricks configuration profile. When prompted, enter the name of a new or existing profile in your .databrickscfg file. Any existing profile with the same name in your .databrickscfg file is overwritten.
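If you want your Python code to check that the login step has already been run, one option is to look for the token cache file mentioned above. The following sketch only builds the path and tests whether the file exists; token_cache_path is a hypothetical helper, and the sketch makes no assumption about the contents of the file.

```python
from pathlib import Path


def token_cache_path(home=None):
    """Return the token cache path: .databricks/token-cache.json in the home folder."""
    home = Path.home() if home is None else Path(home)
    return home / ".databricks" / "token-cache.json"


path = token_cache_path()
print(f"{path} exists: {path.exists()}")
```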

If prompted, complete your web browser’s on-screen instructions to complete the login. Then use Python code similar to one of the following snippets:

For default authentication:

from databricks.sdk import AccountClient

a = AccountClient()
# ...

For direct configuration (replace the retrieve placeholders with your own implementation to retrieve the values from the console or some other configuration store, such as Google Cloud Secret Manager). In this case, the Databricks account console URL is https://accounts.gcp.databricks.com:

from databricks.sdk import AccountClient

a = AccountClient(
  host       = retrieveAccountConsoleUrl(),
  account_id = retrieveAccountId()
)
# ...

Configuring for Databricks workspace-level operations

databricks auth login --host <workspace-url>

Replace the placeholder <workspace-url> with the target Databricks workspace URL, for example https://1234567890123456.7.gcp.databricks.com.

Note

If you have an existing Databricks configuration profile with the host field already set, you can substitute --host <workspace-url> with --profile <profile-name>.

After you run the auth login command, you are prompted to save the workspace URL as a Databricks configuration profile. When prompted, enter the name of a new or existing profile in your .databrickscfg file. Any existing profile with the same name in your .databrickscfg file is overwritten.

If prompted, complete your web browser’s on-screen instructions to complete the login. Then use Python code similar to one of the following snippets:

For default authentication:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
# ...

For direct configuration (replace the retrieve placeholders with your own implementation to retrieve the values from the console or some other configuration store, such as Google Cloud Secret Manager). In this case, the host is the Databricks workspace URL, for example https://1234567890123456.7.gcp.databricks.com:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient(host = retrieveWorkspaceUrl())
# ...

For more information about authenticating with Databricks tools and SDKs that use Python and that implement Databricks client unified authentication, see:

For both account-level and workspace-level operations, you must use the Databricks CLI to run the following command before you run your Java code. This command instructs the Databricks CLI to generate and cache the necessary OAuth token in the path .databricks/token-cache.json in your user’s home folder on your machine:

Configuring for Databricks account-level operations

databricks auth login --host <account-console-url> --account-id <account-id>

Replace the following placeholders:

  • Replace <account-console-url> with the value https://accounts.gcp.databricks.com. (Do not set this to the value of your Databricks workspace URL.)

  • Replace <account-id> with your Databricks account ID. See Locate your account ID.

Note

If you have an existing Databricks configuration profile with the host and account_id fields already set, you can substitute --host <account-console-url> --account-id <account-id> with --profile <profile-name>.

After you run the auth login command, you are prompted to save the account login URL and account ID as a Databricks configuration profile. When prompted, enter the name of a new or existing profile in your .databrickscfg file. Any existing profile with the same name in your .databrickscfg file is overwritten.

If prompted, complete your web browser’s on-screen instructions to complete the login. Then use Java code similar to one of the following snippets:

For default authentication:

import com.databricks.sdk.AccountClient;
// ...
AccountClient a = new AccountClient();
// ...

For direct configuration (replace the retrieve placeholders with your own implementation to retrieve the values from the console or some other configuration store, such as Google Cloud Secret Manager). In this case, the Databricks account console URL is https://accounts.gcp.databricks.com:

import com.databricks.sdk.AccountClient;
import com.databricks.sdk.core.DatabricksConfig;
// ...
DatabricksConfig cfg = new DatabricksConfig()
  .setHost(retrieveAccountConsoleUrl())
  .setAccountId(retrieveAccountId());
AccountClient a = new AccountClient(cfg);
// ...

Configuring for Databricks workspace-level operations

For workspace-level operations, you should first use the Databricks CLI to run the following command, before you run your Java code. This command instructs the Databricks CLI to generate and cache the necessary OAuth token in the path .databricks/token-cache.json within your user’s home folder on your machine:

databricks auth login --host <workspace-url>

Replace the placeholder <workspace-url> with the target Databricks workspace URL, for example https://1234567890123456.7.gcp.databricks.com.

Note

If you have an existing Databricks configuration profile with the host field already set, you can substitute --host <workspace-url> with --profile <profile-name>.

After you run the auth login command, you are prompted to save the workspace URL as a Databricks configuration profile. When prompted, enter the name of a new or existing profile in your .databrickscfg file. Any existing profile with the same name in your .databrickscfg file is overwritten.

If prompted, complete your web browser’s on-screen instructions to complete the login. Then use Java code similar to one of the following snippets:

For default authentication:

import com.databricks.sdk.WorkspaceClient;
// ...
WorkspaceClient w = new WorkspaceClient();
// ...

For direct configuration (replace the retrieve placeholders with your own implementation to retrieve the values from the console or some other configuration store, such as Google Cloud Secret Manager). In this case, the host is the Databricks workspace URL, for example https://1234567890123456.7.gcp.databricks.com:

import com.databricks.sdk.WorkspaceClient;
import com.databricks.sdk.core.DatabricksConfig;
// ...
DatabricksConfig cfg = new DatabricksConfig()
  .setHost(retrieveWorkspaceUrl());
WorkspaceClient w = new WorkspaceClient(cfg);
// ...

For more information about authenticating with Databricks tools and SDKs that use Java and that implement Databricks client unified authentication, see:

For both account-level and workspace-level operations, you must use the Databricks CLI to run the following command before you run your Go code. This command instructs the Databricks CLI to generate and cache the necessary OAuth token in the path .databricks/token-cache.json within your user’s home folder on your machine:

Configuring for Databricks account-level operations

databricks auth login --host <account-console-url> --account-id <account-id>

Replace the following placeholders:

  • Replace <account-console-url> with the value https://accounts.gcp.databricks.com. (Do not set this to the value of your Databricks workspace URL.)

  • Replace <account-id> with your Databricks account ID. See Locate your account ID.

Note

If you have an existing Databricks configuration profile with the host and account_id fields already set, you can substitute --host <account-console-url> --account-id <account-id> with --profile <profile-name>.

After you run the auth login command, you are prompted to save the account login URL and account ID as a Databricks configuration profile. When prompted, enter the name of a new or existing profile in your .databrickscfg file. Any existing profile with the same name in your .databrickscfg file is overwritten.

If prompted, complete your web browser’s on-screen instructions to complete the login. Then use Go code similar to one of the following snippets:

For default authentication:

import (
  "github.com/databricks/databricks-sdk-go"
)
// ...
a := databricks.Must(databricks.NewAccountClient())
// ...

For direct configuration (replace the retrieve placeholders with your own implementation to retrieve the values from the console or some other configuration store, such as Google Cloud Secret Manager). In this case, the Databricks account console URL is https://accounts.gcp.databricks.com:

import (
  "github.com/databricks/databricks-sdk-go"
)
// ...
a := databricks.Must(databricks.NewAccountClient(&databricks.Config{
  Host:      retrieveAccountConsoleUrl(),
  AccountId: retrieveAccountId(),
}))
// ...

Configuring for Databricks workspace-level operations

For workspace-level operations, you should first use the Databricks CLI to run the following command, before you run your Go code. This command instructs the Databricks CLI to generate and cache the necessary OAuth token in the path .databricks/token-cache.json within your user’s home folder on your machine:

databricks auth login --host <workspace-url>

Replace the placeholder <workspace-url> with the target Databricks workspace URL, for example https://1234567890123456.7.gcp.databricks.com.

Note

If you have an existing Databricks configuration profile with the host field already set, you can substitute --host <workspace-url> with --profile <profile-name>.

After you run the auth login command, you are prompted to save the workspace URL as a Databricks configuration profile. When prompted, enter the name of a new or existing profile in your .databrickscfg file. Any existing profile with the same name in your .databrickscfg file is overwritten.

If prompted, complete your web browser’s on-screen instructions to complete the login. Then use Go code similar to one of the following snippets:

For default authentication:

import (
  "github.com/databricks/databricks-sdk-go"
)
// ...
w := databricks.Must(databricks.NewWorkspaceClient())
// ...

For direct configuration (replace the retrieve placeholders with your own implementation to retrieve the values from the console or some other configuration store, such as Google Cloud Secret Manager). In this case, the host is the Databricks workspace URL, for example https://1234567890123456.7.gcp.databricks.com:

import (
  "github.com/databricks/databricks-sdk-go"
)
// ...
w := databricks.Must(databricks.NewWorkspaceClient(&databricks.Config{
  Host: retrieveWorkspaceUrl(),
}))
// ...

For more information about authenticating with Databricks tools and SDKs that use Go and that implement Databricks client unified authentication, see Authenticate the Databricks SDK for Go with your Databricks account or workspace.

Manually generate and use access tokens for OAuth user-to-machine (U2M) authentication

Databricks tools and SDKs that implement the Databricks client unified authentication standard will automatically generate, refresh, and use Databricks OAuth access tokens on your behalf as needed for OAuth U2M authentication.

If for some reason you must manually generate, refresh, or use Databricks OAuth access tokens for OAuth U2M authentication, follow the instructions in this section.

Step 1: Generate an OAuth code verifier and code challenge pair

To manually generate and use access tokens for OAuth U2M authentication, you must first have an OAuth code verifier and an OAuth code challenge that is derived from the code verifier. You use the code challenge in Step 2 to generate an OAuth authorization code. You use the code verifier and the authorization code in Step 3 to generate the OAuth access token.

Note

While it is technically possible to use unencoded plain-text strings for the code verifier and code challenge, Databricks strongly encourages following the OAuth standard for generating the code verifier and code challenge instead.

Specifically, the code verifier should be a cryptographically random string using characters from the sets A-Z, a-z, 0-9, and the punctuation characters -._~ (hyphen, period, underscore, and tilde), between 43 and 128 characters long. The code challenge should be a Base64-URL-encoded string of the SHA256 hash of the code verifier. For more information, see Authorization Request.

You can run the following Python script to quickly generate a unique code verifier and code challenge pair. While you can reuse this generated code verifier and code challenge pair multiple times, Databricks recommends that you generate a new code verifier and code challenge pair each time that you manually generate access tokens for OAuth U2M authentication.

import uuid, hashlib, base64

# Generate a UUID.
uuid1 = uuid.uuid4()

# Convert the UUID to a string.
uuid_str1 = str(uuid1).upper()

# Create the code verifier.
code_verifier = uuid_str1 + "-" + uuid_str1

# Create the code challenge based on the code verifier.
code_challenge = base64.urlsafe_b64encode(hashlib.sha256(code_verifier.encode()).digest()).decode('utf-8')

# Remove all padding from the code challenge.
code_challenge = code_challenge.replace('=', '')

# Print the code verifier and the code challenge.
# Use these in your calls to manually generate
# access tokens for OAuth U2M authentication.
print(f"code_verifier:  {code_verifier}")
print(f"code_challenge: {code_challenge}")
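As an alternative sketch, the code verifier can be drawn from Python's secrets module, which is intended for cryptographically strong random values, and then checked against the character set and length described above. The function names below are hypothetical. The challenge derivation is the same SHA256 and Base64-URL encoding without padding that the script above uses.

```python
import base64
import hashlib
import re
import secrets

# Allowed characters and length for a code verifier, as described above.
VERIFIER_PATTERN = re.compile(r"[A-Za-z0-9\-._~]{43,128}")


def make_code_verifier(num_bytes=32):
    """Return a random verifier; 32 random bytes encode to 43 URL-safe characters."""
    return secrets.token_urlsafe(num_bytes)


def is_valid_verifier(code_verifier):
    """Check the verifier against the allowed character set and length."""
    return VERIFIER_PATTERN.fullmatch(code_verifier) is not None


def code_challenge_for(code_verifier):
    """Base64-URL-encode the SHA256 hash of the verifier and strip the padding."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


code_verifier = make_code_verifier()
print(f"code_verifier:  {code_verifier}")
print(f"code_challenge: {code_challenge_for(code_verifier)}")
```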

Step 2: Generate an authorization code

You use an OAuth authorization code to generate a Databricks OAuth access token. The authorization code expires immediately after you use it to generate a Databricks OAuth access token. The scope of the authorization code depends on the level that you generate it from. You can generate an authorization code at either the Databricks account level or workspace level, as follows:

Generate an account-level authorization code

  1. As an account admin, log in to the account console.

  2. Click the down arrow next to your username in the upper right corner.

  3. Copy your Account ID.

  4. In your web browser’s address bar, browse to the following URL. Line breaks have been added for readability. Your URL must not contain these line breaks.

    In the following URL, replace the following:

    • Replace <account-id> with the Account ID that you copied.

    • Replace <redirect-url> with a redirect URL to your local machine, for example http://localhost:8020.

    • Replace <state> with some plain-text string that you can use to verify the integrity of the authorization code.

    • Replace <code-challenge> with the code challenge that you generated in Step 1.

    https://accounts.gcp.databricks.com/oidc/accounts/<account-id>/v1/authorize
    ?client_id=databricks-cli
    &redirect_uri=<redirect-url>
    &response_type=code
    &state=<state>
    &code_challenge=<code-challenge>
    &code_challenge_method=S256
    &scope=all-apis+offline_access
    
  5. When prompted, follow the on-screen directions to log in to your Databricks account.

  6. In your web browser’s address bar, copy the authorization code. The authorization code is the full string of characters between code= and the & character in the URL. For example, the authorization code in the following URL is dcod...7fe6:

    http://localhost:8020/?code=dcod...7fe6&state=<state>
    

    You should verify the integrity of this authorization code by visually confirming that the <state> value in this response URL matches the state value that you provided in your request URL. If the values are different, you should not use this authorization code, as it could be compromised.

  7. Skip ahead to Generate an account-level access token.
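Assembling the authorization URL by hand is error-prone, so the following sketch builds the account-level URL from the steps above with urllib.parse.urlencode, which also takes care of URL-encoding the redirect URL. The function name account_authorize_url is hypothetical. For a workspace-level code, only the base of the URL differs.

```python
from urllib.parse import urlencode

ACCOUNTS_HOST = "https://accounts.gcp.databricks.com"


def account_authorize_url(account_id, redirect_url, state, code_challenge):
    """Build the account-level authorization URL as a single line."""
    query = urlencode({
        "client_id": "databricks-cli",
        "redirect_uri": redirect_url,
        "response_type": "code",
        "state": state,
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
        # urlencode turns the space into "+", giving all-apis+offline_access.
        "scope": "all-apis offline_access",
    })
    return f"{ACCOUNTS_HOST}/oidc/accounts/{account_id}/v1/authorize?{query}"


# Placeholder values for illustration; substitute your own.
print(account_authorize_url("<account-id>", "http://localhost:8020", "<state>", "<code-challenge>"))
```

Paste the printed URL into your web browser’s address bar after substituting real values for the placeholders.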

Generate a workspace-level authorization code

  1. In your web browser’s address bar, browse to the following URL. Line breaks have been added for readability. Your URL must not contain these line breaks.

    In the following URL, replace the following:

    • Replace <databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.

    • Replace <redirect-url> with a redirect URL to your local machine, for example http://localhost:8020.

    • Replace <state> with some plain-text string that you can use to verify the integrity of the authorization code.

    • Replace <code-challenge> with the code challenge that you generated in Step 1.

    https://<databricks-instance>/oidc/v1/authorize
    ?client_id=databricks-cli
    &redirect_uri=<redirect-url>
    &response_type=code
    &state=<state>
    &code_challenge=<code-challenge>
    &code_challenge_method=S256
    &scope=all-apis+offline_access
    
  2. When prompted, follow the on-screen directions to log in to your Databricks workspace.

  3. In your web browser’s address bar, copy the authorization code. The authorization code is the full string of characters between code= and the & character in the URL. For example, the authorization code in the following URL is dcod...7fe6:

    http://localhost:8020/?code=dcod...7fe6&state=<state>
    

    You should verify the integrity of this authorization code by visually confirming that the <state> value in this response URL matches the state value that you provided in your request URL. If the values are different, you should not use this authorization code, as it could be compromised.
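Instead of comparing the state values by eye, you can let a short script do the check. The following sketch parses the redirect URL copied from the address bar, confirms that its state matches the one you sent, and returns the authorization code. The function name extract_authorization_code is hypothetical.

```python
import hmac
from urllib.parse import parse_qs, urlparse


def extract_authorization_code(redirect_response_url, expected_state):
    """Return the code parameter, or raise ValueError if the state does not match."""
    params = parse_qs(urlparse(redirect_response_url).query)
    code = params.get("code", [None])[0]
    state = params.get("state", [None])[0]
    if code is None:
        raise ValueError("no code parameter in the redirect URL")
    # Compare as bytes so that non-ASCII state strings are handled as well.
    if state is None or not hmac.compare_digest(state.encode(), expected_state.encode()):
        raise ValueError("state mismatch; do not use this authorization code")
    return code
```

For example, calling extract_authorization_code with the URL from your address bar and the state value from your request URL either returns the code or raises an error that tells you not to use it.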

Step 3: Use the authorization code to generate an OAuth access token

You use the OAuth authorization code from the previous step to generate a Databricks OAuth access token, as follows:

Generate an account-level access token

  1. Use a client such as curl along with the account-level authorization code to generate the account-level OAuth access token. In the following curl call, replace the following placeholders:

    • Replace <account-id> with the Account ID from Step 2.

    • Replace <redirect-url> with the redirect URL from Step 2.

    • Replace <code-verifier> with the code verifier that you generated in Step 1.

    • Replace <authorization-code> with the account-level authorization code that you generated in Step 2.

    curl --request POST \
    https://accounts.gcp.databricks.com/oidc/accounts/<account-id>/v1/token \
    --data "client_id=databricks-cli" \
    --data "grant_type=authorization_code" \
    --data "scope=all-apis offline_access" \
    --data "redirect_uri=<redirect-url>" \
    --data "code_verifier=<code-verifier>" \
    --data "code=<authorization-code>"
    
  2. In the response, copy the account-level OAuth access token. The access token is the full string of characters in the access_token field. For example, the access token in the following response is eyJr...Dkag:

    {
      "access_token": "eyJr...Dkag",
      "refresh_token": "doau...f26e",
      "scope": "all-apis offline_access",
      "token_type": "Bearer",
      "expires_in": 3600
    }
    

    This access token expires in one hour. To generate a new access token, repeat this procedure from Step 1.

  3. Skip ahead to Step 4: Call a Databricks REST API.
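If you prefer Python to curl, the same account-level token request can be sketched with the standard library. The function below only builds the request, so you can inspect it before sending. The name account_token_request is hypothetical, and the form fields mirror the curl call above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def account_token_request(account_id, redirect_url, code_verifier, authorization_code):
    """Build (but do not send) the POST request for an account-level access token."""
    body = urlencode({
        "client_id": "databricks-cli",
        "grant_type": "authorization_code",
        "scope": "all-apis offline_access",
        "redirect_uri": redirect_url,
        "code_verifier": code_verifier,
        "code": authorization_code,
    }).encode("ascii")
    url = f"https://accounts.gcp.databricks.com/oidc/accounts/{account_id}/v1/token"
    return Request(url, data=body, method="POST")


# To send the request, pass it to urllib.request.urlopen and decode the JSON response body.
```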

Generate a workspace-level access token

  1. Use a client such as curl along with the workspace-level authorization code to generate the workspace-level OAuth access token. In the following curl call, replace the following placeholders:

    • Replace <databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.

    • Replace <redirect-url> with the redirect URL from Step 2.

    • Replace <code-verifier> with the code verifier that you generated in Step 1.

    • Replace <authorization-code> with the workspace-level authorization code that you generated in Step 2.

    curl --request POST \
    https://<databricks-instance>/oidc/v1/token \
    --data "client_id=databricks-cli" \
    --data "grant_type=authorization_code" \
    --data "scope=all-apis offline_access" \
    --data "redirect_uri=<redirect-url>" \
    --data "code_verifier=<code-verifier>" \
    --data "code=<authorization-code>"
    
  2. In the response, copy the workspace-level OAuth access token. The access token is the full string of characters in the access_token field. For example, the access token in the following response is eyJr...Dkag:

    {
      "access_token": "eyJr...Dkag",
      "refresh_token": "doau...f26e",
      "scope": "all-apis offline_access",
      "token_type": "Bearer",
      "expires_in": 3600
    }
    

    This access token expires in one hour. To generate a new access token, repeat this procedure from Step 1.
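Because the access token is valid for only one hour, it can help to record when it expires at the moment you receive it. The following sketch parses a token response like the ones shown above and converts expires_in into an absolute expiry time. The function names and the 60-second safety margin are assumptions made for illustration.

```python
import json
import time


def parse_token_response(body, now=None):
    """Extract the access token and compute its absolute expiry time in seconds."""
    now = time.time() if now is None else now
    payload = json.loads(body)
    return {
        "access_token": payload["access_token"],
        "token_type": payload["token_type"],
        "expires_at": now + payload["expires_in"],
    }


def is_expired(token, now=None, skew_seconds=60):
    """Treat the token as expired slightly early to allow for clock differences."""
    now = time.time() if now is None else now
    return now >= token["expires_at"] - skew_seconds
```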

Step 4: Call a Databricks REST API

You use the account-level or workspace-level OAuth access token to authenticate to Databricks account-level REST APIs and workspace-level REST APIs, depending on the access token’s scope. Your Databricks user account must be an account admin to call account-level REST APIs.
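The curl examples in this step use Bearer authentication, which can also be sketched in Python with the standard library. The helpers below only build the requests and do not send them. Their names are hypothetical, and the URLs are the same ones that the curl examples in this step call.

```python
from urllib.request import Request


def bearer_get(url, oauth_token):
    """Build a GET request that carries the OAuth access token as a Bearer token."""
    return Request(url, headers={"Authorization": f"Bearer {oauth_token}"}, method="GET")


def list_workspaces_request(account_id, oauth_token):
    """Account-level request: list the workspaces associated with an account."""
    url = f"https://accounts.gcp.databricks.com/api/2.0/accounts/{account_id}/workspaces"
    return bearer_get(url, oauth_token)


def list_clusters_request(databricks_instance, oauth_token):
    """Workspace-level request: list the clusters in the specified workspace."""
    return bearer_get(f"https://{databricks_instance}/api/2.0/clusters/list", oauth_token)


# To send a request, pass it to urllib.request.urlopen and decode the JSON response body.
```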

Example account-level REST API request

This example uses curl along with Bearer authentication to get a list of all workspaces associated with an account.

  • Replace <oauth-access-token> with the account-level OAuth access token.

  • Replace <account-id> with your account ID.

export OAUTH_TOKEN=<oauth-access-token>

curl --request GET --header "Authorization: Bearer $OAUTH_TOKEN" \
"https://accounts.gcp.databricks.com/api/2.0/accounts/<account-id>/workspaces"

Example workspace-level REST API request

This example uses curl along with Bearer authentication to list all available clusters in the specified workspace.

  • Replace <oauth-access-token> with the account-level or workspace-level OAuth access token.

  • Replace <databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.

export OAUTH_TOKEN=<oauth-access-token>

curl --request GET --header "Authorization: Bearer $OAUTH_TOKEN" \
"https://<databricks-instance>/api/2.0/clusters/list"