Set up and use Google Cloud ID authentication for Databricks automation

Follow this article’s steps to authenticate Google Cloud service accounts to automate your Databricks accounts and workspaces.

Google Cloud service accounts are a special kind of Google Cloud account typically used by an application, rather than a person. A service account is identified by its email address, which is unique to the account. See Service accounts overview.

Note

Google Cloud service accounts are different from Databricks service principals. Whether you use a Google Cloud service account or a Databricks service principal might depend on your organization’s security preferences or policies. To learn how to use Databricks service principals for Databricks authentication instead of Google Cloud service accounts, see Manage service principals.

Databricks provides two approaches to authenticating Google Cloud service accounts with Databricks:

This article demonstrates how to set up and use Google Cloud ID authentication as follows:

  • Create a Google Cloud service account.

  • Assign your Google Cloud service account to your Databricks account and to a Databricks workspace in that account.

  • Install the Google Cloud command-line interface (Google Cloud CLI) and then authorize the Google Cloud CLI to use your login to impersonate the Google Cloud service account.

  • Install the Databricks CLI on your local development machine and then configure the Databricks CLI for Google Cloud ID authentication.

  • Run commands with the Databricks CLI to automate your Databricks account, your Databricks workspace, or both by using Google Cloud ID authentication.

Requirements

Step 1: Create a Google Cloud service account

In this step, you create a Google Cloud service account for your target Google project in the Google Cloud console.

  1. Sign in to the Google Cloud console.

  2. If you have access to multiple projects, switch to the target project. To do this, in the top navigation bar, next to the Google Cloud logo, click the project selector. Then select the project’s name in the list.

  3. In Search (/) for resources, docs, products, and more, search for and select Service Accounts.

  4. Click + Create Service Account.

  5. In the Service account details section, for Service account name, enter some unique name for the service account that’s easy for you to remember.

  6. Make a note of the Email address below the Service account ID box, as you will need it in Steps 2, 3, 5, and 7. It will look something like the following:

    <your-service-account-name>@<your-project-name>.iam.gserviceaccount.com
    
  7. Optionally, for Service account description, enter some meaningful description about the service account.

  8. Click Create and continue.

  9. Click Done.
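If you prefer the command line, the console steps above can be approximated with the gcloud CLI after you install it in Step 4. The following is a minimal sketch, not a replacement for the procedure above. The values my-automation-sa and my-project are hypothetical placeholders, and create_service_account is an illustrative helper name that wraps the gcloud iam service-accounts create command.

```shell
# Hypothetical placeholder values; replace them with your own.
SA_NAME="my-automation-sa"
PROJECT_ID="my-project"

# The service account's email address, in the format shown in this step.
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Illustrative helper: creates the service account in the target project.
# Requires an installed and authenticated gcloud CLI.
create_service_account() {
  gcloud iam service-accounts create "$SA_NAME" \
    --project="$PROJECT_ID" \
    --display-name="$SA_NAME"
}

echo "Service account email: $SA_EMAIL"
```

As written, the script only prints the email address. After you review the values, call create_service_account to create the service account.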

Step 2: Assign your Google Cloud service account to your Databricks account

In this step, you give your Google Cloud service account access to your Databricks account. If you do not want to give your service account access to your Databricks account, skip ahead to Step 3.

  1. In your Databricks workspace, click your username in the top bar and click Manage account.

    Alternatively, go directly to your Databricks account console, at https://accounts.gcp.databricks.com.

  2. Sign in to your Databricks account, if prompted.

  3. On the sidebar, click User management.

  4. Click the Users tab.

    Note

    Although this tab is labeled Users, this tab works with service accounts as well. Databricks treats service accounts as users in your Databricks account.

  5. Click Add user.

  6. For Email, enter the Email address that you copied from Step 1 for your service account.

  7. For First name and Last name, enter some meaningful text to help you search for the service account later. For example, for First name you could enter the Service account name from Step 1. For Last name, you could enter Google Cloud Service Account.

  8. Click Add user. Databricks adds the service account as a user to your Databricks account.

  9. Assign any account-level permissions that you want the user to have:

    1. On the Users tab, click the name of the user. If the username is not visible, use Filter users to find it.

    2. On the Roles tab, toggle to enable or disable each target role that you want this user to have. See Assign account admin roles to a user.

Step 3: Assign your Google Cloud service account to your Databricks workspace

In this step, you give your Google Cloud service account access to your Databricks workspace.

If your workspace is enabled for identity federation:

  1. In your Databricks workspace, click your username in the top bar and click Admin Settings.

  2. Click Users.

    Note

    Although this tab is labeled Users, this tab works with service accounts as well. Databricks treats service accounts as users in your Databricks workspace.

  3. Click Add user.

  4. Select the user from Step 2 and click Add. The service account is added as a user in your Databricks workspace.

  5. Assign any workspace-level permissions that you want the user to have:

    1. On the Users tab, click the name of the user.

    2. On the Entitlements tab, select or clear to grant or revoke each target status or entitlement that you want this user to have. For more information, see:

Skip ahead to Step 4.

If your workspace is not enabled for identity federation:

  1. In your Databricks workspace, click your username in the top bar and click Admin Settings.

  2. Click Users.

    Note

    Although this tab is labeled Users, this tab works with service accounts as well. Databricks treats service accounts as users in your Databricks workspace.

  3. Click Add new.

  4. For New user email, enter the Email address that you copied from Step 1 for your service account.

  5. Click Add. The service account is added as a user in your Databricks workspace.

  6. Assign any workspace-level permissions that you want the user to have:

    1. On the Users tab, click the name of the user.

    2. On the Entitlements tab, select or clear to grant or revoke each target status or entitlement that you want this user to have. For more information, see:

Step 4: Install the Google Cloud CLI on your local development machine

Install the Google Cloud CLI by following the instructions in Install the gcloud CLI.

Step 5: Impersonate the Google Cloud service account

In this step, you use your Google Cloud login to automate Databricks through your Google Cloud service account, by using a technique called impersonation. For more information, see Service account impersonation.

To impersonate the service account, you must give your Google Cloud user permissions to impersonate service accounts. You then initiate the impersonation through the Google Cloud CLI.

  1. Give your Google Cloud user permissions to impersonate service accounts: in the Google Cloud console that you signed in to from Step 1, in Search (/) for resources, docs, products, and more, search for and select IAM.

  2. On the Permissions tab, in the View By Principals tab, click Grant Access.

  3. For New Principals, enter and select your Google Cloud username. (Do not enter your Google Cloud service account’s name here.)

  4. Click Select a role, and enter and select Service Account Token Creator.

  5. Click Add Another Role.

  6. Click Select a role, and enter and select Service Account User.

  7. Click Service Account Token Creator.

  8. Click Save.

  9. Initiate the impersonation: use the Google Cloud CLI to run the following command, replacing <your-service-account-name>@<your-project-name>.iam.gserviceaccount.com with the Email address that you copied from Step 1 for your service account.

    gcloud auth login --impersonate-service-account=<your-service-account-name>@<your-project-name>.iam.gserviceaccount.com
    
  10. In your web browser, sign in with your Google Cloud user account by following the on-screen sign-in instructions.
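The role grants and the sign-in from this step can also be sketched as gcloud CLI commands. This sketch assumes that granting the two roles on the project with gcloud projects add-iam-policy-binding matches what the IAM page's Grant Access dialog does. The values my-project and user@example.com are hypothetical placeholders, and run and grant_and_impersonate are illustrative helper names.

```shell
# Hypothetical placeholder values; replace them with your own.
PROJECT_ID="my-project"
USER_EMAIL="user@example.com"
SA_EMAIL="my-automation-sa@${PROJECT_ID}.iam.gserviceaccount.com"

# Illustrative helper: prints each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Grant your Google Cloud user both roles on the project, then sign in
# while impersonating the service account.
grant_and_impersonate() {
  for role in roles/iam.serviceAccountTokenCreator roles/iam.serviceAccountUser; do
    run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="user:${USER_EMAIL}" --role="$role"
  done
  run gcloud auth login --impersonate-service-account="$SA_EMAIL"
}

grant_and_impersonate
```

With the default DRY_RUN=1, the script only prints the three commands so that you can review them. Setting DRY_RUN=0 runs them with your installed gcloud CLI.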

Step 6: Install the Databricks CLI on your local development machine

In this step, you install the Databricks CLI so that you can use it to run commands that automate your Databricks accounts and workspaces.

Tip

You can also use the Databricks Terraform provider or the Databricks SDK for Go along with Google Cloud ID authentication to automate your Databricks accounts and workspaces by running HCL or Go code. See the Databricks SDK for Go and Google Cloud ID authentication.

  1. If it is not already installed, install the Databricks CLI as follows:

    On Linux or macOS, use Homebrew to install the Databricks CLI by running the following two commands:

    brew tap databricks/tap
    brew install databricks
    

    On Windows, you can use winget, Chocolatey, or Windows Subsystem for Linux (WSL) to install the Databricks CLI. If you cannot use winget, Chocolatey, or WSL, skip this procedure and use the Command Prompt or PowerShell to install the Databricks CLI from source instead.

    Note

    Installing the Databricks CLI with Chocolatey is Experimental.

    To use winget to install the Databricks CLI, run the following two commands, and then restart your Command Prompt:

    winget search databricks
    winget install Databricks.DatabricksCLI
    

    To use Chocolatey to install the Databricks CLI, run the following command:

    choco install databricks-cli
    

    To use WSL to install the Databricks CLI:

    1. Install curl and zip through WSL. For more information, see your operating system’s documentation.

    2. Use WSL to install the Databricks CLI by running the following command:

      curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
      
  2. Confirm that the Databricks CLI is installed by running the following command, which displays the current version of the installed Databricks CLI. This version should be 0.205.0 or above:

    databricks -v
    

    Note

    If you run databricks but get an error such as command not found: databricks, or if you run databricks -v and a version number of 0.18 or below is listed, this means that your machine cannot find the correct version of the Databricks CLI executable. To fix this, see Verify your CLI installation.
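The version check in this step can be scripted. The following sketch assumes that databricks -v prints a line containing a version number in x.y.z form, and that your sort command supports the -V (version sort) option, as GNU sort does. The helper names extract_version and version_at_least are illustrative.

```shell
# Minimum Databricks CLI version named in this step.
MIN_VERSION="0.205.0"

# Pulls the first x.y.z version number out of a line of text.
extract_version() {
  printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Succeeds if version $1 is greater than or equal to version $2.
# Assumes a sort that supports -V (version sort), such as GNU sort.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Check the installed CLI, if one is on the PATH.
if command -v databricks >/dev/null 2>&1; then
  installed=$(extract_version "$(databricks -v 2>&1)")
  if version_at_least "$installed" "$MIN_VERSION"; then
    echo "Databricks CLI $installed is new enough."
  else
    echo "Databricks CLI $installed is older than $MIN_VERSION."
  fi
else
  echo "databricks: command not found"
fi
```

A plain string comparison would rank 0.18.0 above 0.205.0, which is why the sketch uses a version-aware sort.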

Step 7: Configure the Databricks CLI for Google Cloud ID authentication

In this step, you set up the Databricks CLI to use Google Cloud ID authentication for Databricks by using your Google Cloud service account’s email address. To do this, you create a file with the default filename, in the default location, where the Databricks CLI expects to find the authentication settings that it needs.

  1. With your favorite text editor, create a local file named .databrickscfg in your user’s home directory, if it does not already exist. For Linux and macOS, your user home directory is ~. For Windows, your user home directory is %USERPROFILE%.

  2. Enter the following content into the .databrickscfg file. In this content, replace the following values:

    • Replace <account-console-url> with your Databricks account console URL, such as https://accounts.gcp.databricks.com.

    • Replace <account-id> with your Databricks account ID. See Locate your account ID.

    • Replace <google-cloud-service-account-email-address> with the Email address that you copied from Step 1 for your service account.

    • Replace <workspace-url> with your workspace instance URL, for example https://1234567890123456.7.gcp.databricks.com.

    • You can replace the suggested configuration profile names GCP_ID_ACCOUNT and GCP_ID_WORKSPACE with different configuration profile names if desired. These specific names are not required.

    If you do not want to run account-level operations, you can omit the [GCP_ID_ACCOUNT] section in the following content.

    [GCP_ID_ACCOUNT]
    host                   = <account-console-url>
    account_id             = <account-id>
    google_service_account = <google-cloud-service-account-email-address>
    
    [GCP_ID_WORKSPACE]
    host                   = <workspace-url>
    google_service_account = <google-cloud-service-account-email-address>
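If you would rather generate these profiles from a script than type them by hand, the following sketch shows one way to do it. All of the values are hypothetical placeholders, write_profiles is an illustrative helper name, and the scratch file path is an arbitrary choice for previewing the result.

```shell
# Hypothetical placeholder values; replace them with your own.
ACCOUNT_CONSOLE_URL="https://accounts.gcp.databricks.com"
ACCOUNT_ID="00000000-0000-0000-0000-000000000000"
WORKSPACE_URL="https://1234567890123456.7.gcp.databricks.com"
SA_EMAIL="my-automation-sa@my-project.iam.gserviceaccount.com"

# Illustrative helper: appends both profiles to the file given as $1.
write_profiles() {
  cat >> "$1" <<EOF
[GCP_ID_ACCOUNT]
host                   = ${ACCOUNT_CONSOLE_URL}
account_id             = ${ACCOUNT_ID}
google_service_account = ${SA_EMAIL}

[GCP_ID_WORKSPACE]
host                   = ${WORKSPACE_URL}
google_service_account = ${SA_EMAIL}
EOF
}

# Write to a scratch file first so that you can review the result.
SCRATCH_FILE="${TMPDIR:-/tmp}/databrickscfg.preview"
: > "$SCRATCH_FILE"
write_profiles "$SCRATCH_FILE"
cat "$SCRATCH_FILE"
```

If the preview looks right, running write_profiles "$HOME/.databrickscfg" appends the profiles to the real file. Appending leaves any existing profiles in place, but it creates duplicate profile names if you run it more than once.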
    

Step 8: Run an account-level command with the Databricks CLI

In this step, you use the Databricks CLI and Google Cloud ID authentication to run a command that automates the Databricks account that was configured in Step 7. This step assumes that your Google Cloud user account is currently impersonating the service account as described previously in Step 5.

If you do not want to run account-level commands, skip ahead to Step 9.

With the terminal or command prompt still open from Step 6, run the following command to list all available users in your Databricks account. If you renamed GCP_ID_ACCOUNT in Step 7, be sure to replace it here.

databricks account users list -p GCP_ID_ACCOUNT

Step 9: Run a workspace-level command with the Databricks CLI

In this step, you use the Databricks CLI and Google Cloud ID authentication to run a command that automates the Databricks workspace that was configured in Step 7. This step assumes that your Google Cloud user account is currently impersonating the service account as described previously in Step 5.

With the terminal or command prompt still open from Step 6, run the following command to list all available users in your Databricks workspace. If you renamed GCP_ID_WORKSPACE in Step 7, be sure to replace it here.

databricks users list -p GCP_ID_WORKSPACE
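As a quick sanity check, you can confirm that the service account itself appears in the list. The following sketch assumes that the CLI's JSON output (requested with -o json) includes each user's email address as a quoted string. The email address is a hypothetical placeholder, and lists_user is an illustrative helper name.

```shell
# Hypothetical placeholder value; replace it with your own.
SA_EMAIL="my-automation-sa@my-project.iam.gserviceaccount.com"

# Succeeds if the JSON on standard input contains the given email address
# as a quoted string. This is a plain text match, not a JSON parse.
lists_user() {
  grep -qF "\"$1\""
}

# Query the workspace only if the Databricks CLI is on the PATH.
if command -v databricks >/dev/null 2>&1; then
  if databricks users list -p GCP_ID_WORKSPACE -o json | lists_user "$SA_EMAIL"; then
    echo "Service account is a workspace user."
  else
    echo "Service account not found in the workspace user list."
  fi
fi
```

The same check could be pointed at the account-level list by swapping in the command and profile from Step 8.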

Step 10: Clean up

This step is optional. If you no longer want to keep using the Google Cloud service account that you created for this article, this step describes how to delete the service account from your Google project and your Databricks account and workspace.

Delete the service account from your Google project

  1. In the Google Cloud console that you signed in to from Step 1, in Search (/) for resources, docs, products, and more, search for and select Service Accounts.

  2. In the row for your service account’s name, click the ellipsis. If your service account’s name is not visible, use Enter property name or value to find it.

  3. Click Delete.

  4. In the confirmation dialog, click Delete.
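The console deletion above has a gcloud CLI counterpart, gcloud iam service-accounts delete. The following sketch prints the command by default and runs it only when you opt in. The project and email values are hypothetical placeholders, and both the CONFIRM_DELETE variable and the delete_service_account helper are illustrative names, not part of any tool.

```shell
# Hypothetical placeholder values; replace them with your own.
PROJECT_ID="my-project"
SA_EMAIL="my-automation-sa@${PROJECT_ID}.iam.gserviceaccount.com"

# Illustrative helper: shows the delete command, and runs it only when
# CONFIRM_DELETE=yes is set in the environment.
delete_service_account() {
  set -- gcloud iam service-accounts delete "$SA_EMAIL" --project="$PROJECT_ID"
  if [ "${CONFIRM_DELETE:-no}" = "yes" ]; then
    "$@"
  else
    echo "Would run: $*"
  fi
}

delete_service_account
```

Deleting the service account in Google Cloud does not remove the corresponding user from Databricks, so the two procedures that follow still apply.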

Delete the service account from your Databricks account

  1. In your Databricks account, on the sidebar, click User management.

  2. Click the Users tab.

  3. Click the name of the service account that you added in Step 2. If the service account’s name is not visible, use Filter users to find it.

  4. Click the ellipsis button, and then click Delete user.

  5. Click Confirm delete.

Delete the service account from your Databricks workspace

  1. In your Databricks workspace, click your username in the top bar and click Admin Settings.

  2. Click the Users tab.

  3. Click the name of the service account that you added in Step 3. If the service account’s name is not visible, use Filter users to find it.

  4. Click Remove user.

  5. In the confirmation dialog, click Delete.