Set up and use Google Cloud credentials authentication for Databricks automation

Follow this article’s steps to authenticate Google Cloud service accounts to automate your Databricks accounts and workspaces.

Google Cloud service accounts are a special kind of Google Cloud account typically used by an application, rather than a person. A service account is identified by its email address, which is unique to the account. See Service accounts overview.

Note

Google Cloud service accounts are different than Databricks service principals. Choosing whether to use a Google Cloud service account or a Databricks service principal might depend on your organization’s security preferences or policies. To learn how to use Databricks service principals for Databricks authentication instead of Google Cloud service accounts, see Manage service principals.

Databricks provides two approaches to authenticating Google Cloud service accounts with Databricks. This article covers one of them, Google Cloud credentials authentication.

This article demonstrates how to set up and use Google Cloud credentials authentication as follows:

  • Create a Google Cloud service account.

  • Assign your Google Cloud service account to your Databricks account and to a Databricks workspace in that account.

  • Create a Google-managed key pair for your Google Cloud service account, and then download the private key portion of this Google-managed key pair. This private key file is required for Google Cloud credentials authentication for Databricks.

  • Install the Databricks CLI on your local development machine and then configure the Databricks CLI for Google Cloud credentials authentication.

  • Run commands with the Databricks CLI to automate your Databricks account and workspace by using Google Cloud credentials authentication.

Requirements

Step 1: Create a Google Cloud service account

In this step, you create a Google Cloud service account for your target Google project in the Google Cloud console.

  1. Sign in to the Google Cloud console.

  2. If you have access to multiple projects, switch to the target project. To do this, in the top navigation bar, next to the Google Cloud logo, click the project selector. Then select the project’s name in the list.

  3. In Search (/) for resources, docs, products, and more, search for and select Service Accounts.

  4. Click + Create Service Account.

  5. In the Service account details section, for Service account name, enter some unique name for the service account that’s easy for you to remember.

  6. Make a note of the Email address below the Service account ID box, as you will need it in Steps 2, 3, 4, and 6. It will look something like the following:

    <your-service-account-name>@<your-project-name>.iam.gserviceaccount.com
    
  7. Optionally, for Service account description, enter some meaningful description about the service account.

  8. Click Create and continue.

  9. Click Done.
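
If you prefer the command line to the console, the steps above can be approximated with the gcloud CLI. The following is a minimal sketch, assuming that gcloud is installed and signed in. The function names and the example values my-databricks-sa and my-project-id are hypothetical. Google Cloud builds the email address from the project ID, which can differ from the project's display name.

```shell
# Build the email address that Google Cloud assigns to a service account.
# Arguments: <service-account-name> <project-id>
service_account_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Create the service account with the gcloud CLI (assumed to be installed and
# signed in), then print its email address.
# Arguments: <service-account-name> <project-id> <description>
create_service_account() {
  gcloud iam service-accounts create "$1" \
    --project="$2" \
    --display-name="$1" \
    --description="$3" &&
  service_account_email "$1" "$2"
}

# Example call (hypothetical values):
# create_service_account my-databricks-sa my-project-id "Databricks automation"
```

The printed email address is the value to note for the later steps.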

Step 2: Assign your Google Cloud service account to your Databricks account

In this step, you give your Google Cloud service account access to your Databricks account. If you do not want to give your service account access to your Databricks account, skip ahead to Step 3.

  1. In your Databricks workspace, click your username in the top bar and click Manage account.

    Alternatively, go directly to your Databricks account console, at https://accounts.gcp.databricks.com.

  2. Sign in to your Databricks account, if prompted.

  3. On the sidebar, click User management.

  4. Click the Users tab.

    Note

    Although this tab is labeled Users, this tab works with service accounts as well. Databricks treats service accounts as users in your Databricks account.

  5. Click Add user.

  6. For Email, enter the Email address that you copied from Step 1 for your service account.

  7. For First name and Last name, enter some meaningful text to help you search for the service account later. For example, for First name you could enter the Service account name from Step 1. For Last name, you could enter Google Cloud Service Account.

  8. Click Add user. Databricks adds the service account as a user to your Databricks account.

  9. Assign any account-level permissions that you want the user to have:

    1. On the Users tab, click the name of the user. If the username is not visible, use Filter users to find it.

    2. On the Roles tab, toggle to enable or disable each target role that you want this user to have. See Assign account admin roles to a user.

Step 3: Assign your Google Cloud service account to your Databricks workspace

In this step, you give your Google Cloud service account access to your Databricks workspace.

If your workspace is enabled for identity federation:

  1. In your Databricks workspace, click your username in the top bar and click Admin Settings.

  2. Click Users.

    Note

    Although this tab is labeled Users, this tab works with service accounts as well. Databricks treats service accounts as users in your Databricks workspace.

  3. Click Add user.

  4. Select the user from Step 2 and click Add. The service account is added as a user in your Databricks workspace.

  5. Assign any workspace-level permissions that you want the user to have:

    1. On the Users tab, click the name of the user.

    2. On the Entitlements tab, select or clear each status or entitlement to grant it to or revoke it from this user.

Skip ahead to Step 4.

If your workspace is not enabled for identity federation:

  1. In your Databricks workspace, click your username in the top bar and click Admin Settings.

  2. Click Users.

    Note

    Although this tab is labeled Users, this tab works with service accounts as well. Databricks treats service accounts as users in your Databricks workspace.

  3. Click Add new.

  4. For New user email, enter the Email address that you copied from Step 1 for your service account.

  5. Click Add. The service account is added as a user in your Databricks workspace.

  6. Assign any workspace-level permissions that you want the user to have:

    1. On the Users tab, click the name of the user.

    2. On the Entitlements tab, select or clear each status or entitlement to grant it to or revoke it from this user.

Step 4: Create a Google-managed key pair for your Google Cloud service account

In this step, you create a Google-managed key pair for your Google Cloud service account in the Google Cloud console. You then download the private key portion of this Google-managed key pair.

  1. In the Google Cloud console that you signed in to in Step 1, on your service account’s settings page, click the Keys tab.

    To return to your service account’s settings page if you closed it earlier, in Search (/) for resources, docs, products, and more, search for and select your service account’s name.

  2. Click Add Key > Create new key.

  3. In the Create private key dialog, select JSON, and click Create. The private key portion of the Google-managed key pair is downloaded to your local development machine as <your-project-name>-<random-id>.json. Make a note of where this .json file is downloaded, as you will need it later in Step 6.

    Make sure to store this private key in a secure location. The private key can be downloaded only once, at creation time. If you lose it, repeat this step to create a new key pair and download its private key.
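
As an alternative to the console, a key can also be created with the gcloud CLI. The following is a sketch under the assumption that gcloud is installed and signed in. The function names are hypothetical. The second function shows one way to keep the downloaded file readable by its owner only.

```shell
# Create a JSON key for the service account and save it to the given path.
# Assumes the gcloud CLI is installed and signed in.
# Arguments: <service-account-email> <output-file>
create_service_account_key() {
  gcloud iam service-accounts keys create "$2" --iam-account="$1"
}

# Restrict a key file so that only its owner can read or write it.
# Arguments: <key-file>
restrict_key_file() {
  chmod 600 "$1"
}

# Example calls (hypothetical values):
# create_service_account_key my-sa@my-project-id.iam.gserviceaccount.com "$HOME/keys/my-sa.json"
# restrict_key_file "$HOME/keys/my-sa.json"
```

The chmod call applies to Linux, macOS, and WSL. On Windows without WSL, use the file's security properties instead.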

Step 5: Install the Databricks CLI on your local development machine

In this step, you install the Databricks CLI so that you can use it to run commands that automate your Databricks accounts and workspaces.

Tip

You can also use the Databricks Terraform provider or the Databricks SDK for Go along with Google Cloud credentials authentication to automate your Databricks accounts and workspaces by running HCL or Go code. See the Databricks SDK for Go and Google Cloud credentials authentication.

  1. If it is not already installed, install the Databricks CLI as follows:

    On Linux or macOS, use Homebrew to install the Databricks CLI by running the following two commands:

    brew tap databricks/tap
    brew install databricks
    

    On Windows, you can use winget, Chocolatey, or Windows Subsystem for Linux (WSL) to install the Databricks CLI. If you cannot use winget, Chocolatey, or WSL, skip this procedure and use the Command Prompt or PowerShell to install the Databricks CLI from source instead.

    Note

    Installing the Databricks CLI with Chocolatey is Experimental.

    To use winget to install the Databricks CLI, run the following two commands, and then restart your Command Prompt:

    winget search databricks
    winget install Databricks.DatabricksCLI
    

    To use Chocolatey to install the Databricks CLI, run the following command:

    choco install databricks-cli
    

    To use WSL to install the Databricks CLI:

    1. Install curl and zip through WSL. For more information, see your operating system’s documentation.

    2. Use WSL to install the Databricks CLI by running the following command:

      curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
      
  2. Confirm that the Databricks CLI is installed by running the following command, which displays the current version of the installed Databricks CLI. This version should be 0.205.0 or above:

    databricks -v
    

    Note

    If you run databricks but get an error such as command not found: databricks, or if you run databricks -v and a version number of 0.18 or below is listed, this means that your machine cannot find the correct version of the Databricks CLI executable. To fix this, see Verify your CLI installation.
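
The version check above can also be scripted, for example in a setup script. The following is a minimal sketch. The function names are hypothetical, and it assumes that the output of databricks -v contains a dotted numeric version such as 0.205.0.

```shell
# Pull the first dotted numeric version (for example 0.205.0) out of a string.
# Argument: <text>, such as the output of `databricks -v`
extract_version() {
  printf '%s\n' "$1" |
    sed -n 's/[^0-9]*\([0-9][0-9]*\(\.[0-9][0-9]*\)*\).*/\1/p'
}

# Succeed if version $1 is greater than or equal to version $2.
# Both arguments must contain numeric components only, separated by dots.
version_at_least() {
  have=$1
  want=$2
  while [ -n "$have" ] || [ -n "$want" ]; do
    h=${have%%.*}
    w=${want%%.*}
    if [ "${h:-0}" -gt "${w:-0}" ]; then
      return 0
    elif [ "${h:-0}" -lt "${w:-0}" ]; then
      return 1
    fi
    # Drop the component that was just compared.
    case $have in *.*) have=${have#*.} ;; *) have= ;; esac
    case $want in *.*) want=${want#*.} ;; *) want= ;; esac
  done
  return 0
}

# Example use (requires the Databricks CLI on your PATH):
# version_at_least "$(extract_version "$(databricks -v)")" 0.205.0 || echo "CLI too old" >&2
```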

Step 6: Configure the Databricks CLI for Google Cloud credentials authentication

In this step, you set up the Databricks CLI to use Google Cloud credentials authentication for Databricks by using the private key for your Google Cloud service account. To do this, you create a file with the default filename, in the default location, where the Databricks CLI expects to find its authentication settings.

  1. With your favorite text editor, create a local file named .databrickscfg in your user’s home directory, if it does not already exist. For Linux and macOS, your user home directory is ~. For Windows, your user home directory is %USERPROFILE%.

  2. Enter the following content into the .databrickscfg file. In this content, replace the following values:

    • Replace <account-console-url> with your Databricks account console URL, such as https://accounts.gcp.databricks.com.

    • Replace <account-id> with your Databricks account ID. See Locate your account ID.

    • Replace <path-to-google-service-account-credentials-file> with the path to your downloaded private key from Step 4.

    • Replace <workspace-url> with your workspace instance URL, for example https://1234567890123456.7.gcp.databricks.com.

    • You can replace the suggested configuration profile names GCP_CREDS_ACCOUNT and GCP_CREDS_WORKSPACE with different configuration profile names if desired. These specific names are not required.

    If you do not want to run account-level operations, you can omit the [GCP_CREDS_ACCOUNT] section in the following content.

    [GCP_CREDS_ACCOUNT]
    host               = <account-console-url>
    account_id         = <account-id>
    google_credentials = <path-to-google-service-account-credentials-file>
    
    [GCP_CREDS_WORKSPACE]
    host               = <workspace-url>
    google_credentials = <path-to-google-service-account-credentials-file>
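
If you would rather generate these profiles from a script than edit the file by hand, the following sketch appends a profile in the same format. The function name add_profile and its argument order are hypothetical. It does not check whether a profile with the same name already exists, so run it once per profile.

```shell
# Append a configuration profile to a Databricks configuration file.
# Arguments: <config-file> <profile-name> <host-url> <key-file> [<account-id>]
# Pass the account ID only for an account-level profile.
add_profile() {
  {
    printf '[%s]\n' "$2"
    printf 'host               = %s\n' "$3"
    if [ -n "${5:-}" ]; then
      printf 'account_id         = %s\n' "$5"
    fi
    printf 'google_credentials = %s\n\n' "$4"
  } >> "$1"
}

# Example calls (replace the placeholders as described above):
# add_profile "$HOME/.databrickscfg" GCP_CREDS_ACCOUNT "<account-console-url>" "<path-to-key-file>" "<account-id>"
# add_profile "$HOME/.databrickscfg" GCP_CREDS_WORKSPACE "<workspace-url>" "<path-to-key-file>"
```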
    

Step 7: Run an account-level command with the Databricks CLI

In this step, you use the Databricks CLI and Google Cloud credentials authentication to run a command that automates the Databricks account that was configured in Step 6.

If you do not want to run account-level commands, skip ahead to Step 8.

With the terminal or command prompt still open from Step 5, run the following command to list all available users in your Databricks account. If you renamed GCP_CREDS_ACCOUNT in Step 6, be sure to replace it here.

databricks account users list -p GCP_CREDS_ACCOUNT

Step 8: Run a workspace-level command with the Databricks CLI

In this step, you use the Databricks CLI and Google Cloud credentials authentication to run a command that automates the Databricks workspace that was configured in Step 6.

With the terminal or command prompt still open from Step 5, run the following command to list all available users in your Databricks workspace. If you renamed GCP_CREDS_WORKSPACE in Step 6, be sure to replace it here.

databricks users list -p GCP_CREDS_WORKSPACE
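
When you script commands like these, it can help to confirm that the named profile exists before calling the CLI. The following is a sketch with hypothetical function names. It assumes the default configuration file location from Step 6 and that the profile header sits on its own line with no surrounding whitespace.

```shell
# Succeed if the configuration file defines a profile with the given name.
# Arguments: <config-file> <profile-name>
profile_exists() {
  grep -q -x -F "[$2]" "$1"
}

# List workspace users with the given profile, but only if the profile exists.
# Assumes the Databricks CLI is installed. Argument: <profile-name>
list_workspace_users() {
  if profile_exists "$HOME/.databrickscfg" "$1"; then
    databricks users list -p "$1"
  else
    echo "Profile $1 not found in $HOME/.databrickscfg" >&2
    return 1
  fi
}

# Example call:
# list_workspace_users GCP_CREDS_WORKSPACE
```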

Step 9: Clean up

This step is optional. If you no longer want to keep using the Google Cloud service account that you created for this article, this step describes how to delete the service account from your Google project and your Databricks account and workspace.

Delete the service account from your Google project

  1. In the Google Cloud console that you signed in to from Step 1, in Search (/) for resources, docs, products, and more, search for and select Service Accounts.

  2. In the row for your service account’s name, click the ellipsis. If your service account’s name is not visible, use Enter property name or value to find it.

  3. Click Delete.

  4. In the confirmation dialog, click Delete.
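
The same deletion can be sketched with the gcloud CLI, assuming that gcloud is installed and signed in. The function names are hypothetical. The check on the email format is only a guard against passing a regular user's address by mistake, and gcloud still asks for confirmation before deleting.

```shell
# Succeed if the argument looks like a Google Cloud service account email.
# Argument: <email>
is_service_account_email() {
  case $1 in
    ?*@?*.iam.gserviceaccount.com) return 0 ;;
    *) return 1 ;;
  esac
}

# Delete a service account with the gcloud CLI after a basic format check.
# Argument: <service-account-email>
delete_service_account() {
  if is_service_account_email "$1"; then
    gcloud iam service-accounts delete "$1"
  else
    echo "Not a service account email: $1" >&2
    return 1
  fi
}

# Example call (hypothetical value):
# delete_service_account my-databricks-sa@my-project-id.iam.gserviceaccount.com
```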

Delete the service account from your Databricks account

  1. In your Databricks account, on the sidebar, click User management.

  2. Click the Users tab.

  3. Click the name of the service account that you added in Step 2. If the service account’s name is not visible, use Filter users to find it.

  4. Click the ellipsis button, and then click Delete user.

  5. Click Confirm delete.

Delete the service account from your Databricks workspace

  1. In your Databricks workspace, click your username in the top bar and click Admin Settings.

  2. Click the Users tab.

  3. Click the name of the service account that you added in Step 3. If the service account’s name is not visible, use Filter users to find it.

  4. Click Remove user.

  5. In the confirmation dialog, click Delete.