Authentication with Google ID tokens

Preview

Workspace-level authentication is in Public Preview. A separate public article covers Google ID token authentication for workspace APIs only. See Authenticate to workspace APIs with a Google ID token.

This article discusses both workspace-level and account-level authentication. Account-level authentication and the Account API 2.0 are in Private Preview. This is not a public article and is intended only for people in the preview.

To authenticate to Databricks REST APIs, you have two options:

  • Databricks personal access token. You can use these for workspace-level REST APIs only.

  • OpenID Connect (OIDC) token. You can use these for all Databricks REST APIs.

OpenID Connect (OIDC) is an open standard for authentication. OIDC 1.0 is a simple identity layer on top of the OAuth 2.0 protocol. It allows applications to verify the identity of users based on authentication that is performed by an OAuth authorization server. Applications can also get basic profile information about users from OIDC tokens. By default, OIDC tokens expire after one hour.

Important

Databricks REST APIs support only Google-issued OIDC tokens, which are commonly known as Google ID tokens. To reduce confusion, the rest of this article uses the term Google ID token instead of OIDC token.

This article describes the steps to authenticate to use Databricks REST APIs and how to create the required Google Cloud service accounts and generate tokens for these accounts.

A single Google ID token can be used for account-level APIs or workspace-level APIs, but cannot be used for both purposes. The steps for setting up tokens for workspace-level and account-level APIs are mostly the same, and the important differences are called out in the instructions.

For a production environment, Databricks recommends that you use two service accounts to work with Databricks REST APIs.

  • Create one service account (SA-1) to run your workloads.

  • Create another service account (SA-2) to hold permissions to your Databricks and Google Cloud resources.

  • Grant SA-1 permission to impersonate SA-2 to call Databricks REST APIs.

With this impersonation model, one team can manage workload security and another team can manage resource security. Because you only grant the impersonation permissions as needed, this approach offers security and flexibility to your organization.

This article describes in detail how to perform these steps for production use. You can adapt these instructions for non-production use and testing using one of the following strategies:

  • Use your Google user account to impersonate SA-2. The user account must have the role roles/iam.serviceAccountTokenCreator.

  • Use one Google Cloud service account for both SA-1 and SA-2.

Account-level APIs and workspace-level APIs

To understand authentication to use Databricks REST APIs, you must understand the types of REST APIs and their relationship to the Databricks resource hierarchy.

You may have one or more Databricks accounts. A Databricks account contains zero, one, or more Databricks workspaces. You can use the account console to create workspaces and manage cloud resources that are necessary to configure a workspace (such as credentials, storage, and networks).

A Databricks workspace has a variety of resources, such as jobs and notebooks. A workspace admin, sometimes just called an admin, can modify settings that are defined at the workspace level, including the settings in the admin console.

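Because of this difference, there are two types of APIs:

  • Account-level APIs, such as the Account API, which operate on an account and the workspaces it contains.

  • Workspace-level APIs, which operate on the resources within a single workspace, such as jobs and notebooks.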

To authenticate to a Databricks REST API, you need to pass a Google ID token with an audience (aud) that matches the base URL of the API, which is different between these two types of APIs. To call Databricks REST APIs with different base URLs, you must use different Google ID tokens.
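
Because the audience decides which base URL accepts a token, it can help to inspect the aud claim before you use a token. The following is a minimal sketch, not an official Databricks tool. It decodes the token payload locally and does not verify the signature. The function names jwt_payload and jwt_aud are hypothetical, and the sketch assumes a POSIX shell with the base64, cut, tr, and sed utilities.

```shell
# Print the JSON payload (second dot-separated segment) of a JWT.
# The signature is NOT verified; use this for inspection only.
jwt_payload() {
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url encoding drops trailing '=' padding; restore it before decoding.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 --decode
}

# Print the value of the "aud" claim (assumes aud is a single JSON string).
jwt_aud() {
  jwt_payload "$1" | sed -n 's/.*"aud" *: *"\([^"]*\)".*/\1/p'
}
```

For a token held in a shell variable, jwt_aud "$TOKEN" should print either a workspace URL or https://accounts.gcp.databricks.com, which tells you which type of API the token can call.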

Credentials passthrough

A few Databricks REST API methods require credentials passthrough. To call these methods, in addition to the Google ID token, you must also pass a Google OAuth access token with the cloud-platform scope in an HTTP header. The Databricks server uses the Google OAuth access token to call Google Cloud APIs on behalf of the caller.

Databricks doesn’t validate or preserve the access token.

Important

To determine whether credentials passthrough is required for an operation, refer to the API documentation for each API operation. These APIs require the X-Databricks-GCP-SA-Access-Token HTTP header in the request.

Step 1: Create two service accounts

  1. Create two new Google Cloud service accounts. Follow the instructions in the Google article Creating a service account. To use the Google Cloud Console, go to the Service Accounts page and choose a Google Cloud project in which to create each account. The Google Cloud project in which you create these service accounts does not need to match the project that you use for your Databricks workspace, nor do the two service accounts need to be in the same Google Cloud project.

    • Token-creating service account (SA-1): This service account automates creation of tokens for the main service account. These tokens will be used to call Databricks REST APIs. Google documentation calls this SA-1.

    • Main service account for Databricks REST APIs (SA-2): This service account acts as a principal (the automation user) for Databricks REST APIs and automated workflows. Google documentation calls this SA-2.

    Save the email address for both service accounts for use in later steps.

  2. Create a service account key for your token-creating service account (SA-1) and save it to a local file called SA-1-key.json.

    1. From the Google Cloud Console Service Accounts page, click the email address for SA-1.

    2. Click the KEYS tab.

    3. Click ADD KEY.

    4. Ensure that JSON (the default) is selected.

    5. Click CREATE.

    6. The web page downloads a key file to your browser. Move that file to your local working directory and rename it SA-1-key.json.

    For additional instructions, see the Google article Creating service account keys.

  3. Grant your token-creating service account (SA-1) the Service Account Token Creator Role on your main service account (SA-2). Follow the instructions in the Google article Direct request permissions.

    1. From the Google Cloud Console Service Accounts page, click the email address for SA-2.

    Important

    In Google Cloud Console, be sure to edit your main SA (SA-2), not your token-creating SA (SA-1).

    2. Click PERMISSIONS.

    3. Click GRANT ACCESS.

    4. In the New Principals field, paste the email address for your token-creating SA (SA-1).

    5. In the Role field, choose the Service Account Token Creator role.

    6. Click SAVE.
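
The console steps above can also be approximated with the gcloud CLI. The following is a sketch under stated assumptions, not the documented procedure: it assumes both service accounts live in the same Google Cloud project (the article does not require this), and the function name create_service_accounts and its arguments are hypothetical.

```shell
# Sketch of Step 1 with the gcloud CLI instead of the Cloud Console.
# Arguments: <project-id> <SA-1-name> <SA-2-name> (names you choose).
# Assumption: both service accounts are created in the same project.
create_service_accounts() {
  project=$1
  sa1=$2
  sa2=$3
  sa1_email="${sa1}@${project}.iam.gserviceaccount.com"
  sa2_email="${sa2}@${project}.iam.gserviceaccount.com"

  # Create SA-1 (token-creating) and SA-2 (main) service accounts.
  gcloud iam service-accounts create "$sa1" --project="$project" &&
  gcloud iam service-accounts create "$sa2" --project="$project" &&
  # Download a JSON key for SA-1 into the working directory.
  gcloud iam service-accounts keys create SA-1-key.json --iam-account="$sa1_email" &&
  # Let SA-1 create tokens for (impersonate) SA-2. The binding is set on SA-2.
  gcloud iam service-accounts add-iam-policy-binding "$sa2_email" \
    --member="serviceAccount:${sa1_email}" \
    --role="roles/iam.serviceAccountTokenCreator"
}
```

A call such as create_service_accounts my-project sa-1 sa-2 runs the four commands in order and stops at the first failure. The project and account names in that call are placeholders.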

Step 2: Create a Google ID token

Databricks recommends using the Google Cloud CLI (gcloud) to generate ID tokens to call Databricks REST APIs.

Important

The generated ID token expires in one hour. You must finish all remaining steps within that time. If the token expires before you complete the later steps, such as calling Databricks APIs, you must repeat this step to generate a new Google ID token.
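
To see how much of the hour remains, you can read the exp claim from the token itself. The following is a minimal sketch that decodes the payload locally without verifying the signature. The function names are hypothetical, and the sketch assumes a POSIX shell with base64, cut, tr, sed, and date.

```shell
# Print the JSON payload (second dot-separated segment) of a JWT.
# The signature is NOT verified; use this for inspection only.
jwt_payload() {
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url encoding strips.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 --decode
}

# Print the seconds left until the "exp" claim; negative means expired.
# The optional second argument overrides the current time (Unix seconds).
jwt_seconds_left() {
  exp=$(jwt_payload "$1" | sed -n 's/.*"exp" *: *\([0-9][0-9]*\).*/\1/p')
  now=${2:-$(date +%s)}
  echo $(( exp - now ))
}
```

If the printed value is small or negative, generate a new token before you continue with the later steps.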

  1. Install the Google Cloud CLI on your machine. See the Google article on installing the gcloud tool.

  2. Generate ID tokens for your main service account by running the following commands.

    • Replace <SA-1-key-json> with the path to your SA-1 key file in JSON format.

    • Replace <SA-2-email-address> with SA-2’s email address.

    • Replace <audience> as follows based on your use case:

      • For workspace-level APIs, replace with your workspace URL, which has the form https://<numbers>.<digit>.gcp.databricks.com, for example https://999999999992360.0.gcp.databricks.com. Every workspace has a different unique workspace URL. To call APIs on multiple workspaces, you need to create multiple Google ID tokens, each with different audience values.

      • For account-level APIs, replace with https://accounts.gcp.databricks.com. Different accounts all share the same audience value.

    Run the following commands for use with production systems:

    gcloud auth login --cred-file=<SA-1-key-json>
    
    gcloud auth print-identity-token --impersonate-service-account="<SA-2-email-address>" --include-email --audiences="<audience>"
    

    For non-production use, if you use your user account to impersonate SA-2, use these commands:

    gcloud auth login
    
    gcloud auth print-identity-token --impersonate-service-account=<SA-2-email-address> --audiences="<audience>" --include-email
    

    For non-production use, if you use one service account for both SA-1 and SA-2, use these commands with the service account’s key JSON file:

    gcloud auth login --cred-file=<SA-key-json>
    
    gcloud auth print-identity-token --audiences="<audience>"
    
  3. Save the long line at the end of the output to a file called google-id-token-sa-2.txt.

    The command outputs text similar to the following:

    WARNING: This command is using service account impersonation. All API calls will be executed as [<SA-2-email-address>].
    
    eyJhba7s86dfa9s8f6a99das7fa68s7d6...N8s67f6saa78sa8s7dfiLlA
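
Instead of copying the token by hand, you can redirect the command output to the file. The sketch below assumes that gcloud writes the impersonation warning to standard error, so that standard output contains only the token. Verify this with your gcloud version. The function name save_id_token is hypothetical.

```shell
# Generate an ID token for SA-2 and save only the token to a file.
# Arguments: <SA-2-email-address> <audience> [output-file]
save_id_token() {
  out=${3:-google-id-token-sa-2.txt}
  # umask 077 makes a newly created file readable by its owner only.
  ( umask 077 && gcloud auth print-identity-token \
      --impersonate-service-account="$1" \
      --audiences="$2" \
      --include-email > "$out" ) || return 1
  # Fail if nothing was written, for example because impersonation was denied.
  [ -s "$out" ]
}
```

Run gcloud auth login --cred-file=<SA-1-key-json> first, as in the production commands above, and then call save_id_token "<SA-2-email-address>" "<audience>".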
    

Step 3: Create a Google OAuth access token (only for APIs that require credentials passthrough)

Note

This step is required only to call APIs that require credentials passthrough. To determine whether credentials passthrough is required for an operation, refer to the API documentation for each API operation.

The request to generate an access token includes a lifetime field that defines how long the access token is valid. If you need the token to be active for only five minutes, set the field to 300s (300 seconds). The following example uses 3600s, which represents one hour.

Important

  • You must finish all remaining steps within that time limit. If the time expires before you complete the later steps, such as calling Databricks APIs, you must repeat this step to generate a new Google OAuth access token.

  • By default, an hour (3600s) is the maximum duration you can set for the lifetime field. To extend this limit, contact Google customer support and request an exception.

  1. Run the following command. Replace <SA-2-email-address> with the service account email address for SA-2. For non-production use or testing, if you are using a single service account or using a user account to impersonate a service account, replace <SA-2-email-address> with the email address for the service account.

    gcloud auth print-access-token --impersonate-service-account=<SA-2-email-address>
    
  2. Save the long line at the end of the output to a file called access-token-sa-2.txt.

    The command outputs text similar to the following:

    WARNING: This command is using service account impersonation. All API calls will be executed as [<SA-2-email-address>].
    
    eyJhba7s86dfa9s8f6a99das7fa68s7d6...N8s67f6saa78sa8s7dfiLlA
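
The lifetime field described in this step appears in the request body of the Google IAM Service Account Credentials API method generateAccessToken. The following sketch assumes that this is the request the article refers to. It builds the request body and shows how it could be sent with curl. The function names are hypothetical, and <caller-access-token> stands for an access token of the identity that is allowed to impersonate SA-2 (for example SA-1).

```shell
# Build the JSON body for generateAccessToken.
# Argument: lifetime in seconds (defaults to 3600, one hour).
access_token_request_body() {
  printf '{"scope":["https://www.googleapis.com/auth/cloud-platform"],"lifetime":"%ss"}\n' "${1:-3600}"
}

# Request an access token for SA-2.
# Arguments: <SA-2-email-address> <caller-access-token> [lifetime-seconds]
request_access_token() {
  curl --silent --show-error --fail -X POST \
    --header "Authorization: Bearer ${2}" \
    --header 'Content-Type: application/json' \
    --data "$(access_token_request_body "$3")" \
    "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/${1}:generateAccessToken"
}
```

The response is a JSON object whose accessToken field holds the token to pass in the X-Databricks-GCP-SA-Access-Token header.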
    

Step 4: Add the service account as a workspace or account user

You can use Google ID tokens to call Databricks account-level APIs like the Account API or workspace-level APIs. The instructions are different based on the use case. Note that you cannot use one Google ID token to access both types of APIs because of the difference in the audience field when creating the Google ID token.

Workspace APIs

To authenticate workspace APIs with the Google ID token, use the workspace admin console to add your main service account (SA-2) as if it were a user email address.

  1. As a workspace admin, go to the admin console.

  2. Follow the instructions in Add users to a workspace and use your main service account’s email address when prompted to provide it in the admin console.

  3. Optionally add any group memberships that might be required for your new service account based on which Databricks REST APIs you plan to call and the data objects that you want to use. See Manage groups.

  4. Optionally add Databricks access control settings for that user. See Enable access control.

Account-level APIs

To authenticate account-level APIs (such as the Account API) with the Google ID token, use the account console to add your main service account (SA-2) as an account admin. Add the service account using its email address, as you would to add a user.

  1. As an account owner or account admin, go to the Users tab in the account console.

  2. Click Add user.

    Note

    Do not click Add service principal. You cannot use a service account to create a Databricks service principal.

  3. In the Email address field, enter your main service account (SA-2) email address.

  4. Set the required fields for first and last name in a way that reflects the purpose of the service account.

  5. Click Send invite. Because you used a service account and not a real user email, no one receives the invitation email. The service account is authorized as an account admin immediately without requiring additional confirmation.

Step 5: Call a Databricks API

The tokens that you need to provide during REST API authentication vary depending on your planned usage: either the Account API or workspace-level APIs. Note that you cannot use one Google ID token to access both types of APIs because of the difference in the audiences field when creating the Google ID token.

The following HTTP headers are used for Databricks authentication with Google ID tokens.

| HTTP header name | Description | Which types of APIs require it? |
|---|---|---|
| Authorization | Google ID token for SA-2 as a bearer token. Syntax is Authorization: Bearer <token>. | All APIs |
| X-Databricks-GCP-SA-Access-Token | Google OAuth access token for SA-2. | Account-level APIs only |

Example workspace-level API request

To call a Databricks REST API for a workspace, pass a Google ID token in the Authorization HTTP header with the following syntax:

Authorization: Bearer <google-id-token>

The token you provide must have the following attributes:

  • The workspace you access must match the workspace URL that you provided when you created the token. See Step 2: Create a Google ID token.

  • The service account that is impersonated (SA-2) must be a user of the workspace. See Workspace APIs.

The following example calls a workspace-level API to list clusters.

  • Replace <google-id-token> with the Google ID token you saved in file google-id-token-sa-2.txt.

  • Replace <workspace-URL> with your base workspace URL, which has a form similar to https://1234567890123456.7.gcp.databricks.com.

curl \
  -X GET \
  --header 'Authorization: Bearer <google-id-token>' \
  <workspace-URL>/api/2.0/clusters/list
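
To avoid pasting the token into the command line, a small wrapper can read it from the file saved in Step 2. This is a sketch, not part of any Databricks tooling: the function name workspace_get is hypothetical, and the wrapper assumes the file contains only the token.

```shell
# Call a workspace-level GET endpoint with the token saved in Step 2.
# Arguments: <workspace-URL> <API path> [token-file]
workspace_get() {
  token_file=${3:-google-id-token-sa-2.txt}
  # Read the token and strip any whitespace, including the trailing newline.
  token=$(tr -d '[:space:]' < "$token_file") || return 1
  # ${1%/} drops a trailing slash from the workspace URL, if present.
  curl --silent --show-error --fail \
    -X GET \
    --header "Authorization: Bearer ${token}" \
    "${1%/}${2}"
}
```

For example, workspace_get "<workspace-URL>" /api/2.0/clusters/list sends the same request as the curl command above.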

Example account-level API request for an API that doesn’t use credential passthrough

The following example calls the Account API to get a list of workspaces.

  • Replace <google-id-token> with the Google ID token you saved in file google-id-token-sa-2.txt.

  • Replace <account-id> with your account ID. To find your account ID:

    1. As an account admin, go to the Databricks account console.

    2. At the bottom of the left menu (you might need to scroll), click on the User button (the person icon).

    3. In the popup that appears, copy the account ID by clicking the icon to the right of the ID.

curl \
  -X GET \
  --header 'Authorization: Bearer <google-id-token>' \
  https://accounts.gcp.databricks.com/api/2.0/accounts/<account-id>/workspaces

Example account-level API request with credential passthrough

The following example calls the Account API to delete a workspace.

  • Replace <google-id-token> with the Google ID token you saved in file google-id-token-sa-2.txt.

  • Replace <access-token-sa-2> with the SA-2 access token that you saved in file access-token-sa-2.txt.

  • Replace <workspace-id> with the ID of the workspace that you want to delete.

  • Replace <account-id> with your account ID. To find your account ID:

    1. As an account admin, go to the Databricks account console.

    2. At the bottom of the left menu (you might need to scroll), click on the User button (the person icon).

    3. In the popup that appears, copy the account ID by clicking the icon to the right of the ID.

curl \
  -X DELETE \
  --header 'Authorization: Bearer <google-id-token>' \
  --header 'X-Databricks-GCP-SA-Access-Token: <access-token-sa-2>' \
  https://accounts.gcp.databricks.com/api/2.0/accounts/<account-id>/workspaces/<workspace-id>
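
Because this request is destructive and needs two tokens, it can be worth checking that both token files exist before sending it. The following sketch wraps the request above. The function name delete_workspace and the ID_TOKEN_FILE and ACCESS_TOKEN_FILE variables are hypothetical, and the default file names are the ones used in Step 2 and Step 3.

```shell
# Delete a workspace through the Account API with credentials passthrough.
# Arguments: <account-id> <workspace-id>
# Token files default to the names used in Step 2 and Step 3.
delete_workspace() {
  id_file=${ID_TOKEN_FILE:-google-id-token-sa-2.txt}
  access_file=${ACCESS_TOKEN_FILE:-access-token-sa-2.txt}
  # Both tokens are required; stop early if either file is missing or empty.
  for f in "$id_file" "$access_file"; do
    [ -s "$f" ] || { echo "missing or empty token file: $f" >&2; return 1; }
  done
  curl --silent --show-error --fail \
    -X DELETE \
    --header "Authorization: Bearer $(tr -d '[:space:]' < "$id_file")" \
    --header "X-Databricks-GCP-SA-Access-Token: $(tr -d '[:space:]' < "$access_file")" \
    "https://accounts.gcp.databricks.com/api/2.0/accounts/${1}/workspaces/${2}"
}
```

Call it as delete_workspace "<account-id>" "<workspace-id>". If either token has expired, repeat Step 2 or Step 3 to generate a new one before retrying.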