Authentication with Google ID tokens

Preview

This feature is in Private Preview.

To authenticate to and access Databricks REST APIs, you have two options:

  • Databricks personal access token to access a workspace. See Authentication using Databricks personal access tokens. You can use these for workspace-level REST APIs only.

  • OpenID Connect (OIDC) token to access any REST API. For Account API 2.0, OIDC token authentication is the only supported authentication type. For the Account API 2.0 only, you must also create and provide a second type of token, called a Google Cloud OAuth access token, with each request.

OpenID Connect (OIDC) is an open standard for authentication. OIDC 1.0 is a simple identity layer on top of the OAuth 2.0 protocol. It allows clients to verify the identity of users based on authentication that is performed by an authorization server, and to obtain basic profile information about the user in an interoperable and REST-like manner. OIDC tokens by default have a one-hour expiry. On Google Cloud, Databricks REST APIs support only Google-issued OIDC tokens, commonly known as Google ID tokens.

For account-level APIs such as Account API 2.0, you need two tokens to make an API request: a Google ID token and a Google OAuth access token with the cloud-platform scope. The Google OAuth access token is a Google-issued authorization token that allows the Account API to access your Google Cloud resources on your behalf, such as creating a GKE cluster to run your workspace.

This article describes how to authenticate to Databricks REST APIs using Google ID tokens and Google OAuth access tokens, how to create the required Google Cloud service accounts, and how to generate tokens for these accounts. A single Google ID token can be used for account-level APIs or workspace-level APIs, but cannot be used for both purposes. Most steps are the same for workspace-level and account-level APIs, and the important differences are called out in the instructions.

For a production environment, Databricks strongly recommends that you use two service accounts to work with Databricks REST APIs.

  • Create one service account (SA-1) to run your workloads.

  • Create another service account (SA-2) to hold permissions to your Databricks and Google Cloud resources.

  • Grant SA-1 permission to impersonate SA-2 to call Databricks REST APIs.

With the impersonation model, one team can manage your workload security, and another can manage your resource security. Because you only grant the impersonation permissions as needed, this approach offers security and flexibility for your organization.

This article describes how to perform these steps for production use, but you can adapt these instructions for non-production use. For non-production prototypes and testing, you can use your Google user account to impersonate SA-2 or use one service account for both SA-1 and SA-2.

Step 1: Create and configure two service accounts

  1. Create two new Google Cloud service accounts. Follow the instructions in the Google article Creating a service account. To use the Google Cloud Console, go to the Service Accounts page and choose a Google Cloud project to create each service account in. The Google Cloud project in which you create these service accounts does not need to match the project that you use for your Databricks workspace, nor do the new service accounts need to use the same Google Cloud project as each other.

    • Token-creating service account (SA-1): This service account automates creation of tokens for the main service account. These tokens will be used to call Databricks REST APIs. Google documentation calls this SA-1.

    • Main service account for Databricks REST APIs (SA-2): This service account acts as a principal (the automation user) for Databricks REST APIs and automated workflows. Google documentation calls this SA-2.

    Save the email addresses of both service accounts for use in later steps.

  2. Create a service account key for your token-creating service account (SA-1) and save it to a local file called SA-1-key.json.

    1. From the Google Cloud Console Service Accounts page, click the email address for SA-1.

    2. Click the KEYS tab.

    3. Click ADD KEY.

    4. Ensure that JSON (the default) is selected.

    5. Click CREATE.

    6. The web page downloads a key file to your browser. Move that file to your local working directory and rename it SA-1-key.json.

    For additional instructions, see the Google article Creating service account keys.

  3. Grant your token-creating service account (SA-1) the Service Account Token Creator Role on your main service account (SA-2). Follow the instructions in the Google article Direct request permissions.

    1. From the Google Cloud Console Service Accounts page, click the email address for SA-2.

    Important

    In Google Cloud Console, be sure to edit your main SA (SA-2), not your token-creating SA (SA-1).

    2. Click PERMISSIONS.

    3. Click GRANT ACCESS.

    4. In the New Principals field, paste the email address for your token-creating SA (SA-1).

    5. In the Role field, choose Service Account Token Creator Role or any role that is a superset of this role.

    6. Click SAVE.

Step 2: Create a JWT token for your token-creating service account (SA-1)

You must now use the key JSON file that you created in Step 1 to create a JWT token that represents your token-creating service account (SA-1).

These instructions use a Python program on your local system to generate the JWT token. Databricks recommends using Python 3. This example requires the pip tool.

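  1. If you do not already have PyJWT installed, run the following command. The cryptography package is included because PyJWT needs it to sign tokens with the RS256 algorithm:

    python -m pip install PyJWT cryptography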
    
  2. Copy the following Python code to your local working directory as a file named create-jwt.py.

    import json
    import time
    
    import jwt
    
    # CONFIGURATION
    
    # Your service account SA-1 email address
    my_SA = '<SA-1-email-address>'
    
    # Full path to your JSON if it is not 'SA-1-key.json' in current directory
    my_key_json_path = 'SA-1-key.json'
    
    # Duration in seconds for this JWT before expiry.
    # Because we use this to call a Google API, the limit is one hour (3600 seconds).
    duration_seconds = 3600
    
    
    
    # IMPLEMENTATION
    
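    # Load the SA-1 key file and close it as soon as it has been read
    with open(my_key_json_path) as key_file:
        sa_secret = json.load(key_file)
    
    # Issued-at and expiry times as whole seconds since the epoch
    iat = int(time.time())
    exp = iat + duration_seconds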
    payload = {
        'iss': my_SA,
        'sub': my_SA,
        'iat': iat,
        'exp': exp,
        'scope': 'https://www.googleapis.com/auth/cloud-platform'
    }
    additional_headers = {'kid': sa_secret['private_key_id']}
    signed_jwt = jwt.encode(payload, sa_secret['private_key'], headers=additional_headers,
                          algorithm='RS256')
    
    print("") # add blank line to separate any warnings or other output from main output
    print(signed_jwt)
    
  3. Modify the code for your configuration:

    • Replace <SA-1-email-address> with your token-creating service account email address.

    • If your SA-1 key JSON file is not named SA-1-key.json in the current directory, change the my_key_json_path assignment to the full path to SA-1-key.json.

  4. Run the program:

    python create-jwt.py
    
  5. Save the long string in the output to a file in your working directory named access-token-sa-1.txt. This is the access token for your token-creating service account (SA-1).

Step 3: Create an OIDC token for your main service account (SA-2)

Use the access token for SA-1 to generate an OIDC token for your main service account (SA-2).

Run the following curl command and make the following changes:

  • Replace <SA-2-email-address> with the SA-2 email address.

  • Replace <SA-1-access-token> with the SA-1 access token from your file access-token-sa-1.txt.

  • Replace <audience> as follows, based on which APIs you intend to call:

    • To use the OIDC token with Databricks workspace APIs, use the full HTTPS URL for your Databricks workspace, not including any subpaths. For example, https://999999987652360.0.gcp.databricks.com.

    • To use the OIDC token with the Databricks Account API, use the value https://accounts.gcp.databricks.com.

    Important

    Because of the difference in the audience field for different use cases, you cannot use the same OIDC token for both workspace APIs and the Account API. To use OIDC for both types of APIs, create two different OIDC tokens.

  • Set the includeEmail parameter to true.

echo; curl --request POST 'https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/<SA-2-email-address>:generateIdToken' \
--header 'Authorization: Bearer <SA-1-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{
 "audience": "<audience>",
 "includeEmail": "true"
}'

The result looks like:

{
  "token": "<oidc-token-sa-2>"
}

Save the contents of the token field (not the entire JSON) without the quote signs to a file named oidc-token-sa-2.txt.
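The same request can also be built from Python. The following is a minimal sketch using only the standard library, not an official client. The function names workspace_audience, build_id_token_request, and fetch_id_token are illustrative assumptions. It reuses the endpoint and fields from the curl command above, and sends includeEmail as a JSON boolean.

```python
import json
import urllib.request
from urllib.parse import urlsplit

# Audience value for the Databricks Account API, as given in this step
ACCOUNT_AUDIENCE = "https://accounts.gcp.databricks.com"


def workspace_audience(workspace_url):
    """Reduce a workspace URL to scheme and host, dropping any subpaths."""
    parts = urlsplit(workspace_url)
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("expected a full HTTPS workspace URL")
    return f"https://{parts.netloc}"


def build_id_token_request(sa2_email, sa1_access_token, audience):
    """Build (but do not send) the generateIdToken request for SA-2."""
    url = (
        "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/"
        f"{sa2_email}:generateIdToken"
    )
    body = json.dumps({"audience": audience, "includeEmail": True})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {sa1_access_token}",
            "Content-Type": "application/json",
        },
    )


def fetch_id_token(request):
    """Send the request and return the contents of the token field."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["token"]
```

Keeping request construction separate from sending makes it possible to inspect the URL and body before any network call is made.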

OIDC tokens by default have a one hour expiry.

Important

You must finish all remaining steps within that timeframe. If the time expires before you complete the later steps, such as calling Databricks APIs, you must repeat this step to generate a new Google OIDC token.
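To see which audience a saved OIDC token was issued for and how much time it has left, you can decode its payload locally. The sketch below assumes the token is a standard three-segment JWT carrying aud and exp claims. The function names are illustrative. The code does not verify the signature, so it is suitable for inspection only, not for making trust decisions.

```python
import base64
import json
import time


def decode_jwt_claims(token):
    """Return the claims from a JWT payload WITHOUT verifying the signature."""
    segments = token.strip().split(".")
    if len(segments) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload = segments[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_remaining(claims, now=None):
    """Seconds until the exp claim is reached; negative if already expired."""
    current = time.time() if now is None else now
    return claims["exp"] - current
```

For example, decode_jwt_claims(open("oidc-token-sa-2.txt").read())["aud"] should match the audience value that you passed when you generated the token.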

Step 4: (For Account API only) Create a Google OAuth access token for your main service account (SA-2)

Note

This step is required only to call the Account API 2.0. To call workspace APIs, skip this step.

The request to generate an access token includes a lifetime field that defines how long the access token is valid. If you only need the token to be active for five minutes, set the field to 300s (300 seconds). The following example uses 3600s, which represents one hour.

Important

  • You must finish all remaining steps within that timeframe. If the time expires before you complete the later steps, such as calling Databricks APIs, you must repeat this step to generate a new Google OAuth access token.

  • By default, an hour (3600s) is the maximum duration you can set for the lifetime field. To extend this limit, contact Google customer support and request an exception.

  1. Run the following curl command. Replace <SA-2-email-address> with the service account email address for SA-2. Replace <SA-1-access-token> with the access token for SA-1.

    echo; curl --location --request POST \
    'https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/<SA-2-email-address>:generateAccessToken' \
    --header 'Authorization: Bearer <SA-1-access-token>' \
    --header 'Content-Type: application/json' \
    --data-raw '{
    "scope":["https://www.googleapis.com/auth/cloud-platform", "https://www.googleapis.com/auth/compute"],
    "lifetime": "3600s"
    }'
    

    The output looks like:

    {
      "accessToken": "<access-token-sa-2>",
      "expireTime": "2022-02-24T20:55:16Z"
    }
    
  2. Save the contents of the accessToken field (not the entire JSON) to a file called access-token-sa-2.txt.
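As a rough Python equivalent of this step, the sketch below builds the generateAccessToken request and parses the expireTime value. It is an illustration under assumptions, not an official client: the function names are hypothetical, the lifetime check only enforces the default one-hour maximum described above, and the timestamp parser assumes the whole-second format shown in the sample output.

```python
import json
import urllib.request
from datetime import datetime, timezone

MAX_LIFETIME_SECONDS = 3600  # default maximum for the lifetime field

SCOPES = [
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/compute",
]


def build_access_token_request(sa2_email, sa1_access_token, lifetime_seconds=3600):
    """Build (but do not send) the generateAccessToken request for SA-2."""
    if not 0 < lifetime_seconds <= MAX_LIFETIME_SECONDS:
        raise ValueError("lifetime must be between 1 and 3600 seconds")
    url = (
        "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/"
        f"{sa2_email}:generateAccessToken"
    )
    body = json.dumps({"scope": SCOPES, "lifetime": f"{lifetime_seconds}s"})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {sa1_access_token}",
            "Content-Type": "application/json",
        },
    )


def parse_expire_time(expire_time):
    """Convert a value such as 2022-02-24T20:55:16Z to a UTC datetime."""
    parsed = datetime.strptime(expire_time, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)
```

Comparing parse_expire_time(...) with datetime.now(timezone.utc) shows whether the access token is still valid before you call the Account API.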

Step 5: Add your main service account (SA-2) to the account or workspace

You can use OIDC tokens to call Databricks account-level APIs like the Account API or workspace-level APIs. The instructions are different based on the use case. Note that you cannot use one OIDC token to access both types of APIs because of the difference in the audience field when creating the OIDC token.

Allow SA-2 to call the Account API

To call Account APIs with the OIDC token, use the account console to add your main service account (SA-2) as an account admin just as if it were a user:

  1. As an account owner or account admin, go to the Users tab in the account console.

  2. Click Add User.

  3. In the Email address field, enter the main service account (SA-2) email address.

  4. In the required first name and last name fields, enter values that reflect the purpose of this service account.

  5. Click Send invite. Because you used a service account and not a real user email, there is no actual invitation email. The service account is authorized as an account admin immediately without the need for additional confirmation.

Allow SA-2 to call workspace APIs

To call workspace APIs using the OIDC token, add your main service account (SA-2) to the workspace just as if it were a user:

  1. Follow the instructions in Add users to a workspace and use your main service account’s email address when prompted to provide it in the admin console.

  2. As needed, add any group memberships that might be required for your new service account based on which Databricks REST APIs you plan to call and the data objects that you want to use. See Manage groups.

  3. As needed, add any Databricks access control settings for that user that may be required. See Enable access control.

Step 6: Call a Databricks API

The tokens that you must provide for REST API authentication depend on your planned usage: either the Account API or workspace-level APIs. Note that you cannot use one OIDC token to access both types of APIs because of the difference in the audience field when creating the OIDC token.

The following HTTP headers are used for Databricks authentication.

| HTTP header name | Description |
|---|---|
| Authorization | The service account OIDC token as a bearer token (Authorization: Bearer <token>). Databricks authenticates the request based on the identity in the OIDC token. |
| X-Databricks-GCP-SA-Access-Token | The Google OAuth access token for SA-2. Databricks needs this token to perform validations and manage IAM roles. |

Individual APIs may require different combinations of HTTP headers.

| Use case | Add the Authorization header | Add the X-Databricks-GCP-SA-Access-Token header |
|---|---|---|
| Account API | Yes | Yes |
| Workspace-level APIs | Yes | No |
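The header rules above can be captured in a small helper. This is a sketch only: the function name build_auth_headers and the use-case labels "account" and "workspace" are illustrative assumptions, not part of any Databricks library.

```python
def build_auth_headers(use_case, oidc_token, sa2_access_token=None):
    """Return the HTTP headers required for the given use case.

    use_case is "account" for the Account API or "workspace" for
    workspace-level APIs (labels chosen for this example).
    """
    if use_case not in ("account", "workspace"):
        raise ValueError("use_case must be 'account' or 'workspace'")

    # Every request carries the OIDC token as a bearer token
    headers = {"Authorization": f"Bearer {oidc_token}"}

    # Only the Account API also needs the SA-2 Google OAuth access token
    if use_case == "account":
        if not sa2_access_token:
            raise ValueError("the Account API requires the SA-2 access token")
        headers["X-Databricks-GCP-SA-Access-Token"] = sa2_access_token

    return headers
```

Raising an error when the access token is missing for the Account API surfaces the problem before the request is sent.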

Account API example

The following example calls the Account API to get a list of workspaces. Replace <oidc-token> with the OIDC token you saved in file oidc-token-sa-2.txt. Replace <access-token-sa-2> with the SA-2 access token that you saved in file access-token-sa-2.txt.

echo; curl \
  -X GET \
  --header 'Authorization: Bearer <oidc-token>' \
  --header 'X-Databricks-GCP-SA-Access-Token: <access-token-sa-2>' \
  https://accounts.gcp.databricks.com/api/2.0/accounts/<account-id>/workspaces

Workspace-level API example

The following example calls the workspace-level API to list clusters. Replace <oidc-token> with the OIDC token you saved in file oidc-token-sa-2.txt.

echo; curl \
  -X GET \
  --header 'Authorization: Bearer <oidc-token>' \
  https://1234567890123456.7.gcp.databricks.com/api/2.0/clusters/list
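The workspace-level call can also be made from Python with the token file saved in Step 3. The following is a minimal sketch using only the standard library. The helper names read_token, build_list_clusters_request, and send_request are illustrative, and the workspace URL is the same placeholder used in the curl example.

```python
import json
import urllib.request
from pathlib import Path


def read_token(path):
    """Read a saved token file and strip surrounding whitespace and newlines."""
    return Path(path).read_text().strip()


def build_list_clusters_request(workspace_url, oidc_token):
    """Build (but do not send) a GET request for the clusters list API."""
    url = workspace_url.rstrip("/") + "/api/2.0/clusters/list"
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Bearer {oidc_token}"},
    )


def send_request(request):
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A typical use would be send_request(build_list_clusters_request("https://1234567890123456.7.gcp.databricks.com", read_token("oidc-token-sa-2.txt"))), run before the OIDC token expires.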