Authentication using OpenID Connect (OIDC) tokens

Preview

This feature is in Private Preview. To try it, reach out to your Databricks contact.

To authenticate to and access Databricks REST APIs, you have two options:

  • Databricks personal access token to access a workspace. See Authentication using Databricks personal access tokens. You can use these for workspace-level REST APIs only.

  • OpenID Connect (OIDC) token to access any REST API. For Account API 2.0, OIDC token authentication is the only supported authentication type.

    Important

    For the Account API 2.0, you must also create and provide a secondary type of token called a Google Cloud OAuth access token. This Google Cloud service token allows account-level APIs to access Google Cloud resources in your account, for example, to create new GKE clusters. You do not need a Google Cloud OAuth access token to authenticate to Databricks workspace APIs.

OpenID Connect (OIDC) is an open standard for authentication. OIDC 1.0 is a simple identity layer on top of the OAuth 2.0 protocol. It allows clients to verify the identity of users based on authentication performed by an authorization server, and to obtain basic profile information about the user in an interoperable and REST-like manner.

OIDC tokens by default have a one-hour expiry.
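
Because an OIDC token is a JSON Web Token (JWT), you can inspect its claims locally, for example to check the aud (audience) and exp (expiry) claims before calling an API. The following is a minimal sketch that uses only the Python standard library. The function names decode_claims and seconds_remaining are illustrative and not part of any Databricks or Google library, and the code does not verify the token signature, so treat it as an inspection aid only.

```python
import base64
import json
import time


def decode_claims(token):
    """Return the claims (payload) of a JWT without verifying its signature."""
    parts = token.strip().split('.')
    if len(parts) != 3:
        raise ValueError('expected a JWT with three dot-separated parts')
    payload_b64 = parts[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    payload_b64 += '=' * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def seconds_remaining(claims, now=None):
    """Return how many seconds are left before the token's 'exp' claim."""
    now = time.time() if now is None else now
    return claims['exp'] - now
```

For example, calling seconds_remaining(decode_claims(token)) on a freshly created token should return a value close to 3600 if the default one-hour expiry applies.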

This article describes the steps to authenticate to Databricks REST APIs using OIDC tokens, including how to create the required Google Cloud service accounts and the secondary Google Cloud OAuth access token that you need only for account-level APIs. One token can be used for account-level APIs or workspace-level APIs, but not both. If you need both types of APIs, you must create a different OIDC token for each type. Most steps are the same for workspace-level and account-level APIs, and the important differences are called out in the instructions.

Step 1: Create and configure two service accounts

  1. Create two new Google Cloud service accounts. Follow the instructions in the Google article Creating a service account. To use the Google Cloud Console, go to the Service Accounts page and choose a Google Cloud project in which to create them. The Google Cloud project in which you create these service accounts does not need to match the project that you use for your Databricks workspace, nor do the new service accounts need to use the same Google Cloud project as each other.

    1. Token-creating service account: Create a new service account that will create API tokens. Google documentation calls this SA-1.

    2. Main service account for Databricks APIs: Create a new service account, which will be your main service account. You’ll use this to authenticate to Databricks APIs. Google documentation calls this SA-2.

    Save the email address for both service accounts for use in later steps.

  2. Create a service account key for your token-creating service account (SA-1) and save it to a local file called SA-1-key.json.

    1. From the Google Cloud Console Service Accounts page, click the email address for SA-1.

    2. Click the KEYS tab.

    3. Click ADD KEY.

    4. Ensure that JSON (the default) is selected.

    5. Click CREATE.

    6. The web page downloads a key file to your browser. Move that file to your local working directory and rename it SA-1-key.json.

    For additional instructions, see the Google article Creating service account keys.

  3. Grant your token-creating service account (SA-1) the Service Account Token Creator Role on your main service account (SA-2). Follow the instructions in the Google article Direct request permissions.

    1. From the Google Cloud Console Service Accounts page, click the email address for SA-2.

    Important

    In Google Cloud Console, be sure to edit your main SA (SA-2), not your token-creating SA (SA-1).

    2. Click PERMISSIONS.

    3. Click GRANT ACCESS.

    4. In the New Principals field, paste the email address for your token-creating SA (SA-1).

    5. In the Role field, choose Service Account Token Creator Role or any role that is a superset of this role.

    6. Click SAVE.

Step 2: Create a JWT token for your token-creating service account (SA-1)

You must now use the key JSON file that you created in the previous step to create a JWT token that represents your token-creating service account (SA-1).

These instructions use a Python program on your local system to generate the JWT token. Databricks recommends using Python 3. This example requires the pip tool.

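  1. If you do not already have PyJWT installed, run the following command. The crypto extra also installs the cryptography package, which PyJWT requires for RS256 signing:

    python -m pip install "PyJWT[crypto]"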
    
  2. Copy the following Python code to your local working directory as a file named create-jwt.py.

    import jwt
    import time
    
    import json
    
    # CONFIGURATION
    
    # Your service account SA-1 email address
    my_SA = '<SA-1-email-address>'
    
    # Full path to your JSON if it is not 'SA-1-key.json' in current directory
    my_key_json_path = 'SA-1-key.json'
    
    # Duration in seconds for this JWT before expiry.
    # Because we use this to call a Google API, the limit is one hour (3600 seconds).
    duration_seconds = 3600
    
    
    
    # IMPLEMENTATION
    
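    # Load the SA-1 key file; the with block closes the file when done.
    with open(my_key_json_path) as key_file:
        sa_secret = json.load(key_file)
    
    # Use whole seconds for the issued-at and expiry claims.
    iat = int(time.time())
    exp = iat + duration_seconds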
    payload = {
        'iss': my_SA,
        'sub': my_SA,
        'aud': 'https://oauth2.googleapis.com/token',
        'iat': iat,
        'exp': exp,
        'scope': 'https://www.googleapis.com/auth/cloud-platform'
    }
    additional_headers = {'kid': sa_secret['private_key_id']}
    signed_jwt = jwt.encode(payload, sa_secret['private_key'], headers=additional_headers,
                          algorithm='RS256')
    
    print("") # add blank line to separate any warnings or other output from main output
    print(signed_jwt)
    
  3. Modify the code for your configuration:

    • Replace <SA-1-email-address> with your token-creating service account email address.

    • If your SA-1 key JSON file is not named SA-1-key.json in the current directory, change the my_key_json_path assignment to the full path to SA-1-key.json.

  4. Run the program:

    python create-jwt.py
    
  5. Save the long string in the output to a file in your working directory named sa-1-jwt.txt.

Step 3: Use the JWT token to create a Google Cloud access token for your token-creating service account (SA-1)

Exchange the JWT token that you created for a new Google Cloud access token by running the following curl command. Replace <jwt-token> with your JWT token from the previous step, which you saved in file sa-1-jwt.txt.

echo; curl --location --request POST 'https://oauth2.googleapis.com/token' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data-urlencode 'grant_type=urn:ietf:params:oauth:grant-type:jwt-bearer' \
--data-urlencode 'assertion=<jwt-token>'

This generates a result that looks like:

{"access_token":"<access-token-sa-1>","expires_in":3599,"token_type":"Bearer"}

Save the contents of the access_token field (not the entire JSON), without the quotation marks, to a file named access-token-sa-1.txt.

This is the access token for your token-creating service account (SA-1). You use that access token in later steps to generate your tokens for your main service account (SA-2).
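
If you prefer to script this exchange instead of copying values by hand, the same request can be sketched in Python. The following uses only the standard library and mirrors the curl command above. The function names build_token_request and extract_access_token are illustrative, and the file names are the ones used in this article.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = 'https://oauth2.googleapis.com/token'


def build_token_request(signed_jwt):
    """Build the POST request that exchanges a signed JWT for an access token."""
    form = urllib.parse.urlencode({
        'grant_type': 'urn:ietf:params:oauth:grant-type:jwt-bearer',
        'assertion': signed_jwt.strip(),
    }).encode('ascii')
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        headers={'Content-Type': 'application/x-www-form-urlencoded'},
        method='POST',
    )


def extract_access_token(response_body):
    """Return the access_token field from the JSON response body."""
    return json.loads(response_body)['access_token']


# Example usage (performs a network call, so it is left commented out):
#
# with open('sa-1-jwt.txt') as f:
#     request = build_token_request(f.read())
# with urllib.request.urlopen(request) as response:
#     token = extract_access_token(response.read())
# with open('access-token-sa-1.txt', 'w') as f:
#     f.write(token)
```

Separating request construction from the network call makes it possible to check the form fields before anything is sent.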

Step 4: Create an OIDC token for your main service account (SA-2)

Use the access token for SA-1 to generate an OIDC token for your main service account (SA-2).

Run the following curl command and make the following changes:

  • Replace <SA-2-email-address> with the SA-2 email address.

  • Replace <SA-1-access-token> with the SA-1 access token from your file access-token-sa-1.txt.

  • Replace <audience> as follows, based on which APIs you intend to call:

    • To use the OIDC token with Databricks workspace APIs, use the full HTTPS URL for your Databricks workspace, not including any subpaths. For example https://999999987652360.0.gcp.databricks.com.

    • To use the OIDC token with the Databricks Account API, use the value https://accounts.gcp.databricks.com.

    Important

    Because of the difference in the audience field for different use cases, you cannot use the same OIDC token for both workspace APIs and the Account API. To use OIDC for both types of APIs, create two different OIDC tokens.

  • Set the includeEmail parameter to true.

echo; curl --location --request POST 'https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/<SA-2-email-address>:generateIdToken' \
--header 'Authorization: Bearer <SA-1-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{
 "delegates": [],
 "audience": "<audience>",
 "includeEmail": true
}'

The result looks like:

{
  "token": "<oidc-token-sa-2>"
}

Save the contents of the token field (not the entire JSON), without the quotation marks, to a file named oidc-token-sa-2.txt.

OIDC tokens by default have a one-hour expiry.

Important

You must finish all remaining steps within that timeframe. If the time expires before you complete the later steps, such as calling Databricks APIs, you must repeat this step to generate a new Google OIDC token.
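
The request in this step can also be sketched in Python, which makes the choice of audience explicit. The following is a minimal sketch using only the standard library. The names workspace_audience, build_id_token_request, and extract_id_token are illustrative helpers, not part of any Google or Databricks library.

```python
import json
import urllib.parse
import urllib.request

# Audience value for the Databricks Account API, as given in this step.
ACCOUNT_AUDIENCE = 'https://accounts.gcp.databricks.com'


def workspace_audience(workspace_url):
    """Return the workspace URL reduced to scheme and host, with no subpath."""
    parts = urllib.parse.urlsplit(workspace_url)
    return f'{parts.scheme}://{parts.netloc}'


def build_id_token_request(sa2_email, sa1_access_token, audience):
    """Build the generateIdToken request for SA-2, authorized as SA-1."""
    url = ('https://iamcredentials.googleapis.com/v1/projects/-/'
           f'serviceAccounts/{sa2_email}:generateIdToken')
    body = json.dumps({
        'delegates': [],
        'audience': audience,
        'includeEmail': True,
    }).encode('utf-8')
    return urllib.request.Request(
        url,
        data=body,
        headers={
            'Authorization': f'Bearer {sa1_access_token.strip()}',
            'Content-Type': 'application/json',
        },
        method='POST',
    )


def extract_id_token(response_body):
    """Return the token field from the JSON response body."""
    return json.loads(response_body)['token']
```

To create one token per API type, build two requests: one with ACCOUNT_AUDIENCE and one with workspace_audience(<your-workspace-URL>). Send each with urllib.request.urlopen and pass the response body to extract_id_token.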

Step 5: (For Account API only) Create a Google Cloud access token for your main service account (SA-2)

Note

This step is required only to call the Account API 2.0. To call workspace APIs, skip this step.

The request to generate an access token includes a lifetime field that defines how long the access token is valid. If you only need the token to be active for five minutes, set it to 300s (300 seconds). The following example uses 3600s, which represents one hour.

Important

  • You must finish all remaining steps within that timeframe. If the time expires before you complete the later steps, such as calling Databricks APIs, you must repeat this step to generate a new Google access token.

  • By default, an hour (3600s) is the maximum duration you can set for the lifetime field. To extend this limit, contact Google customer support and request an exception.

  1. Run the following curl command. Replace <SA-2-email-address> with the service account email address for SA-2. Replace <SA-1-access-token> with the access token for SA-1.

    echo; curl --location --request POST \
    'https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/<SA-2-email-address>:generateAccessToken' \
    --header 'Authorization: Bearer <SA-1-access-token>' \
    --header 'Content-Type: application/json' \
    --data-raw '{
    "scope":["https://www.googleapis.com/auth/cloud-platform", "https://www.googleapis.com/auth/compute"],
    "lifetime": "3600s"
    }'
    

    The output looks like:

    {
      "accessToken": "<access-token-sa-2>",
      "expireTime": "2022-02-24T20:55:16Z"
    }
    
  2. Save the contents of the accessToken field (not the entire JSON) to a file called access-token-sa-2.txt.
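
As a scripted alternative, the following Python sketch builds the same generateAccessToken request and computes how long the returned token remains valid. It uses only the standard library. The function names are illustrative, and seconds_until assumes that expireTime has the whole-second UTC format shown in the example output above.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Scopes copied from the curl example in this step.
SCOPES = [
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/compute',
]


def build_access_token_request(sa2_email, sa1_access_token, lifetime_seconds=3600):
    """Build the generateAccessToken request for SA-2, authorized as SA-1."""
    url = ('https://iamcredentials.googleapis.com/v1/projects/-/'
           f'serviceAccounts/{sa2_email}:generateAccessToken')
    body = json.dumps({
        'scope': SCOPES,
        'lifetime': f'{lifetime_seconds}s',
    }).encode('utf-8')
    return urllib.request.Request(
        url,
        data=body,
        headers={
            'Authorization': f'Bearer {sa1_access_token.strip()}',
            'Content-Type': 'application/json',
        },
        method='POST',
    )


def seconds_until(expire_time, now=None):
    """Return the seconds left until an expireTime such as 2022-02-24T20:55:16Z."""
    expires = datetime.strptime(expire_time, '%Y-%m-%dT%H:%M:%SZ')
    expires = expires.replace(tzinfo=timezone.utc)
    now = datetime.now(timezone.utc) if now is None else now
    return (expires - now).total_seconds()
```

Checking seconds_until on the expireTime value before calling a Databricks API is one way to notice that the token has expired and that this step needs to be repeated.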

Step 6: Add your main service account (SA-2) to the account or workspace

You can use OIDC tokens to call Databricks account-level APIs like the Account API or workspace-level APIs. The instructions are different based on the use case. Note that you cannot use one OIDC token to access both types of APIs because of the difference in the audience field when creating the OIDC token.

Allow SA-2 to call the Account API

To call Account APIs with the OIDC token, use the account console to add your main service account (SA-2) as an account admin just as if it were a user:

  1. As an account owner or account admin, go to the Users tab in the account console.

  2. Click Add User.

  3. In the Email address field, enter the main service account (SA-2) email address.

  4. In the required first name and last name fields, enter values that reflect the purpose of this service account.

  5. Click Send invite. Because you used a service account and not a real user email, there is no actual invitation email. The service account is authorized as an account admin immediately without the need for additional confirmation.

Allow SA-2 to call workspace APIs

To call workspace APIs using the OIDC token, add your main service account (SA-2) to the workspace just as if it were a user:

  1. Follow the instructions in Add a user and use your main service account’s email address when prompted to provide it in the admin console.

  2. As needed, add any group memberships that might be required for your new service account based on which Databricks REST APIs you plan to call and the data objects that you want to use. See Manage groups.

  3. As needed, add any Databricks access control settings for that user that may be required. See Enable access control.

Step 7: Call a Databricks API

The tokens that you need to provide during REST API authentication depend on your planned usage: either the Account API or workspace-level APIs. Note that you cannot use one OIDC token to access both types of APIs because of the difference in the audience field when creating the OIDC token.

The following HTTP headers are used for Databricks authentication.

| HTTP header name | Description |
| --- | --- |
| Authorization | The service account OIDC token as a bearer token (Authorization: Bearer <token>). Databricks authenticates the request based on the identity in the OIDC token. |
| X-Databricks-GCP-SA-Access-Token | The Google Cloud access token for SA-2. Databricks needs this token to perform validations and manage IAM roles. |

Individual APIs may require different combinations of HTTP headers.

| Use case | Add the Authorization header | Add the X-Databricks-GCP-SA-Access-Token header |
| --- | --- | --- |
| Account API | Yes | Yes |
| Workspace-level APIs | Yes | No |
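
The header rules above can be expressed as a small Python helper. This is a sketch only, and the function name databricks_auth_headers is illustrative. Pass the SA-2 access token for Account API calls and omit it for workspace-level calls.

```python
def databricks_auth_headers(oidc_token, sa2_access_token=None):
    """Return the authentication headers for a Databricks REST API call.

    Provide sa2_access_token only when calling the Account API.
    """
    headers = {'Authorization': f'Bearer {oidc_token.strip()}'}
    if sa2_access_token is not None:
        headers['X-Databricks-GCP-SA-Access-Token'] = sa2_access_token.strip()
    return headers
```

Stripping whitespace guards against a trailing newline when the tokens are read from the files saved in earlier steps.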

Account API example

The following example calls the Account API to get a list of workspaces. Replace <oidc-token> with the OIDC token you saved in file oidc-token-sa-2.txt. Replace <access-token-sa-2> with the SA-2 access token that you saved in file access-token-sa-2.txt. Replace <account-id> with your Databricks account ID.

echo; curl \
  -X GET \
  --header 'Authorization: Bearer <oidc-token>' \
  --header 'X-Databricks-GCP-SA-Access-Token: <access-token-sa-2>' \
  https://accounts.gcp.databricks.com/api/2.0/accounts/<account-id>/workspaces

Workspace-level API example

The following example calls a workspace-level API to list clusters. Replace <oidc-token> with the OIDC token you saved in file oidc-token-sa-2.txt, and replace the example workspace URL with the URL of your own workspace.

echo; curl \
  -X GET \
  --header 'Authorization: Bearer <oidc-token>' \
  https://1234567890123456.7.gcp.databricks.com/api/2.0/clusters/list
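
The two curl examples above can be sketched in Python as follows, using only the standard library. The function names are illustrative, and the sketch assumes that each OIDC token was created with the audience that matches the API being called.

```python
import urllib.request

ACCOUNT_HOST = 'https://accounts.gcp.databricks.com'


def build_list_workspaces_request(account_id, oidc_token, sa2_access_token):
    """Build the Account API request that lists workspaces."""
    return urllib.request.Request(
        f'{ACCOUNT_HOST}/api/2.0/accounts/{account_id}/workspaces',
        headers={
            'Authorization': f'Bearer {oidc_token.strip()}',
            'X-Databricks-GCP-SA-Access-Token': sa2_access_token.strip(),
        },
        method='GET',
    )


def build_list_clusters_request(workspace_url, oidc_token):
    """Build the workspace-level request that lists clusters."""
    return urllib.request.Request(
        f"{workspace_url.rstrip('/')}/api/2.0/clusters/list",
        headers={'Authorization': f'Bearer {oidc_token.strip()}'},
        method='GET',
    )


# Example usage (performs a network call, so it is left commented out):
#
# with open('oidc-token-sa-2.txt') as f:
#     request = build_list_clusters_request(
#         'https://1234567890123456.7.gcp.databricks.com', f.read())
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode('utf-8'))
```

Note that urllib.request raises urllib.error.HTTPError for error responses, which is a typical symptom of an expired token or a token created with the wrong audience.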