Create a workspace using the Account API

The Account API lets you programmatically create multiple Databricks workspaces associated with a single Databricks account, each with its own configuration settings. Alternatively, you can create a workspace using the account console or Terraform.

By default, Databricks creates and manages the lifecycle of the workspace’s VPC. Optionally, you can specify your own customer-managed VPC. Using a customer-managed VPC requires the Premium pricing tier.

Create a workspace with the default VPC using the Account API

This topic describes how to use the Account API to create a workspace that has a Databricks-managed VPC. To create a workspace that uses a customer-managed VPC, instead follow the instructions in Create a workspace with a customer-managed VPC using the Account API.

You can use the Account API to create a workspace. The Account API is an account-level API, which means that authentication is different from most Databricks REST APIs, which are workspace-level APIs. For authentication to account-level APIs, you must use Google ID authentication and create two different types of tokens (a Google ID token and a Google access token) that you include as HTTP headers on each Account API request. See Authentication with Google ID tokens.
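As a minimal sketch of how the two tokens map to HTTP headers, the following Python helper builds the same headers that the curl commands in this article send. The function name account_api_headers is hypothetical and used only for illustration. The header names are taken from the curl examples in this article.

```python
def account_api_headers(google_id_token, google_access_token):
    """Build the headers that the curl examples in this article send."""
    if not google_id_token or not google_access_token:
        raise ValueError("both the Google ID token and the Google access token are required")
    return {
        # The Google ID token is sent as a bearer token.
        "Authorization": "Bearer " + google_id_token,
        # The Google access token is sent in a Databricks-specific header.
        "X-Databricks-GCP-SA-Access-Token": google_access_token,
        "Content-Type": "application/json",
    }


# Placeholder values; real tokens come from the Google ID authentication steps.
headers = account_api_headers("<google-id-token>", "<google-access-token>")
```

Because both tokens are required on every Account API request, the helper rejects an empty value for either one before any request is built.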

To create a workspace with the default VPC using the Account API:

  1. Ensure that the service account that you are using has the correct permissions for workspace creation. This is your main service account, called SA-2, as described in Authentication with Google ID tokens. The service account needs one of the following roles or groups of roles on the Google Cloud project in which the workspace is created:

    • Owner (roles/owner)

    • Both Editor (roles/editor) and Project IAM Admin (roles/resourcemanager.projectIamAdmin).

    1. Go to the project IAM page in Google Cloud console.

    2. If needed, change the project from the project picker at the top of the page to match your workspace’s project.

    3. If the service account already has roles on this project, you can find it on this page and review its roles in the Role column.

    4. To add new roles to the service account on this project:

      1. At the top of the IAM page, click ADD.

      2. In the Principal field, type the email address of the service account.

      3. Click the Select a role field and choose a required role. The Owner, Viewer, and Editor roles are in the Basic category of the picker.

      4. To add other roles, click ADD ANOTHER ROLE and repeat the previous step.

      5. Click SAVE.

  2. If you have not already done so, or if your Google ID or access tokens have expired, create both types of tokens for Google ID authentication to the Account API.

  3. Calculate the GKE subnets used by your Databricks workspace. You cannot change them after your workspace is deployed. If the address ranges for your Databricks subnets are too small, then the workspace exhausts its IP space, which causes your Databricks jobs to fail. To determine the address range sizes that you need, use the Databricks-provided calculator.

  4. Create a default workspace using the following command.

    curl --location --request POST 'https://accounts.gcp.databricks.com/api/2.0/accounts/<account-id>/workspaces' \
    --header 'X-Databricks-GCP-SA-Access-Token: <google-access-token>' \
    --header 'Authorization: Bearer <google-id-token>' \
    --header 'Content-Type: application/json' \
    --data-raw '{
       "workspace_name": "<workspace-name>",
       "cloud": "gcp",
       "location": "<region>",
       "cloud_resource_container": {
           "gcp": {
              "project_id": "<workspace-resource-project-id>"
           }
       }
    }
    '
    

    Replace:

    • <google-id-token> and <google-access-token> with your Google ID and Google access tokens.

    • <account-id> with your account ID.

    • <workspace-name> with a human-readable name for your new workspace.

    • <region> with the name of a supported region.

    • <workspace-resource-project-id> with the Google Cloud project that you want to use.

    Set optional parameters:

    • (Optional) To override GKE parameter defaults, add a gke_config object in the request. For example, switch to a public GKE cluster or change the IP range for GKE cluster master resources. See Create a new workspace.

    • (Optional) To override managed network IP ranges defaults, add a gcp_managed_network_config object in the request. For example, change the IP ranges for cluster pods, cluster service, or the IP range in CIDR format to use for the subnet. See Create a new workspace.

      Note

      The IP ranges for pods, services, and the GKE master must not overlap one another, and every IP address in them must fall within one of the following ranges: 10.0.0.0/8, 100.64.0.0/10, 172.16.0.0/12, 192.168.0.0/16, and 240.0.0.0/4.

    • (Optional) You can add customer-managed encryption keys to help control access to some types of data. See Customer-managed keys for encryption. To configure keys with the workspace, you need to have created an encryption key configuration object so you can reference it by ID in the parameters storage_customer_managed_key_id (for workspace storage) or managed_services_customer_managed_key_id (for managed services). See Configure customer-managed keys for encryption requirements and context.

  5. Confirm that your workspace was created successfully. Next to your workspace in the list of workspaces, click Open. To view workspace status and test the workspace, see View workspace status.

  6. Secure the workspace’s GCS buckets. See Secure the workspace’s GCS buckets in your project.

    When you create a workspace, Databricks on Google Cloud creates two Google Cloud Storage (GCS) buckets in your Google Cloud project. Databricks strongly recommends that you secure these GCS buckets so that they are not accessible from outside Databricks on Google Cloud.
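The create-workspace call in the steps above can also be sketched in Python. The example below only builds the request and does not send it. The function name build_create_workspace_request and all argument values are hypothetical placeholders. The URL, headers, and JSON fields mirror the curl command in this article. Serializing the body with json.dumps avoids hand-written JSON mistakes such as a trailing comma.

```python
import json
import urllib.request

ACCOUNTS_HOST = "https://accounts.gcp.databricks.com"  # host used in the curl example


def build_create_workspace_request(account_id, id_token, access_token,
                                   workspace_name, region, project_id):
    """Build (but do not send) the POST request for a default-VPC workspace."""
    payload = {
        "workspace_name": workspace_name,
        "cloud": "gcp",
        "location": region,
        "cloud_resource_container": {"gcp": {"project_id": project_id}},
    }
    return urllib.request.Request(
        url=f"{ACCOUNTS_HOST}/api/2.0/accounts/{account_id}/workspaces",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {id_token}",
            "X-Databricks-GCP-SA-Access-Token": access_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All values below are placeholders for illustration only.
request = build_create_workspace_request(
    account_id="00000000-0000-0000-0000-000000000000",
    id_token="<google-id-token>",
    access_token="<google-access-token>",
    workspace_name="example-workspace",
    region="<region>",
    project_id="example-project",
)
# To send the request, pass it to urllib.request.urlopen(request).
```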

During workspace creation, Databricks enables some required Google APIs on the project if they are not already enabled. See Enabling Google APIs on a workspace’s project.
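Before you submit custom IP ranges in gke_config or gcp_managed_network_config, you can check them locally against the constraints in the note on IP ranges above. The following sketch uses Python's standard ipaddress module. The function name check_ip_ranges and the labels pods, services, and master are hypothetical and are not API field names.

```python
import ipaddress
from itertools import combinations

# Allowed address ranges, as listed in the note on IP ranges in this article.
ALLOWED = [ipaddress.ip_network(cidr) for cidr in (
    "10.0.0.0/8", "100.64.0.0/10", "172.16.0.0/12", "192.168.0.0/16", "240.0.0.0/4",
)]


def check_ip_ranges(ranges):
    """Return a list of problems for a {label: CIDR} mapping (empty if none)."""
    nets = {label: ipaddress.ip_network(cidr) for label, cidr in ranges.items()}
    problems = []
    # Each range must lie entirely inside one of the allowed ranges.
    for label, net in nets.items():
        if not any(net.subnet_of(allowed) for allowed in ALLOWED):
            problems.append(f"{label} ({net}) is outside the allowed ranges")
    # No two ranges may overlap.
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} ({net_a}) overlaps {b} ({net_b})")
    return problems


# Example CIDR values chosen for illustration only.
good = check_ip_ranges({"pods": "10.3.0.0/16", "services": "10.4.0.0/20", "master": "10.103.0.0/28"})
bad = check_ip_ranges({"pods": "10.3.0.0/16", "services": "10.3.4.0/24", "master": "8.8.8.0/28"})
```

In this example, good is an empty list, while bad reports that the master range is outside the allowed ranges and that the pods and services ranges overlap.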

Create a workspace with a customer-managed VPC using the Account API

Before you create a workspace with a customer-managed VPC, you must create a Databricks object called a network configuration, which represents the Google Cloud VPC that you plan to use, as well as related objects like subnets. You specify the network configuration when you create the Databricks workspace. You cannot move an existing workspace with a Databricks-managed VPC to your own VPC. Also, after workspace creation you cannot change which customer-managed VPC the workspace uses.

You can also perform the tasks described in this article using the account console. However, the Google Cloud principal that needs specific roles on your Google Cloud projects depends on how you perform the operation. If you use the account console, the principal is your admin user account. If you use the Account API, the principal is the main service account (SA-2) that you use for Google ID authentication.

You can use the Account API to add a network configuration and also to create a workspace. The Account API is an account-level API, which means that authentication is different from most Databricks REST APIs, which are workspace-level APIs. For authentication to account-level APIs, you must use Google ID authentication and create two different types of tokens (a Google ID token and a Google access token) that you include as HTTP headers on each Account API request. For details, see Authentication with Google ID tokens.

Set up your VPC

Perform the following steps that are described in the article Configure a customer-managed VPC:

  1. Review all customer-managed VPC requirements.

  2. Create your VPC.

Do not perform other steps in that article.

Add roles to your service account

The principal that performs an operation must have the roles required for that operation. Which principal needs those roles on the project depends on how you perform the operation.

A service account does not automatically inherit roles from you as its creator. You must add roles for the service account on the project.

Using the article Configure a customer-managed VPC, perform these steps:

  1. Review the roles that are required on projects to create a workspace and other related operations.

  2. Follow the instructions to add specific roles on projects but with one modification for Account API usage: do not specify your admin user account email address as the principal. Instead, specify the principal as the email address for the main service account (SA-2) that you will use for Google ID authentication.

Register a network configuration

You can use the Account API to add a network configuration. For a full API reference or to download the OpenAPI specification, see Account API.

Important

Both types of authentication tokens (the Google ID token and the Google access token) expire in one hour. Consider initially reading the Google ID documentation but wait to create your authentication tokens until you are ready to call the Account API.
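If you script several Account API calls, it can help to track when the tokens were created so that you regenerate them before they expire. The following sketch assumes that you record the creation time yourself. The function name tokens_need_refresh and the five-minute safety margin are assumptions for illustration. Only the one-hour lifetime comes from this article.

```python
from datetime import datetime, timedelta, timezone

TOKEN_LIFETIME = timedelta(hours=1)   # lifetime stated in this article
SAFETY_MARGIN = timedelta(minutes=5)  # assumption: refresh a little early


def tokens_need_refresh(created_at, now=None):
    """Return True if tokens created at created_at should be regenerated."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - created_at >= TOKEN_LIFETIME - SAFETY_MARGIN


# Example timestamps for illustration only.
created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
still_fresh = tokens_need_refresh(created, created + timedelta(minutes=30))
stale = tokens_need_refresh(created, created + timedelta(minutes=56))
```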

  1. Enable the Cloud Resource Manager API on your service account’s project.

    1. Go to the Cloud Resource Manager API.

    2. If needed, use the project picker at the top of the page to change the project to the Google Cloud project where the service account you will use was created. In the Google ID examples, this main service account is also called SA-2.

    3. If you see the Enable button, click Enable. Wait 1 minute before proceeding.

      If the Enable button is not visible, the API is already enabled.

  2. If you have not already done it, or if your Google ID or access tokens have expired, create both types of tokens that are required for Google ID authentication.

  3. Create a network configuration by running the following command.

    curl --location --request POST 'https://accounts.gcp.databricks.com/api/2.0/accounts/<account-id>/networks' \
    --header 'X-Databricks-GCP-SA-Access-Token: <google-access-token>' \
    --header 'Authorization: Bearer <google-id-token>' \
    --header 'Content-Type: application/json' \
    --data-raw '{
     "network_name": "<network-configuration-name>",
     "gcp_network_info": {
       "network_project_id": "<vpc-host-project-id>",
       "vpc_id": "<vpc-id>",
       "subnet_id": "<subnet-id>",
       "subnet_region": "<subnet-region>",
       "pod_ip_range_name": "<name-of-pod-secondary-range>",
       "service_ip_range_name": "<name-of-svc-secondary-range>"
     }
    }'
    
    • Replace <google-id-token> and <google-access-token> with your Google ID and Google access tokens.

    • Replace <account-id> with your account ID.

    • Replace <network-configuration-name> with a new human-readable network configuration name.

    • Replace <vpc-host-project-id> with your VPC’s project ID.

      Important

      If you use a Google Cloud Shared VPC, which allows a different Google Cloud project for your workspace resources such as compute resources and storage, set this to the project ID for your VPC, not the project ID for your workspace resources.

    • Set the <vpc-id>, <subnet-id>, and <subnet-region> fields to the VPC ID, subnet ID, and subnet region. The subnet region must match the region that you want to use with your new workspace.

    • Replace <name-of-pod-secondary-range> and <name-of-svc-secondary-range> with the names of the pod secondary range and service secondary range that you created in earlier steps. If you used the earlier example to create the standalone VPC with the gcloud CLI command, these secondary IP ranges are named pod and svc.

      The IP ranges for pods, services, and the GKE master must not overlap one another, and every IP address in them must fall within one of the following ranges: 10.0.0.0/8, 100.64.0.0/10, 172.16.0.0/12, 192.168.0.0/16, and 240.0.0.0/4.

    This returns a JSON-formatted network configuration object:

    {
      "account_id": "e11e38c5-a449-47b9-b37f-0fa36c821612",
      "creation_time": 1644388480866,
      "gcp_network_info": {
        "network_project_id": "<vpc-host-project-id>",
        "pod_ip_range_name": "<name-of-pod-secondary-range>",
        "service_ip_range_name": "<name-of-svc-secondary-range>",
        "subnet_id": "<subnet-id>",
        "subnet_region": "<subnet-region>",
        "vpc_id": "<vpc-id>"
      },
      "network_id": "<network-configuration-id>",
      "network_name": "<network-configuration-name>",
      "vpc_status": "UNATTACHED"
    }
    
  4. Save the network_id field in the result. This is the ID for your network configuration object. You will need it to create the workspace.
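The request body and the network_id extraction in the steps above can be sketched in Python as follows. The function names and all example values are hypothetical. The JSON field names are taken from the curl command and the sample response in this article, and the range names pod and svc match the gcloud example mentioned above.

```python
import json


def network_config_payload(name, network_project_id, vpc_id, subnet_id,
                           subnet_region, pod_range_name, service_range_name):
    """Build the JSON body for registering a network configuration."""
    return {
        "network_name": name,
        "gcp_network_info": {
            "network_project_id": network_project_id,
            "vpc_id": vpc_id,
            "subnet_id": subnet_id,
            "subnet_region": subnet_region,
            "pod_ip_range_name": pod_range_name,
            "service_ip_range_name": service_range_name,
        },
    }


def network_id_from_response(response_text):
    """Return the network_id field from the JSON object that the API returns."""
    return json.loads(response_text)["network_id"]


# Placeholder values for illustration only.
payload = network_config_payload(
    "example-network-config", "example-vpc-host-project", "example-vpc",
    "example-subnet", "<subnet-region>", "pod", "svc",
)
# A shortened, made-up response in the shape shown in this article.
sample_response = json.dumps({"network_id": "example-network-id", "vpc_status": "UNATTACHED"})
network_id = network_id_from_response(sample_response)
```

Keeping the returned network_id in a variable makes it straightforward to pass to the workspace creation request in the next section.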

Create a workspace with a customer-managed VPC

Perform the following steps to use the Account API to create a workspace with a customer-managed VPC. For a full API reference or to download the OpenAPI specification, see Account API. To create a workspace with a Databricks-managed VPC, instead see Create a workspace with the default VPC using the Account API.

Important

Both types of authentication tokens (the Google ID token and the Google access token) expire in one hour. Consider initially reading the Google ID documentation but wait to create your authentication tokens until you are ready to call the Account API.

  1. If you have not already done it, enable the Cloud Resource Manager API on your service account’s project. If you have done this already, skip to the next step in this section.

    1. Go to the Cloud Resource Manager API.

    2. If needed, use the project picker at the top of the page to change the project to the Google Cloud project where the service account you will use was created. In the Google ID examples, this main service account is also called SA-2.

    3. If you see the Enable button, click Enable. Wait 1 minute before proceeding.

  2. Ensure that the service account that you are using has the correct permissions for workspace creation. This is your main service account, called SA-2, as described in Authentication with Google ID tokens. See Role requirements.

    Important

    If you use a Google Cloud Shared VPC, which allows a different Google Cloud project for your workspace resources such as compute resources and storage, note that you need specific roles on both projects.

  3. If you have not already done it, or if your Google ID or access tokens have expired, create both tokens that you need for Google ID authentication to this API.

  4. Run the following command to create a typical workspace with a private GKE cluster:

    curl --location --request POST 'https://accounts.gcp.databricks.com/api/2.0/accounts/<account-id>/workspaces' \
    --header 'X-Databricks-GCP-SA-Access-Token: <google-access-token>' \
    --header 'Authorization: Bearer <google-id-token>' \
    --header 'Content-Type: application/json' \
    --data-raw '{
       "workspace_name": "<workspace-name>",
       "cloud": "gcp",
       "location": "<region>",
       "cloud_resource_container": {
           "gcp": {
               "project_id": "<workspace-resource-project-id>"
           }
       },
       "network_id": "<network-configuration-id>",
       "gke_config": {
           "connectivity_type": "PRIVATE_NODE_PUBLIC_MASTER",
           "master_ip_range": "10.103.0.0/28"
       }
    }
    '
    
    • Replace <google-id-token> and <google-access-token> with your Google ID and Google access tokens.

    • Replace <account-id> with your account ID.

    • Replace <workspace-name> with a human-readable name for your new workspace.

    • Replace <region> with the name of a supported region.

    • Replace <workspace-resource-project-id> with the Google Cloud project that you want to use.

      Important

      If you use a Google Cloud Shared VPC, which allows a different Google Cloud project for your workspace resources such as compute resources and storage, set the Google Cloud project ID field to the project ID for workspace resources, not the project ID for your VPC.

    • Replace <network-configuration-id> with the network_id value that you saved when you registered the network configuration.

    • (Optional) To override GKE parameter defaults, change the gke_config object in the request. For example, switch to a public GKE cluster or change the IP range for GKE cluster master resources. See Create a new workspace.

      The IP ranges for pods, services, and the GKE master must not overlap one another, and every IP address in them must fall within one of the following ranges: 10.0.0.0/8, 100.64.0.0/10, 172.16.0.0/12, 192.168.0.0/16, and 240.0.0.0/4.

    • (Optional) You can secure a workspace with private connectivity and mitigate data exfiltration risks by enabling Google Private Service Connect (PSC) on the workspace. To configure this, you need to have created a private access settings object and reference its ID in the private_access_settings_id parameter. Before adding PSC configuration, Databricks strongly recommends reading the article Enable Private Service Connect for your workspace for requirements and context.

    • (Optional) You can add customer-managed encryption keys to help control access to some types of data. See Customer-managed keys for encryption. To configure keys with the workspace, you need to have created an encryption key configuration object so you can reference it by ID in the parameters storage_customer_managed_key_id (for workspace storage) or managed_services_customer_managed_key_id (for managed services). See Configure customer-managed keys for encryption requirements and context.

  5. Confirm that your workspace was created successfully. Next to your workspace in the list of workspaces, click Open. To view workspace status and test the workspace, see View workspace status.

  6. Secure the workspace’s GCS buckets. See Secure the workspace’s GCS buckets in your project.

    When you create a workspace, Databricks on Google Cloud creates two Google Cloud Storage (GCS) buckets in your Google Cloud project. Databricks strongly recommends that you secure these GCS buckets so that they are not accessible from outside Databricks on Google Cloud.
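The request body for a workspace with a customer-managed VPC, including the optional parameters described in the steps above, can be sketched in Python as follows. The function name customer_vpc_workspace_payload and the example values are hypothetical. The field names and the gke_config defaults are taken from the curl command and parameter descriptions in this article. Optional IDs are included only when you supply them.

```python
def customer_vpc_workspace_payload(workspace_name, region, project_id, network_id,
                                   connectivity_type="PRIVATE_NODE_PUBLIC_MASTER",
                                   master_ip_range="10.103.0.0/28",
                                   private_access_settings_id=None,
                                   storage_customer_managed_key_id=None,
                                   managed_services_customer_managed_key_id=None):
    """Build the JSON body for a workspace that uses a customer-managed VPC."""
    payload = {
        "workspace_name": workspace_name,
        "cloud": "gcp",
        "location": region,
        "cloud_resource_container": {"gcp": {"project_id": project_id}},
        "network_id": network_id,
        "gke_config": {
            "connectivity_type": connectivity_type,
            "master_ip_range": master_ip_range,
        },
    }
    optional = {
        "private_access_settings_id": private_access_settings_id,
        "storage_customer_managed_key_id": storage_customer_managed_key_id,
        "managed_services_customer_managed_key_id": managed_services_customer_managed_key_id,
    }
    # Add only the optional IDs that were actually provided.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


# Placeholder values for illustration only.
basic = customer_vpc_workspace_payload(
    "example-workspace", "<region>", "example-project", "example-network-id")
with_psc = customer_vpc_workspace_payload(
    "example-workspace", "<region>", "example-project", "example-network-id",
    private_access_settings_id="example-private-access-settings-id")
```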

During workspace creation, Databricks enables some required Google APIs on the project if they are not already enabled. See Enabling Google APIs on a workspace’s project.