Create and manage workspaces using the account console

A workspace is a Databricks deployment in a cloud service account. It provides a unified environment for working with Databricks assets for a specified set of users. This article describes how to create and manage workspaces.

Note

Databricks charges for Databricks usage in Databricks Units (DBUs). The number of DBUs a workload consumes varies based on a number of factors, including Databricks compute type (all-purpose or jobs) and Google Cloud machine type. For details, see the pricing page. If you have questions about pricing, contact your Databricks representative.

Additional costs are incurred in your Google Cloud account:

  • Google Cloud charges you an additional per-workspace cost for the GKE cluster that Databricks creates for Databricks infrastructure in your account. As of March 30, 2021, the cost for this GKE cluster is approximately $200/month, prorated to the days in the month that the GKE cluster runs. Prices can change, so check the latest prices.
  • The GKE cluster cost applies even if Databricks clusters are idle. To reduce this idle-time cost, Databricks deletes the GKE cluster in your account if no Databricks Runtime clusters are active for 72 hours. Other resources, such as the VPC and GCS buckets, remain unchanged. The next time a Databricks Runtime cluster starts, Databricks recreates the GKE cluster, which adds to the initial Databricks Runtime cluster launch time. As an example of how GKE cluster deletion reduces monthly costs, suppose you use a Databricks Runtime cluster on the first of the month and not again for the rest of the month: your GKE usage is only the three days before the idle timeout takes effect, costing approximately $20 for the month (see the proration sketch after this note).

Databricks does not support configuration changes to a running GKE cluster. If you customize a GKE cluster configuration after it is created, and that cluster is deleted due to idle timeout, the recreated cluster will not include your customizations.
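
For a quick sanity check of the numbers above, here is a minimal sketch of the proration arithmetic. The ~$200/month figure and the 72-hour timeout come from this note; they are approximations, and a 30-day month is assumed.

```python
# Rough proration of the per-workspace GKE cluster cost.
# Assumptions: ~$200/month (as of March 30, 2021; prices can change),
# a 72-hour idle timeout, and a 30-day month.
MONTHLY_GKE_COST_USD = 200.0
IDLE_TIMEOUT_DAYS = 3  # 72 hours
DAYS_IN_MONTH = 30

def prorated_gke_cost(days_running):
    """Cost for the days in the month that the GKE cluster runs."""
    return MONTHLY_GKE_COST_USD * days_running / DAYS_IN_MONTH

# Example from this note: one Databricks Runtime cluster used on the
# first of the month and never again; the GKE cluster runs for the
# three idle-timeout days and is then deleted.
print(f"~${prorated_gke_cost(IDLE_TIMEOUT_DAYS):.0f} for the month")  # ~$20
```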

Create a workspace

Be sure that you understand all configuration settings before you create a new workspace. You cannot modify the configuration after you attempt to create the workspace.

To create a workspace:

  1. As a Databricks account owner or account admin, log in to the account console and click the Workspaces icon. This is the account console default view.

  2. Click Create Workspace.

  3. In the Workspace Name field, enter a human-readable name for this workspace. Only alphanumeric characters, underscores, and hyphens are allowed, and the name must be 3-30 characters long. (A validation sketch appears after these steps.)

  4. In the Region field, select a region for your workspace’s network and clusters. For the list of supported regions, see Supported Databricks regions.

  5. In the Google cloud project ID field, enter your Google project ID.

    To learn how to get your project ID, see Prerequisites for account and workspace creation.

  6. (Optional) Specify custom IP ranges.

    Important

    Configure the GKE subnets for your Databricks workspace carefully, because you cannot change them after your workspace is deployed. If the address ranges for your Databricks subnets are too small, the workspace exhausts its IP space, which in turn causes your Databricks jobs to fail. To determine the address range sizes that you need, Databricks provides a calculator in the form of a Microsoft Excel spreadsheet. See Calculate subnet sizes for a new workspace.

    Click Advanced configurations to specify custom IP ranges in CIDR format. The IP ranges for these fields must not overlap, and all IP addresses must be entirely within the following ranges:

    • 10.0.0.0/8
    • 100.64.0.0/10
    • 172.16.0.0/12
    • 192.168.0.0/16
    • 240.0.0.0/4

    The sizes of these IP ranges affect the maximum number of nodes for the workspace. (A sketch that checks these ranges appears after these steps.)

    • In the Subnet CIDR field, type the IP range in CIDR format to use for the subnet where the GKE cluster lives. The GKE cluster’s nodes get their IP addresses from this range. The range size must be between /29 (smallest) and /9 (largest).
    • In the Pod address range field, type the IP range in CIDR format to use as the secondary IP range for GKE pods. The range size must be between /21 (smallest) and /9 (largest).
    • In the Service address range field, type the IP range in CIDR format to use as the secondary IP range for GKE services. The range size must be between /27 (smallest) and /16 (largest).
  7. (Optional) Configure details about private GKE clusters.

    • By default, Databricks creates a private GKE cluster rather than a public one. A private cluster’s GKE nodes have no public IP addresses that are routable from the public internet. This option requires that Databricks create an additional Cloud NAT in your Google Cloud account. For a private cluster, you can optionally set a custom IP range for GKE master resources: click Advanced configurations, then set the IP range for GKE master resources field. The same limits apply as for the other IP range fields in this form, except that this range must be exactly /28.
    • To use a public GKE cluster instead, click Advanced configurations and deselect Enable private cluster.
  8. Click Save.

    If this is the first time you have created a workspace, a Google dialog appears that asks you to select your Google account. Follow the instructions below. If you do not see the Google dialog, continue to the next step.

    1. In the Google dialog, select the Google account that you used to sign in to the account console.

    2. On the next screen, review the consent request that asks for additional scopes, then click Allow.

      The consent screen is shown the first time you attempt to create a workspace. For subsequent new workspaces, Google does not show the consent screen. If you use Google account tools to revoke the consent granted to Databricks, Google displays the consent screen again.

  9. Confirm that your workspace was created successfully. See View workspace status and test the new workspace.

  10. Secure the workspace’s GCS buckets. See Secure the workspace’s GCS buckets in your project.

    Warning

    When you create a workspace, Databricks on Google Cloud creates two Google Cloud Storage (GCS) buckets in your GCP project. Databricks strongly recommends that you secure these GCS buckets so that they are not accessible from outside Databricks on Google Cloud.
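
If you script workspace creation, you can validate settings before submitting them. The following sketch checks the name rule from step 3; the regex is an illustrative rendering of that rule, not an official Databricks validator.

```python
import re

# Name rule from step 3: alphanumerics, underscores, and hyphens,
# 3-30 characters. Illustrative only, not an official validator.
WORKSPACE_NAME_RE = re.compile(r"^[A-Za-z0-9_-]{3,30}$")

def is_valid_workspace_name(name):
    return WORKSPACE_NAME_RE.fullmatch(name) is not None

assert is_valid_workspace_name("ml-prod_01")
assert not is_valid_workspace_name("ab")          # too short
assert not is_valid_workspace_name("has spaces")  # disallowed character
```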
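The IP range rules from steps 6 and 7 can be checked the same way with Python’s standard ipaddress module. This sketch verifies that each range lies inside one of the five allowed blocks, that each prefix length is within its bounds, and that the ranges do not overlap. The example values are placeholders, and the node-count estimate assumes GKE’s default allocation of one /24 of pod addresses per node, which is a GKE default rather than something stated in this article.

```python
import ipaddress
from itertools import combinations

ALLOWED_BLOCKS = [ipaddress.ip_network(b) for b in (
    "10.0.0.0/8", "100.64.0.0/10", "172.16.0.0/12",
    "192.168.0.0/16", "240.0.0.0/4",
)]

# (largest, smallest) allowed prefix per field, from steps 6 and 7.
# A lower prefix number means a bigger range, so /9 is the biggest
# range allowed for the subnet and /29 the smallest.
BOUNDS = {
    "subnet": (9, 29),
    "pods": (9, 21),
    "services": (16, 27),
    "gke_master": (28, 28),  # private clusters: must be exactly /28
}

def validate(ranges):
    problems = []
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    for name, net in nets.items():
        lo, hi = BOUNDS[name]
        if not lo <= net.prefixlen <= hi:
            problems.append(f"{name}: /{net.prefixlen} is outside /{lo}-/{hi}")
        if not any(net.subnet_of(block) for block in ALLOWED_BLOCKS):
            problems.append(f"{name}: {net} is not inside an allowed block")
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} and {b} overlap")
    return problems

# Placeholder values for illustration only.
ranges = {
    "subnet": "10.8.0.0/20",
    "pods": "10.16.0.0/14",
    "services": "10.24.0.0/20",
    "gke_master": "10.40.0.0/28",
}
print(validate(ranges) or "ranges look consistent")

# Rough node ceiling from the pod range, assuming GKE's default of
# one /24 of pod addresses per node (a GKE default, not from this article).
pods = ipaddress.ip_network(ranges["pods"])
print("max nodes ~", 2 ** (24 - pods.prefixlen))
```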

Workspace creation limits

You can create at most 200 workspaces per week in the same Google Cloud project. If you exceed this limit, creating a workspace fails with the error message: “Creating custom cloud IAM role <your-role> in project <your-project> rejected.”

View workspace status and test the new workspace

After you create a workspace, you can view it on the Workspaces page. To check the workspace creation status:

  1. View the Status column for your new workspace (to poll the status programmatically, see the sketch after this procedure):
    • Provisioning: In progress. Wait a few minutes and refresh the page.
    • Running: Successful workspace deployment. Continue to the next step in this procedure.
    • Failed: Failed deployment.
    • Banned: Contact your Databricks representative.
    • Cancelling: In the process of cancellation.
  2. When your new workspace is Running, test your workspace:
    1. From the Actions menu in the Workspace row, select Visit Workspace.
    2. Log in with your account owner or account admin email address and password.
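
To poll the status programmatically instead of refreshing the page, something like the following sketch may work. The host, path, and workspace_status field are assumptions modeled on the Databricks Account API; verify them against the Account API reference for Google Cloud, and replace the placeholder credentials.

```python
import time
import requests  # third-party: pip install requests

# Placeholders/assumptions: verify the host, path, auth scheme, and
# response fields against the Databricks Account API reference.
HOST = "https://accounts.gcp.databricks.com"
ACCOUNT_ID = "<account-id>"
WORKSPACE_ID = "<workspace-id>"
TOKEN = "<token>"

TERMINAL_STATUSES = {"RUNNING", "FAILED", "BANNED"}

def poll_workspace_status(interval_s=60):
    url = f"{HOST}/api/2.0/accounts/{ACCOUNT_ID}/workspaces/{WORKSPACE_ID}"
    while True:
        resp = requests.get(url, headers={"Authorization": f"Bearer {TOKEN}"})
        resp.raise_for_status()
        status = resp.json().get("workspace_status", "UNKNOWN")
        print("status:", status)
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval_s)  # still provisioning or cancelling: retry
```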

If the status for your new workspace is Failed, click the workspace to view a detailed error message. If you do not understand the error, contact your Databricks representative.

You cannot update the configuration of a failed workspace. You must delete it and try to create a new workspace.

Log in to a workspace

  1. As the user who created the workspace, log in to the account console and click the Workspaces icon.
  2. On the row that displays your workspace, click Actions, then Visit Workspace. Alternatively, click the workspace name, then click the link under the URL label.
  3. Log in with your account owner or account admin email address and password. If you configured single sign-on (SSO), click the Single Sign On tab, and then click the large blue Single Sign On button.

Secure the workspace’s GCS buckets in your project

When you create a workspace, Databricks on Google Cloud creates two Google Cloud Storage (GCS) buckets in your GCP project:

  • One GCS bucket stores system data that is generated as you use various Databricks features such as creating notebooks. This bucket includes notebook revisions, job run details, command results, and Spark logs.
  • The other GCS bucket is your workspace’s root storage for the Databricks File System (DBFS). Your DBFS root bucket is not intended for storage of production customer data. Create other data sources and storage for production customer data in additional GCS buckets. You can optionally mount the additional GCS buckets as DBFS mounts, as in the sketch after this list. See Google Cloud Storage.
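
As an illustration of mounting an additional bucket, a notebook cell along the following lines may work. dbutils.fs.mount is the DBFS mount API, but confirm the exact options for GCS mounts in the Google Cloud Storage article linked above; the bucket name and mount point here are placeholders, and the cluster’s service account must have access to the bucket.

```python
# Notebook cell sketch: mount an additional GCS bucket under DBFS.
# Placeholders: bucket name and mount point. Confirm GCS mount options
# in the Google Cloud Storage article this section links to.
bucket_name = "my-production-data"    # placeholder
mount_point = "/mnt/production-data"  # placeholder

# Mount only if not already mounted.
if not any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
    dbutils.fs.mount(f"gs://{bucket_name}", mount_point)

display(dbutils.fs.ls(mount_point))
```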

Databricks strongly recommends that you secure these GCS buckets so that they are not accessible from outside Databricks on Google Cloud.

To secure these GCS buckets:

  1. In a browser, go to the Google Cloud Console.

  2. Select the Google Cloud project that hosts your Databricks workspace.

  3. Go to that project’s Storage Service page.

  4. Look for the buckets for your new workspace. Their names are:

    • databricks-<workspace id>
    • databricks-<workspace id>-system
  5. For each bucket:

    1. Click on the bucket to view details.

    2. Click the Permissions tab.

    3. Review all the entries of the Members list and determine if access is expected for each member.

    4. Check the IAM Condition column. Some permissions, such as those named “Databricks service account for workspace”, have IAM Conditions that restrict them to certain buckets. The Google Cloud console UI does not evaluate the condition, so it may show roles that would not actually be able to access the bucket.

      Pay special attention to roles without any IAM Condition; a sketch that flags such bindings follows this procedure. Consider adding restrictions on these:

      • When adding Storage permissions at the project level or above, use IAM Conditions to exclude Databricks buckets or to only allow specific buckets.

      • Choose the minimal set of permissions needed. For example, if only read access is needed, specify Storage Viewer instead of Storage Admin.

        Warning

        Do not use Basic Roles because they are too broad.

    5. Enable Google Cloud Data Access audit logging. Databricks strongly recommends that you enable Data Access audit logging for the GCS buckets that Databricks creates. This enables faster investigation of any issues that may come up. Be aware that Data Access audit logging can increase GCP usage costs. For instructions, see Configuring Data Access audit logs.
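
To complement the manual review of the Members list and IAM Conditions above, you can enumerate each bucket’s IAM bindings with the google-cloud-storage client and flag the bindings that lack an IAM Condition. This is a minimal sketch assuming Application Default Credentials and the google-cloud-storage package; the project and workspace IDs are placeholders. Note that it shows bucket-level bindings only; roles granted at the project level or above are inherited and do not appear here.

```python
from google.cloud import storage  # pip install google-cloud-storage

PROJECT_ID = "<project-id>"      # placeholder
WORKSPACE_ID = "<workspace-id>"  # placeholder

client = storage.Client(project=PROJECT_ID)

for name in (f"databricks-{WORKSPACE_ID}", f"databricks-{WORKSPACE_ID}-system"):
    bucket = client.bucket(name)
    # Request a version 3 policy so bindings include IAM Conditions.
    policy = bucket.get_iam_policy(requested_policy_version=3)
    print(f"\n{name}:")
    for binding in policy.bindings:
        marker = "" if binding.get("condition") else "  <- no IAM Condition"
        print(f"  {binding['role']}: {sorted(binding['members'])}{marker}")
```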

If you have questions about securing these GCS buckets, contact your Databricks representative.

Delete a workspace

  1. Go to the account console and click the Workspaces icon.

  2. On the row with your workspace, click Actions, then Delete. Alternatively, click the workspace name, click the Configure button, and select Delete Workspace.

  3. In the confirmation dialog, type the workspace name and click Confirm Delete.

  4. If you delete a workspace, the two GCS buckets that Databricks created may not be deleted automatically if they are not empty. For example, there might be objects that you added directly or indirectly, such as libraries, in the bucket that contains your workspace’s DBFS root.

    After workspace deletion, you can find and delete remaining objects manually in the Google Cloud Console for your project. Go to the following page, replacing <project-id> with your project ID: https://console.cloud.google.com/dm/deployments?project=<project-id>. Alternatively, the sketch below removes leftover objects and buckets by script.
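
If you prefer to clean up by script rather than in the console, a sketch like the following can work, again assuming the google-cloud-storage client and placeholder IDs. Bucket deletion is irreversible, so double-check the names and contents first.

```python
from google.cloud import storage  # pip install google-cloud-storage

PROJECT_ID = "<project-id>"      # placeholder
WORKSPACE_ID = "<workspace-id>"  # placeholder: the deleted workspace's ID

client = storage.Client(project=PROJECT_ID)

for name in (f"databricks-{WORKSPACE_ID}", f"databricks-{WORKSPACE_ID}-system"):
    bucket = client.bucket(name)
    if not bucket.exists():
        print(f"{name}: already gone")
        continue
    # WARNING: irreversible. Delete remaining objects, then the bucket.
    # If object versioning is enabled, noncurrent versions may also
    # need to be removed before the bucket can be deleted.
    for blob in client.list_blobs(name):
        blob.delete()
    bucket.delete()
    print(f"{name}: deleted")
```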