Be sure that you understand all configuration settings before you create a new workspace. You cannot modify a workspace configuration after you create the workspace.
Be sure you have the Google Cloud resource quotas that the workspace needs. Request a quota increase if necessary.
To create a workspace:
Choose a network type for your new workspace:
Databricks-managed VPC (default): Databricks creates and manages the lifecycle of the VPC. If you choose this network type, there are no additional steps to perform now.
Customer-managed VPC: Create and specify your own customer-managed VPC for your new Databricks workspace to use. If you choose this network type, perform the following steps now:
As a Databricks account admin, log in to the account console and click the Workspaces icon.
Click Create Workspace.
In the Workspace Name field, enter a human-readable name for this workspace. Only alphanumeric characters, underscores, and hyphens are allowed, and the name must be 3-30 characters long.
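The naming rules above can be expressed as a short validation check. This is a sketch using Python's standard `re` module; the pattern simply mirrors the constraints stated here (alphanumeric characters, underscores, hyphens, 3-30 characters) and is not an official Databricks validator.

```python
import re

# Pattern mirroring the stated rules: alphanumeric characters,
# underscores, and hyphens; total length 3-30 characters.
WORKSPACE_NAME_PATTERN = re.compile(r"^[A-Za-z0-9_-]{3,30}$")

def is_valid_workspace_name(name: str) -> bool:
    """Return True if the name satisfies the workspace naming rules."""
    return WORKSPACE_NAME_PATTERN.fullmatch(name) is not None
```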
In the Region field, select a region for your workspace’s network and clusters. For the list of supported regions, see Databricks clouds and regions.
In the Google cloud project ID field, enter your Google Cloud project ID. To learn how to get your project ID, see Requirements.
Network setup. This step varies based on the workspace’s network type. If you plan to use a customer-managed VPC for this workspace, click the Customer-managed VPC tab.
Optionally specify custom subnet sizes. If you leave these fields blank, Databricks uses defaults.
Configure the GKE subnets that your Databricks workspace uses carefully. You cannot change them after your workspace is deployed. If the address ranges for your Databricks subnets are too small, the workspace exhausts its IP space, which causes your Databricks jobs to fail. To determine the address range sizes that you need, Databricks provides a subnet calculator as a Microsoft Excel spreadsheet.
Click Advanced configurations to specify custom IP ranges in CIDR format. The IP ranges for these fields must not overlap, and all IP addresses must fall entirely within supported private IP address ranges.
The sizes of these IP ranges affect the maximum number of nodes for the workspace.
In the Subnet CIDR field, type the IP range in CIDR format to use for the subnet. Nodes of the GKE cluster come from this IP range, which is also the IP range of the subnet where the GKE cluster lives. The range must be no bigger than /9 and no smaller than /29.
In the Pod address range field, type the IP range in CIDR format to use as the secondary IP range for GKE pods. The range must be no bigger than /9 and no smaller than /21.
In the Service address range field, type the IP range in CIDR format to use as the secondary IP range for GKE services. The range must be no bigger than /16 and no smaller than /27.
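As a rough sanity check on these ranges, Python's standard `ipaddress` module can report how many addresses a candidate CIDR provides and confirm that the three ranges do not overlap. The CIDR values below are illustrative placeholders, not recommendations; substitute your own ranges.

```python
import ipaddress

# Illustrative candidate ranges -- substitute your own values.
subnet = ipaddress.ip_network("10.0.0.0/16")    # GKE node subnet
pods = ipaddress.ip_network("10.1.0.0/16")      # secondary range for pods
services = ipaddress.ip_network("10.2.0.0/20")  # secondary range for services

# Total addresses in each range. This is an upper bound; GKE reserves
# some addresses, so the usable count is lower.
for label, net in [("subnet", subnet), ("pods", pods), ("services", services)]:
    print(f"{label}: {net} -> {net.num_addresses} addresses")

# The three ranges must not overlap.
ranges = [subnet, pods, services]
for i, a in enumerate(ranges):
    for b in ranges[i + 1:]:
        assert not a.overlaps(b), f"{a} overlaps {b}"
```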
Specify a network configuration that represents your VPC and its subnets:
Network Mode: Set this to Customer-managed network.
Network configuration: Select your network configuration’s name.
(Optional) Configure details about private GKE clusters.
By default, Databricks creates a private GKE cluster instead of a public GKE cluster. A private cluster’s GKE nodes have no public IP addresses that are routable from the public internet. This option requires Databricks to create an additional Google Cloud NAT. For a private cluster, you can optionally set a custom value for the IP range for GKE master resources: click Advanced configurations, then set the IP range for GKE master resources field. All IP addresses must be entirely within the range 240.0.0.0/4, and the range must have size /28.
To instead use a public GKE cluster, click Advanced configurations and deselect Enable private cluster.
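A candidate master range can be validated before you create the workspace. GKE requires the control-plane (master) range to be exactly a /28, and per the constraint above it must fall within 240.0.0.0/4. A sketch using Python's standard `ipaddress` module; the candidate CIDR is a hypothetical value:

```python
import ipaddress

# Per the documented constraint, the master range must sit inside this block.
ALLOWED = ipaddress.ip_network("240.0.0.0/4")

def valid_master_range(cidr: str) -> bool:
    """Check that a candidate GKE master range is a /28 inside 240.0.0.0/4."""
    net = ipaddress.ip_network(cidr)
    return net.prefixlen == 28 and net.subnet_of(ALLOWED)

# Hypothetical candidate value:
print(valid_master_range("240.10.0.0/28"))  # prints: True
```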
(Optional) You can secure a workspace with private connectivity and mitigate data exfiltration risks by enabling Google Private Service Connect (PSC) on the workspace. To configure this, click Advanced configurations and choose a private access settings object. Before adding PSC configuration, Databricks strongly recommends reading the article Enable Private Service Connect for your workspace for requirements and context.
(Optional) You can add customer-managed keys for two different use cases:
Managed services (notebook and secret data in the Databricks control plane). For important details, see Customer-managed keys for managed services.
Workspace storage (the two workspace GCS buckets, as well as the GCE Persistent Disk volumes of a cluster or SQL warehouse). For important details, see Customer-managed keys for workspace storage.
To configure this during workspace creation, you can use the two pickers to select an already-created encryption key configuration for each use case. You can choose the same configuration if it supports both use cases. For detailed instructions using the account console, see Configure customer-managed keys using the account console.
Alternatively, you can create a new key configuration in this workspace creation flow: click the picker for a use case, then click Add new encryption key configuration.
If this is the first time that you have created a workspace, a Google popup window asks you to select your Google account. Complete the following instructions.
If you do not see the Google account popup:
If the page does not change, you may have a popup blocker in your web browser. Look for a notification about a blocked popup window, and configure your popup blocker to allow popup windows from the account console’s domain.
If you do not see the Google dialog but your browser now shows a list of workspaces, continue to the next step.
In the Google dialog, select the Google account with which you signed into the account console.
On the next screen, reply to the consent request that asks you for additional scopes. Click Allow.
The consent screen is shown the first time you attempt to create a workspace. For subsequent new workspaces, Google does not show the consent screen. If you use Google account tools to revoke the consent granted to Databricks, Google displays the consent screen again.
Confirm that your workspace was created successfully. Next to your workspace in the list of workspaces, click Open. To view workspace status and test the workspace, see View workspace status.
Secure the workspace’s GCS buckets. See Secure the workspace’s GCS buckets in your project.
When you create a workspace, Databricks on Google Cloud creates two Google Cloud Storage (GCS) buckets in your Google Cloud project. Databricks strongly recommends that you secure these GCS buckets so that they are not accessible from outside Databricks on Google Cloud.
During workspace creation, Databricks enables some required Google APIs on the project, if they are not already enabled. See Enabling Google APIs on a workspace’s project.
During workspace creation, Databricks automatically enables the following required Google APIs on the Google Cloud project if they are not already enabled:
These APIs are not disabled automatically during workspace deletion.
You can create at most 200 workspaces per week in the same Google Cloud project. If you exceed this limit, creating a workspace fails with the error message: “Creating custom cloud IAM role <your-role> in project <your-project> rejected.”
After you create a workspace, you can view its status on the Workspaces page.
Provisioning: In progress. Wait a few minutes and refresh the page.
Running: Successful workspace deployment.
Failed: Failed deployment.
Banned: Contact your Databricks representative.
Cancelling: In the process of cancellation.
If the status for your new workspace is Failed, click the workspace to view a detailed error message. If you do not understand the error, contact your Databricks representative.
You cannot update the configuration of a failed workspace. You must delete it and create a new workspace.
Go to the account console and click the Workspaces icon.
On the row with your workspace, click Open.
To log in as a workspace administrator, use your account owner or account administrator email address and password.
When you create a workspace, Databricks on Google Cloud creates two Google Cloud Storage (GCS) buckets in your Google Cloud project:
One GCS bucket stores system data that is generated as you use various Databricks features such as creating notebooks. This bucket includes notebook revisions, job run details, command results, and Spark logs.
The other GCS bucket is your workspace’s root storage for the Databricks File System (DBFS). Your DBFS root bucket is not intended for storage of production customer data; create other data sources and storage for production customer data in additional GCS buckets. You can optionally mount the additional GCS buckets as DBFS mounts. See Google Cloud Storage.
Databricks strongly recommends that you secure these GCS buckets so that they are not accessible from outside Databricks on Google Cloud.
To secure these GCS buckets:
In a browser, go to the GCP Cloud Console.
Select the Google Cloud project that hosts your Databricks workspace.
Go to that project’s Storage Service page.
Look for the buckets for your new workspace. Their names are:
For each bucket:
Click on the bucket to view details.
Click the Permissions tab.
Review all the entries of the Members list and determine if access is expected for each member.
Check the IAM Condition column. Some permissions, such as those named “Databricks service account for workspace”, have IAM Conditions that restrict them to certain buckets. The Google Cloud console UI does not evaluate the condition, so it may show roles that would not actually be able to access the bucket.
Pay special attention to roles without any IAM Condition. Consider adding restrictions on these:
When adding Storage permissions at the project level or above, use IAM Conditions to exclude Databricks buckets or to only allow specific buckets.
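For example, an IAM Condition that excludes the Databricks buckets might use a CEL expression like the following. The bucket-name prefix shown here is a placeholder; substitute the actual names of your workspace's buckets.

```
// Grant applies only to resources whose names do not start with the
// placeholder bucket prefix (covers the buckets and the objects in them).
!resource.name.startsWith("projects/_/buckets/databricks-example-")
```

Attaching this condition to a project-level Storage role keeps that role from granting access to the matching Databricks buckets.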
Choose the minimal set of permissions needed. For example, if only read access is needed, specify Storage Viewer instead of Storage Admin.
Do not use Basic Roles because they are too broad.
Enable Google Cloud Data Access audit logging. Databricks strongly recommends that you enable Data Access audit logging for the GCS buckets that Databricks creates. This enables faster investigation of any issues that may come up. Be aware that Data Access audit logging can increase GCP usage costs. For instructions, see Configuring Data Access audit logs.
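Data Access audit logging is configured through the `auditConfigs` stanza of the project's IAM policy (for example, applied with `gcloud projects set-iam-policy`). A minimal fragment enabling it for Cloud Storage might look like this; see Google's Data Access audit logs documentation for the full procedure.

```yaml
auditConfigs:
- service: storage.googleapis.com
  auditLogConfigs:
  - logType: ADMIN_READ
  - logType: DATA_READ
  - logType: DATA_WRITE
```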
If you have questions about securing these GCS buckets, contact your Databricks representative.