To get users up and running on your Databricks on Google Cloud account, you must:
- Set up your Databricks account in Google.
- Create Databricks workspaces.
- Add users and groups to your workspaces.
To set up a Databricks on Google Cloud account, you must:
Have a Google billing account defined in Google Cloud. To create a billing account, see the Google documentation article Create, modify, or close your Cloud Billing account.
Confirm that you have the following roles for Google Identity and Access Management (IAM):
- Billing Administrator (roles/billing.admin) for the target Cloud Billing account or the Google Cloud organization where your project is located. If you don’t have this role, contact an Organization Administrator to request access.
- Viewer (roles/viewer) for the project associated with the billing account you plan to use. If you are not a viewer, you can either contact the project owner to request access, or create a new project to give yourself the correct permissions. If you create a new project, you must enable billing and link the project to the desired Cloud Billing account.
To learn about the relationship between Google Cloud organizations, projects, and billing, see the Google documentation on Cloud Billing access control. To learn more about roles and permissions across Google Cloud, see the documentation on Understanding roles.
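As a sketch, you can confirm which roles your identity holds with the gcloud CLI. The project ID, billing account ID, and email address below are placeholders; replace them with your own values.

```shell
# List the IAM roles granted to your identity on the project.
# my-project and admin@example.com are placeholders.
gcloud projects get-iam-policy my-project \
  --flatten="bindings[].members" \
  --filter="bindings.members:user:admin@example.com" \
  --format="table(bindings.role)"

# Check for Billing Administrator on the target billing account.
# Depending on your gcloud version, this may require the beta component.
gcloud billing accounts get-iam-policy 0X0X0X-0X0X0X-0X0X0X \
  --flatten="bindings[].members" \
  --filter="bindings.members:user:admin@example.com" \
  --format="table(bindings.role)"
```

Look for roles/viewer (or a role that includes it) in the first listing and roles/billing.admin in the second.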
Have a Google project to deploy your workspaces in. You will need the project ID when you create your Databricks workspace. This does not need to be the same Google project as the one associated with your billing account.
If you do not already have a Google project into which you will deploy your workspaces, create one now:
- Confirm that you have a Google Cloud Identity organization object defined within your Google Cloud Console. To create an organization, see the Google documentation article Creating and managing organizations.
- Create the project. See the Google documentation article Creating and managing projects. You must define the project’s parent organization. If you do not specify a project ID during project creation, a project ID is generated automatically.
- Copy the Google Project ID, which you need for Databricks workspace creation.
If you have a project but do not know the ID, go to your Google Cloud Platform Console Manage Resources page. Find your project and copy the ID.
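If you use the gcloud CLI, the following is a quick way to find project IDs (a sketch; output columns may vary slightly by gcloud version):

```shell
# List all projects your account can access, with their IDs.
gcloud projects list --format="table(projectId, name, projectNumber)"

# Print the project currently configured as the gcloud default.
gcloud config get-value project
```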
Enable the Google APIs on the projects that you will use for workspaces:
Ensure that the following APIs are enabled in your GCP project. To confirm the list of enabled APIs, run this command with the gcloud command-line tool. Replace <customer-project> with the GCP project that you will use with Databricks.
gcloud services list --project <customer-project>
Ensure that the following services are listed in the response. To enable one of these APIs in your Google Cloud project, see the Google documentation page Enabling an API in your Google Cloud project. In the following list, click the links to go directly to the Google API page. Some APIs have similar names or IDs, so using these links reduces the chance of enabling the wrong API. Before clicking Enable, ensure that you have selected the correct Google project from the picker at the top of the page.
Google API name and link:
- Cloud Storage API
- Kubernetes Engine API
- Cloud Deployment Manager V2 API
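Alternatively, you can enable services from the command line with gcloud services enable. This is a sketch: container.googleapis.com (Kubernetes Engine) and deploymentmanager.googleapis.com (Cloud Deployment Manager V2) are the standard service IDs, but confirm the exact Cloud Storage service ID against your gcloud services list output before enabling.

```shell
# Enable the required APIs on the project you will use with Databricks.
# Replace my-project with your project ID; verify each service ID
# against `gcloud services list --available` first.
gcloud services enable \
  container.googleapis.com \
  deploymentmanager.googleapis.com \
  storage-api.googleapis.com \
  --project my-project
```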
In preparation for workspace creation, confirm or raise quotas for the account owner’s Google Cloud project.
These minimum quotas are required in the Google Cloud regions where the Databricks clusters for the workspace will run. You specify the region for a workspace during workspace creation. For the list of supported regions, see Supported Databricks regions.
See the following related Google articles:
- Google Compute Engine: Learn about resource quotas and requesting quota increases.
- Google Cloud Filestore: Learn about requesting quota increases.
If you change any quotas, wait 15 minutes for the quotas to take effect before creating a workspace.
| Google Cloud quota name | Required minimum quota | Recommended quota for running at scale |
| --- | --- | --- |
| | GCP default is OK | 300 |
| | GCP default is OK | 6000 |
| | GCP default is OK | 275 |
| | GCP default is OK | 500 |
| | GCP default is OK | 500 |
| | GCP default is OK | 100 |
| | GCP default is OK | 500 |
| | GCP default is OK | 50 TB |
| | 7.5 TB | 50 TB |
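To review your current limits before requesting increases, you can dump per-region and project-wide Compute Engine quotas with gcloud (a sketch; replace the region with the one you plan to use):

```shell
# Per-region Compute Engine quotas (metric, current usage, limit).
gcloud compute regions describe us-central1 \
  --flatten="quotas" \
  --format="table(quotas.metric, quotas.usage, quotas.limit)"

# Project-wide quotas, such as global CPU limits.
gcloud compute project-info describe \
  --flatten="quotas" \
  --format="table(quotas.metric, quotas.usage, quotas.limit)"
```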
In preparation for workspace creation, confirm the permissions that are required to create a workspace.
For each workspace, Databricks creates a service account with the minimal permissions needed to create and manage the workspace. Your Google OAuth identity will be used to grant permissions to the service account on your project. All you have to do is click OK on a standard OAuth dialog. The Databricks account admin who creates the workspace must have the correct permissions on the project that was specified when the workspace was set up. Ensure that one of the following applies to you if you plan to create workspaces:
- You are the Project Owner of the Google project that you specify during workspace creation.
- You are the Project Editor and the IAM Admin for the Google project that you specify during workspace creation.
The set of project permissions that Databricks grants to the service account includes the permissions associated with the following roles:
- Kubernetes Engine Admin (built-in role)
- Compute Storage Admin (built-in role)
- Permissions for a custom role that Databricks automatically creates while launching a workspace. For the full list of permissions included in this custom role, see Appendix: Permissions granted to the service account using the custom role created by Databricks.
If your Google Cloud organization policy enables domain restricted sharing, ensure that both the Google Cloud customer IDs for Databricks (C01p0oudw) and your own organization’s customer ID are in the policy’s allowed list. See the Google article Setting the organization policy. If you need help, contact your Databricks representative before you provision your workspace.
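To check whether domain restricted sharing is enforced, and which domains are currently allowed, you can inspect the constraint on your organization (a sketch; the organization ID below is a placeholder):

```shell
# Show the effective iam.allowedPolicyMemberDomains policy.
# Replace 123456789012 with your numeric organization ID
# (find it with `gcloud organizations list`).
gcloud resource-manager org-policies describe \
  constraints/iam.allowedPolicyMemberDomains \
  --organization 123456789012 --effective
```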
In the top navigation, click the project picker and select the project that is associated with the billing account you want to use with Databricks. This does not need to be the same project that you use to deploy your workspaces.
Review the pricing, cancellation, change policy, and terms of service.
Databricks charges for Databricks usage in Databricks Units (DBUs). The number of DBUs a workload consumes varies based on a number of factors, including Databricks compute type (all-purpose or jobs) and Google Cloud machine type. For details, see the pricing page. If you have questions about pricing, contact your Databricks representative.
Additional costs are incurred in your Google Cloud account:
- Google Cloud charges you an additional per-workspace cost for the GKE cluster that Databricks creates for Databricks infrastructure in your account. As of March 30, 2021, the cost for this GKE cluster is approximately $200/month, prorated to the days in the month that the GKE cluster runs. Prices can change, so check the latest prices.
- The GKE cluster cost applies even if Databricks clusters are idle. To reduce this idle-time cost, Databricks deletes the GKE cluster in your account if no Databricks Runtime clusters are active for 72 hours. Other resources, such as VPC and GCS buckets, remain unchanged. The next time a Databricks Runtime cluster starts, Databricks recreates the GKE cluster, which adds to the initial Databricks Runtime cluster launch time. For an example of how GKE cluster deletion reduces monthly costs, let’s say you used a Databricks Runtime cluster on the first of the month but not again for the rest of the month: your GKE usage would be the three days before the idle timeout takes effect and nothing more, costing approximately $20 for the month.
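The proration in that example is simple arithmetic: three days of a $200-per-month cluster in a 30-day month.

```shell
# Approximate prorated GKE cost for 3 days at $200 per 30-day month.
echo $((200 * 3 / 30))   # prints 20
```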
At the top of the page, click Purchase.
In the Order Summary page:
Select a subscription period.
Select a billing account.
The default billing account that appears in the picker is based on the project that you selected in the top navigation in the preview page. If you have access to multiple projects, the billing account picker will show additional billing account options that you can select.
Read the Terms section.
Select the checkboxes to confirm consent to billing and the terms of service.
You will be enrolled initially in Standard Tier. To upgrade to the Premium Tier, contact your Databricks representative.
In the popup that says “Your order request has been sent to Databricks”, click Register with Databricks.
In the Welcome to Databricks pop-up window:
- Type your Company Name (not your email address).
- Click Sign in with Google. Google may ask you to select your Google account email address.
After you confirm your identity and grant access, you will see the Databricks listing in the Google Cloud Marketplace. At the top, click the blue Manage on Provider button.
If the blue button at the top instead says Register with Databricks, wait a few seconds and reload the page. Repeat until the button says Manage on Provider, then click it.
In the You’re Leaving Google popup, click OK.
You may need to choose a Google account email address and confirm your identity.
After you authenticate, you see the Databricks account console, where you create and manage your workspaces. You might want to bookmark the account console web page. See Manage your Databricks account.
In the Databricks account console, create your first workspace. See Create and manage workspaces using the account console.
Add users and groups to your workspace. See Manage users.
Databricks workspace users authenticate with their Google Cloud Identity account (or GSuite account) using Google’s OAuth 2.0 implementation, which conforms to the OpenID Connect spec and is OpenID certified. Databricks provides the openid profile scope values in the authentication request to Google. Optionally, you can configure your Google Cloud Identity account (or GSuite account) to federate with an external SAML 2.0 Identity Provider (IdP) to verify user credentials. Google Cloud Identity can federate with Azure Active Directory, Okta, Ping, and other IdPs. However, Databricks interacts directly only with the Google Identity Platform APIs.
Databricks does not have access to user credentials, which reduces the risks associated with storing and protecting them.
There are two ways for a workspace user to log in to a workspace:
- All users can use their workspace URL directly: Regular users, workspace admins, and account admins can all use the workspace URL directly. The user is authenticated through Databricks integration with Google’s Cloud Identity OAuth 2.0 implementation. When a user is added to the workspace, the user gets an email that includes the URL.
- Account admins can also use the Google Cloud Console to access the workspace: Account admins authenticate with Google Identity OAuth 2.0 to access the Databricks account console. The account console offers a list of available workspaces to choose from. You are redirected to the workspace login page with an authentication token. If the token is accepted, you are not prompted to log in again. On the first login, you are challenged to consent to OAuth scopes.
To learn the Databricks basics, work through Get started as a Databricks Data Science & Engineering user.
Ask your security administrator if Reauthentication Policies have been applied to your GSuite domain.
If so, add Databricks to the trusted app list. To learn how to add Databricks to the trusted application list, see the Google support article Set session length for Google Cloud services.
```
compute.globalOperations.get
compute.instanceGroups.get compute.instanceGroups.list
compute.instances.get compute.instances.list
compute.networks.access compute.networks.create compute.networks.delete compute.networks.get
compute.networks.getEffectiveFirewalls compute.networks.update compute.networks.updatePolicy
compute.networks.use compute.networks.useExternalIp
compute.regionOperations.get
compute.routers.create compute.routers.delete compute.routers.get compute.routers.update compute.routers.use
compute.subnetworks.create compute.subnetworks.delete compute.subnetworks.expandIpCidrRange
compute.subnetworks.get compute.subnetworks.getIamPolicy compute.subnetworks.setIamPolicy
compute.subnetworks.setPrivateIpGoogleAccess compute.subnetworks.update compute.subnetworks.use
compute.subnetworks.useExternalIp
container.clusterRoleBindings.create container.clusterRoleBindings.get
container.clusterRoles.bind container.clusterRoles.create container.clusterRoles.get
container.clusters.create container.clusters.delete container.clusters.get
container.clusters.getCredentials container.clusters.list container.clusters.update
container.configMaps.create container.configMaps.get container.configMaps.update
container.customResourceDefinitions.create container.customResourceDefinitions.get
container.customResourceDefinitions.update
container.daemonSets.create container.daemonSets.get container.daemonSets.update
container.deployments.create container.deployments.get container.deployments.update
container.jobs.create container.jobs.get container.jobs.update
container.namespaces.create container.namespaces.get container.namespaces.list
container.operations.get
container.pods.get container.pods.getLogs container.pods.list
container.roleBindings.create container.roleBindings.get
container.roles.bind container.roles.create container.roles.get
container.secrets.create container.secrets.get container.secrets.update
container.serviceAccounts.create container.serviceAccounts.get
container.services.create container.services.get
container.thirdPartyObjects.create container.thirdPartyObjects.delete container.thirdPartyObjects.get
container.thirdPartyObjects.list container.thirdPartyObjects.update
deploymentmanager.deployments.create deploymentmanager.deployments.delete
deploymentmanager.deployments.get deploymentmanager.deployments.getIamPolicy
deploymentmanager.deployments.setIamPolicy deploymentmanager.deployments.stop
deploymentmanager.deployments.update
deploymentmanager.manifests.get
deploymentmanager.operations.get
deploymentmanager.resources.list
resourcemanager.projects.get resourcemanager.projects.getIamPolicy
storage.buckets.create storage.buckets.delete storage.buckets.get storage.buckets.getIamPolicy
storage.buckets.list storage.buckets.setIamPolicy storage.buckets.update
storage.hmacKeys.create storage.hmacKeys.delete storage.hmacKeys.get storage.hmacKeys.list
storage.hmacKeys.update
storage.objects.create storage.objects.delete storage.objects.get storage.objects.getIamPolicy
storage.objects.list storage.objects.setIamPolicy storage.objects.update
iam.serviceAccounts.getIamPolicy iam.serviceAccounts.setIamPolicy
```
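To audit what was actually granted after workspace creation, you can list the project's service-account role bindings (a sketch; the filter shows all service accounts, since the exact name of the Databricks-created account varies):

```shell
# List every service-account role binding on the project.
# Replace my-project with the project you used for the workspace.
gcloud projects get-iam-policy my-project \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:" \
  --format="table(bindings.members, bindings.role)"
```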