Set up your Databricks on Google Cloud account

To get users up and running on your Databricks on Google Cloud account, you must:

  1. Set up your Databricks account in Google.
  2. Create Databricks workspaces.
  3. Add users and groups to your workspaces.

Prerequisites for account and workspace creation

To set up a Databricks on Google Cloud account, you must:

  1. Have a Google billing account defined in Google Cloud. To create a billing account, see the Google documentation article Create, modify, or close your Cloud Billing account.

  2. Confirm that you have the following roles for Google Identity and Access Management (IAM):

    • Billing Administrator (roles/billing.admin) for the target Cloud Billing account or the Google Cloud organization where your project is located. If you don’t have this role, contact an Organization Administrator to request access.
    • Viewer (roles/viewer) for the project associated with the billing account you plan to use. If you are not a viewer, you can either contact the project owner to request access or create a new project to give yourself the correct permissions. If you create a new project, you must enable billing and link the project to the desired Cloud Billing account (a gcloud sketch of this is shown below).

    To learn about the relationship between Google Cloud organizations, projects, and billing, see the Google documentation on Cloud Billing access control. To learn more about roles and permissions across Google Cloud, see the documentation on Understanding roles.
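
    If you create a new project and need to link it to a billing account, one way to do so from the command line is sketched below. The project and billing account IDs are placeholders, and on older gcloud versions these commands are under gcloud beta billing.

    gcloud billing accounts list
    gcloud billing projects link <customer-project> --billing-account=<billing-account-id>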

  3. Have a Google project to deploy your workspaces in. You will need the project ID when you create your Databricks workspace. This does not need to be the same Google project as the one associated with your billing account.

    If you do not already have a Google project into which you will deploy your workspaces, create one now:

    1. Confirm that you have a Google Cloud Identity organization object defined within your Google Cloud Console. To create an organization, see the Google documentation article Creating and managing organizations.
    2. Create the project. See the Google documentation article Creating and managing projects. You must define the project’s parent organization. If you do not specify a project ID during project creation, a project ID will be generated automatically.
    3. Copy the Google Project ID, which you need for Databricks workspace creation.

    If you have a project but do not know the ID, go to your Google Cloud Platform Console Manage Resources page. Find your project and copy the ID.
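
    If you prefer the gcloud CLI, you can create the project and look up its ID from the command line. A minimal sketch, with the organization ID, project ID, and project name as placeholders:

    gcloud projects create <project-id> --organization=<organization-id>
    gcloud projects list --filter="name:<project-name>"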

  4. Enable the Google APIs on the projects that you will use for workspaces:

    Ensure that the following APIs are enabled in your GCP project. To confirm the list of enabled APIs, run this command with the gcloud command-line tool. Replace <customer-project> with the ID of the GCP project that you will use with Databricks.

    gcloud services list --project <customer-project>

    Ensure that the following services are listed in the response. To enable one of these APIs in your Google Cloud project, see the Google documentation page Enabling an API in your Google Cloud project. In the following table, the links in the API name column go directly to the Google API page. Some APIs have similar names or IDs, so using these links reduces the chance of enabling the wrong API. Before clicking Enable, ensure that you have selected the correct Google project from the picker at the top of the page.

    Google API ID                        Google API name and link
    storage.googleapis.com               Cloud Storage API
    container.googleapis.com             Kubernetes Engine API
    deploymentmanager.googleapis.com     Cloud Deployment Manager V2 API
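
    If any of these APIs are not yet enabled, you can also enable them from the command line rather than in the console. A sketch with gcloud; replace <customer-project> with your project ID:

    gcloud services enable \
      storage.googleapis.com \
      container.googleapis.com \
      deploymentmanager.googleapis.com \
      --project <customer-project>
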
  5. In preparation for workspace creation, confirm or raise quotas for the account owner’s Google Cloud project.

    These minimum quotas are required in the target Google Cloud regions where the Databricks clusters for the workspace will run. You specify the region for a workspace during workspace creation. For the list of supported regions, see Supported Databricks regions.

    For background on viewing and changing quotas, see the Google Cloud documentation on working with quotas.

    Important

    If you change any quotas, wait 15 minutes for the quotas to take effect before creating a workspace.

    Google Cloud quota name                            Required minimum quota    Recommended quotas for running at scale
    compute.googleapis.com/cpus                        60                        2500
    compute.googleapis.com/routes                      GCP default is OK         300
    monitoring.googleapis.com/ingestion_requests       GCP default is OK         6000
    compute.googleapis.com/subnetworks                 GCP default is OK         275
    compute.googleapis.com/regional_in_use_addresses   GCP default is OK         500
    compute.googleapis.com/instance_group_managers     GCP default is OK         500
    iam.googleapis.com/quota/service-account-count     GCP default is OK         100
    compute.googleapis.com/instance_groups             GCP default is OK         500
    compute.googleapis.com/disks_total_storage         GCP default is OK         50 TB
    compute.googleapis.com/n2_cpus                     50                        300
    compute.googleapis.com/ssd_total_storage           7.5 TB                    50 TB
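
    To check current quota limits and usage from the command line, you can describe your project and the target region with gcloud; the output includes the regional compute quotas listed above, such as CPUs and SSD storage. Replace <region> and <customer-project> with your values:

    gcloud compute project-info describe --project <customer-project>
    gcloud compute regions describe <region> --project <customer-project>
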
  6. In preparation for workspace creation, confirm the permissions that are required to create a workspace.

    For each workspace, Databricks creates a service account with the minimal permissions needed to create and manage the workspace. Your Google OAuth identity is used to grant those permissions to the service account on your project; all you have to do is click OK on a standard OAuth dialog. The Databricks account admin who creates the workspace must have the correct permissions on the project that is specified during workspace creation. If you plan to create workspaces, ensure that one of the following applies to you (for one way to check your current roles with gcloud, see the sketch at the end of this step):

    • You are the Project Owner of the Google project that you specify during workspace creation.
    • You are the Project Editor and the IAM Admin for the Google project that you specify during workspace creation.

    For the full set of project permissions that Databricks grants to the service account through a custom role, see the appendix Permissions granted to the service account using the custom role created by Databricks at the end of this article.

    Important

    If your Google Cloud organization policy enables domain restricted sharing, ensure that both the Google Cloud customer ID for Databricks (C01p0oudw) and your own organization’s customer ID are in the policy’s allowed list. See the Google article Setting the organization policy. If you need help, contact your Databricks representative before you provision your workspace.
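
    As noted above, one way to check which roles you currently hold on the project, and to inspect the domain restricted sharing policy if your organization uses one, is with gcloud. This is a sketch only; your email address, project ID, and organization ID are placeholders:

    gcloud projects get-iam-policy <customer-project> \
      --flatten="bindings[].members" \
      --filter="bindings.members:user:<your-email>" \
      --format="table(bindings.role)"

    gcloud resource-manager org-policies describe iam.allowedPolicyMemberDomains \
      --organization=<organization-id>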

Set up your account and create a workspace

  1. In the Google Cloud Console, go to the Databricks listing in the Google Cloud Marketplace.

  2. In the top navigation, click the project picker and select the project that is associated with the billing account you want to use with Databricks. This does not need to be the same project that you use to deploy your workspaces.

  3. Review the pricing, cancellation, change policy, and terms of service.

    Databricks charges for Databricks usage in Databricks Units (DBUs). The number of DBUs a workload consumes varies based on a number of factors, including Databricks compute type (all-purpose or jobs) and Google Cloud machine type. For details, see the pricing page. If you have questions about pricing, contact your Databricks representative.

    Additional costs are incurred in your Google Cloud account:

    • Google Cloud charges you an additional per-workspace cost for the GKE cluster that Databricks creates for Databricks infrastructure in your account. As of March 30, 2021, the cost for this GKE cluster is approximately $200/month, prorated to the days in the month that the GKE cluster runs. Prices can change, so check the latest prices.
    • The GKE cluster cost applies even if Databricks clusters are idle. To reduce this idle-time cost, Databricks deletes the GKE cluster in your account if no Databricks Runtime clusters have been active for 72 hours. Other resources, such as the VPC and GCS buckets, remain unchanged. The next time a Databricks Runtime cluster starts, Databricks recreates the GKE cluster, which adds to the initial Databricks Runtime cluster launch time. As an example of how GKE cluster deletion reduces monthly costs, suppose you use a Databricks Runtime cluster on the first of the month and never again that month: your GKE usage is only the three days before the idle timeout takes effect, which at roughly $200/month prorated (3/30 of the month) comes to approximately $20.
  4. At the top of the page, click Purchase.

  5. In the Order Summary page:

    1. Select a subscription period.

    2. Select a billing account.

      Note

      The default billing account that appears in the picker is based on the project that you selected in the top navigation in the preview page. If you have access to multiple projects, the billing account picker will show additional billing account options that you can select.

    3. Read the Terms section.

    4. Select the checkboxes to confirm consent to billing and the terms of service.

    5. Click Subscribe.

    Important

    You will initially be enrolled in the Standard Tier. To upgrade to the Premium Tier, contact your Databricks representative.

  6. In the popup that says “Your order request has been sent to Databricks”, click Register with Databricks.

  7. In the Welcome to Databricks pop-up window:

    1. Type your Company Name (not your email address).
    2. Click Sign in with Google. Google may ask you to select your Google account email address.

  8. After you confirm your identity and grant access, you will see the Databricks listing in the Google Cloud Marketplace. At the top, click the blue Manage on Provider button.

    Note

    If the blue button at the top instead says Register with Databricks, wait a few seconds and reload the web page. Repeat until the blue button says Manage on Provider, then click it.

  9. In the You’re Leaving Google popup, click OK.

    You may need to choose a Google account email address and confirm your identity.

  10. After you authenticate, you see the Databricks account console, where you create and manage your workspaces. You might want to bookmark the account console web page. See Manage your Databricks account.

  11. In the Databricks account console, create your first workspace. See Create and manage workspaces using the account console.

  12. Log in to your new workspace.

  13. Add users and groups to your workspace. See Manage users.

Log in to a Databricks workspace

Databricks workspace users authenticate with their Google Cloud Identity account (or G Suite account) using Google’s OAuth 2.0 implementation, which conforms to the OpenID Connect spec and is OpenID certified. Databricks provides the openid profile scope values in the authentication request to Google. Optionally, you can configure your Google Cloud Identity account (or G Suite account) to federate with an external SAML 2.0 identity provider (IdP) to verify user credentials. Google Cloud Identity can federate with Azure Active Directory, Okta, Ping, and other IdPs. However, Databricks interacts directly only with the Google Identity Platform APIs.

Databricks does not have access to user credentials, which reduces the risks associated with storing and protecting them.
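
For illustration only, a Google OAuth 2.0 authorization request carrying the openid and profile scopes looks roughly like the following. The client ID and redirect URI are placeholders, not the values Databricks actually uses:

  https://accounts.google.com/o/oauth2/v2/auth?client_id=<client-id>&redirect_uri=<redirect-uri>&response_type=code&scope=openid%20profile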

There are two ways for a workspace user to log in to a workspace:

  • All users can use their workspace URL directly: Regular users, workspace admins, and account admins can all use the workspace URL directly. The user is authenticated through Databricks integration with Google’s Cloud Identity OAuth 2.0 implementation. When a user is added to the workspace, the user gets an email that includes the URL.
  • Account admins can also use the Google Cloud Console to access the workspace: Account admins authenticate with Google Identity OAuth 2.0 to access the Databricks account console, which lists the available workspaces to choose from. When you select a workspace, you are redirected to the workspace login page with an authentication token. If the token is accepted, you are not prompted to log in again. On the first login, you are prompted to consent to OAuth scopes.

Learn Databricks basics

To learn the basics of Databricks, work through the guide Get started as a Databricks Data Science & Engineering user.

Tips and Troubleshooting

Why are users unable to log into the Databricks account console or into workspaces?

Ask your security administrator whether reauthentication policies have been applied to your G Suite domain.

If so, add Databricks to the trusted app list. To learn how to add Databricks to the trusted application list, see the Google support article Set session length for Google Cloud services.

Maximum workspaces per week per Google Cloud project

You can create at most 200 workspaces per week in the same GCP project. If you exceed this limit, creating a workspace fails with the error message: “Creating custom cloud IAM role <your-role> in project <your-project> rejected.”

Appendix: Permissions granted to the service account using the custom role created by Databricks

  compute.globalOperations.get
  compute.instanceGroups.get
  compute.instanceGroups.list
  compute.instances.get
  compute.instances.list
  compute.networks.access
  compute.networks.create
  compute.networks.delete
  compute.networks.get
  compute.networks.getEffectiveFirewalls
  compute.networks.update
  compute.networks.updatePolicy
  compute.networks.use
  compute.networks.useExternalIp
  compute.regionOperations.get
  compute.routers.create
  compute.routers.delete
  compute.routers.get
  compute.routers.update
  compute.routers.use
  compute.subnetworks.create
  compute.subnetworks.delete
  compute.subnetworks.expandIpCidrRange
  compute.subnetworks.get
  compute.subnetworks.getIamPolicy
  compute.subnetworks.setIamPolicy
  compute.subnetworks.setPrivateIpGoogleAccess
  compute.subnetworks.update
  compute.subnetworks.use
  compute.subnetworks.useExternalIp
  container.clusterRoleBindings.create
  container.clusterRoleBindings.get
  container.clusterRoles.bind
  container.clusterRoles.create
  container.clusterRoles.get
  container.clusters.create
  container.clusters.delete
  container.clusters.get
  container.clusters.getCredentials
  container.clusters.list
  container.clusters.update
  container.configMaps.create
  container.configMaps.get
  container.configMaps.update
  container.customResourceDefinitions.create
  container.customResourceDefinitions.get
  container.customResourceDefinitions.update
  container.daemonSets.create
  container.daemonSets.get
  container.daemonSets.update
  container.deployments.create
  container.deployments.get
  container.deployments.update
  container.jobs.create
  container.jobs.get
  container.jobs.update
  container.namespaces.create
  container.namespaces.get
  container.namespaces.list
  container.operations.get
  container.pods.get
  container.pods.getLogs
  container.pods.list
  container.roleBindings.create
  container.roleBindings.get
  container.roles.bind
  container.roles.create
  container.roles.get
  container.secrets.create
  container.secrets.get
  container.secrets.update
  container.serviceAccounts.create
  container.serviceAccounts.get
  container.services.create
  container.services.get
  container.thirdPartyObjects.create
  container.thirdPartyObjects.delete
  container.thirdPartyObjects.get
  container.thirdPartyObjects.list
  container.thirdPartyObjects.update
  deploymentmanager.deployments.create
  deploymentmanager.deployments.delete
  deploymentmanager.deployments.get
  deploymentmanager.deployments.getIamPolicy
  deploymentmanager.deployments.setIamPolicy
  deploymentmanager.deployments.stop
  deploymentmanager.deployments.update
  deploymentmanager.manifests.get
  deploymentmanager.operations.get
  deploymentmanager.resources.list
  resourcemanager.projects.get
  resourcemanager.projects.getIamPolicy
  storage.buckets.create
  storage.buckets.delete
  storage.buckets.get
  storage.buckets.getIamPolicy
  storage.buckets.list
  storage.buckets.setIamPolicy
  storage.buckets.update
  storage.hmacKeys.create
  storage.hmacKeys.delete
  storage.hmacKeys.get
  storage.hmacKeys.list
  storage.hmacKeys.update
  storage.objects.create
  storage.objects.delete
  storage.objects.get
  storage.objects.getIamPolicy
  storage.objects.list
  storage.objects.setIamPolicy
  storage.objects.update
  iam.serviceAccounts.getIamPolicy
  iam.serviceAccounts.setIamPolicy
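
If you want to inspect the custom role after Databricks creates it in your project, one way is to list the project’s custom roles and then describe the role that Databricks created; the role ID below is a placeholder:

  gcloud iam roles list --project <customer-project>
  gcloud iam roles describe <databricks-custom-role-id> --project <customer-project>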