Get started with Databricks
If you’re new to Databricks, you’ve found the place to start. This section includes instructions for basic account setup, a tour of the Databricks workspace UI, and some basic tutorials related to exploratory data analysis and ETL on Databricks.
For information about online training resources, see Get free Databricks training.
Start a Databricks free trial on Google Cloud
This article provides detailed instructions for subscribing to Databricks with a free trial, which becomes a pay-as-you-go subscription when the trial expires.
Note
If your company has a contract subscription, do not use these instructions. Ask your Databricks account team about how to create your subscription with a Google Marketplace Private Offer.
If you are already familiar with setting up new applications in Google Marketplace, you can instead use the shorter quickstart instructions for creating a new free trial subscription.
To get users up and running on Databricks on Google Cloud, you must:
Create your Databricks subscription in Google Cloud Marketplace. This creates a Databricks account. You are the account owner and only you can perform initial setup, but you can assign other users as account admins to perform follow-on account administration tasks.
Create at least one Databricks workspace. A workspace is the environment that your team will use to access all of their Databricks assets.
Add users and groups to your workspaces.
Watch Deploying Databricks on Google Cloud for an overview of this process.
Requirements
Before you create a Databricks on Google Cloud account:
You must have a Google billing account.
You must have the following roles for Google Identity and Access Management (IAM):
Billing Account Administrator (roles/billing.admin) for the target Cloud Billing account or the Google Cloud organization where your project is located. If you don’t have this role, contact an Organization Administrator to request access.
Viewer (roles/viewer) for the project associated with the billing account you plan to use. If you are not a viewer, you can either contact the project owner to request access, or create a new project to give yourself the correct permissions. If you create a new project, you must enable billing and link the project to the desired Cloud Billing account.
To learn about the relationship between Google Cloud organizations, projects, and billing, see the Google documentation on Cloud Billing access control. To learn more about roles and permissions across Google Cloud, see the documentation on Understanding roles.
You might not be the only user in your organization who can cancel the Databricks subscription. The subscription can be cancelled by Google Cloud users in your organization who have the consumerprocurement.orders.cancel permission on the billing account, which is the case for those with the Billing Admin role in the billing account or the Organization Owner role in the parent Organization.
Important
Databricks recommends confirming that the set of Google Cloud users who can cancel the Databricks subscription is the correct set of users. Overly broad access might lead to accidental cancellation of the subscription, which deletes all workspaces in the Databricks account. Workspace deletion is not reversible.
You must have a Google Cloud project to deploy your workspaces in. You need the project ID when you create your Databricks workspace. This does not need to be the same Google Cloud project as the one associated with your billing account. During workspace creation, Databricks enables some required Google APIs on the project if they are not already enabled.
If you do not already have a Google Cloud project into which you will deploy your workspaces, create one now:
Confirm your Google account is enabled for Google Workspace or Cloud Identity.
Confirm you have a Google Cloud Identity organization object defined within your Google Cloud Console. If needed, you can see the Google documentation for Creating and managing organizations.
Create the project. See the Google documentation article Creating and managing organizations. You must define the project’s parent organization. If you do not specify a project ID during project creation, a project ID is generated automatically.
Copy the Google Cloud project ID. You need this to create Databricks workspaces.
If you have a project but do not know its ID, go to your Google Cloud Platform Console Manage Resources page. Find your project and copy its ID.
The Google Cloud project that you plan to use with your workspace to run clusters must have appropriate quotas. Review the required resource quotas for your project. You might need to request quota increases and wait for approval. If you change any quotas, wait 15 minutes for the changes to take effect before you create a workspace. If you requested quota increases, wait 15 minutes after you receive email confirmation that the quotas were updated.
To prepare for creating workspaces, confirm the permissions that are required to create a workspace. See Required permissions.
If your Google Cloud organization policy enables domain restricted sharing, ensure that both the Google Cloud customer ID for Databricks (C01p0oudw) and your own organization’s customer ID are in the policy’s allowed list. See the Google article Setting the organization policy. If you need help, contact your Databricks account team before you provision your workspace.
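The role requirements above can be checked offline against an exported IAM policy. The following is a minimal sketch, assuming the policy was saved as JSON (for example, from gcloud projects get-iam-policy with --format=json) and contains a bindings list of role and members entries. The function names and the sample policy are hypothetical and are not part of any Databricks or Google API. The check covers direct bindings only. It does not resolve group membership or roles inherited from a parent resource.

```python
import json


def members_with_role(policy: dict, role: str) -> set:
    """Return every member directly bound to `role` in an IAM policy document."""
    members = set()
    for binding in policy.get("bindings", []):
        if binding.get("role") == role:
            members.update(binding.get("members", []))
    return members


def has_role(policy: dict, member: str, role: str) -> bool:
    """Return True if `member` is directly bound to `role`."""
    return member in members_with_role(policy, role)


# Hypothetical exported policy, used only for illustration.
SAMPLE_POLICY = json.loads("""
{
  "bindings": [
    {"role": "roles/billing.admin", "members": ["user:owner@example.com"]},
    {"role": "roles/viewer",
     "members": ["user:owner@example.com", "user:analyst@example.com"]}
  ]
}
""")

if __name__ == "__main__":
    # Who could administer billing (and so cancel the subscription)?
    print(sorted(members_with_role(SAMPLE_POLICY, "roles/billing.admin")))
    # Does a given user meet the Viewer requirement?
    print(has_role(SAMPLE_POLICY, "user:analyst@example.com", "roles/viewer"))
```

Listing the holders of roles/billing.admin is also one way to start reviewing who is able to cancel the subscription, although the authoritative check is the consumerprocurement.orders.cancel permission itself.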
Set up a Databricks free trial and first workspace
Note
If your company has a contract subscription, do not use these instructions. Ask your Databricks account team about how to create your subscription with a Google Marketplace Private Offer.
Go to the Databricks listing in the Google Cloud Marketplace.
There are other ways to get to this page. Go to Google Cloud Marketplace Explorer, use the marketplace search box to search for “Databricks”, and click Databricks. You can also go to the Google Cloud Console, and then in the left navigation, under Partner Solutions, click Databricks.
In the top navigation’s project picker, select the Google Cloud project that is associated with the billing account that you want to use with Databricks. This does not need to be the same project that you use to deploy your workspaces.
Review the pricing, cancellation, change policy, and terms of service.
Databricks charges for Databricks usage in Databricks Units (DBUs). The number of DBUs a workload consumes varies based on a number of factors, including Databricks compute type (all-purpose or jobs) and Google Cloud machine type. For details, see the pricing page.
Additional costs are incurred in your Google Cloud account:
Google Cloud charges you an additional per-workspace cost for the GKE cluster that Databricks creates for Databricks infrastructure in your account. As of March 30, 2021, the cost for this GKE cluster is approximately $200/month, prorated to the days in the month that the GKE cluster runs. Prices can change, so check the latest prices.
The GKE cluster cost applies even if Databricks clusters are idle. To reduce this idle-time cost, Databricks deletes the GKE cluster in your account if no Databricks Runtime clusters are active for five days. Other resources, such as VPC and GCS buckets, remain unchanged. The next time a Databricks Runtime cluster starts, Databricks recreates the GKE cluster, which adds to the initial Databricks Runtime cluster launch time. For example, if you use a Databricks Runtime cluster on the first of the month and not again that month, you pay for GKE only during the five days before the idle timeout takes effect, which costs approximately $33 for the month (5/30 of approximately $200).
At the top of the page, click Subscribe.
In the Order Summary page:
Select a subscription period.
Select a billing account. The default billing account that appears in the picker is based on the project that you selected in the top navigation in the preview page. If you have access to multiple projects, the billing account picker shows additional billing account options.
Read the Terms section.
Select the checkboxes to confirm consent to billing and the terms of service.
Click Subscribe.
In the popup that says “Your order request has been sent to Databricks”, click Register with Databricks.
In the Welcome to Databricks pop-up window:
Type your company’s name. Do not enter an email address.
Click Sign in with Google. Google may ask you to select your Google account email address.
After you confirm your identity and confirm access, the Databricks listing in the Google Cloud Marketplace appears. At the top, click the blue Manage on Provider button. If the button instead says Register with Databricks, wait a few seconds and reload the page. Repeat until the button says Manage on Provider, then click it.
Important
It is critical that you click on Manage on Provider to activate your subscription.
In the You’re Leaving Google popup, click OK. You might need to choose a Google account email address and confirm your identity.
Choose a plan. Initially you are on the Standard plan, but you can upgrade to the Premium plan. You can compare the different Databricks pricing plans. You can upgrade or downgrade your account’s plan later. Upgrades and downgrades both affect future workspaces, but they affect existing workspaces differently. See Confirm or change your subscription plan.
You see the Databricks account console, where you create and manage your workspaces. You might want to bookmark the account console web page. See Manage your Databricks account.
In the Databricks account console, click Create Workspace to create your first workspace. See Create a workspace using the account console for additional details.
In most accounts, the workspace will be enabled for Unity Catalog by default, providing centralized data governance and identity management. See What is Unity Catalog? and Set up and manage Unity Catalog.
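The GKE cost example in the pricing step above is a simple proration. The following is a minimal sketch of that arithmetic, assuming the approximate $200/month figure quoted as of March 30, 2021 and a 30-day month. The function name and defaults are illustrative, and current prices may differ.

```python
def prorated_gke_cost(days_running: int,
                      days_in_month: int = 30,
                      monthly_cost: float = 200.0) -> float:
    """Prorate the approximate monthly GKE cluster cost to the days it ran.

    monthly_cost defaults to the approximate figure quoted in this article;
    days_in_month=30 is an assumption made for illustration.
    """
    if not 0 <= days_running <= days_in_month:
        raise ValueError("days_running must be between 0 and days_in_month")
    return monthly_cost * days_running / days_in_month


if __name__ == "__main__":
    # Cluster used on the first of the month, then idle: the GKE cluster
    # runs for five days before the idle timeout deletes it.
    print(round(prorated_gke_cost(5)))   # 33
    # GKE cluster running for the whole month.
    print(round(prorated_gke_cost(30)))  # 200
```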
Workspace creation considerations
When you create your workspace, please consider the following:
If you plan to use large clusters or many workspaces, ensure that your workspaces have sufficient IP space to run Databricks jobs by calculating your GKE subnet ranges by using the network sizing calculator.
Do not modify or customize the Google Kubernetes Engine (GKE) cluster that Databricks launches for your workspace. If you need to customize the cluster, contact your Databricks account team to assess the safety and long-term maintainability of the change.
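When you review candidate subnet ranges, it can help to see how many addresses a CIDR block actually contains. The following is a minimal sketch that uses Python’s standard ipaddress module. It does not reproduce the network sizing calculator: the required address count must come from the calculator, and the function names and example ranges here are hypothetical.

```python
import ipaddress


def address_count(cidr: str) -> int:
    """Return the total number of addresses in a CIDR block."""
    return ipaddress.ip_network(cidr).num_addresses


def has_enough_addresses(cidr: str, required_addresses: int) -> bool:
    """Return True if the block holds at least `required_addresses` addresses.

    `required_addresses` is an input you obtain elsewhere, for example
    from the network sizing calculator.
    """
    return address_count(cidr) >= required_addresses


if __name__ == "__main__":
    print(address_count("10.0.0.0/20"))               # 4096
    print(address_count("10.0.0.0/24"))               # 256
    print(has_enough_addresses("10.0.0.0/24", 1000))  # False
```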
Log in to a Databricks workspace
Databricks workspace users authenticate with their Google Cloud Identity account (or GSuite account) using Google’s OAuth 2.0 implementation, which conforms to the OpenID Connect spec and is OpenID certified. Databricks provides the openid profile scope values in the authentication request to Google. Optionally, you can configure your Google Cloud Identity account (or GSuite account) to federate with an external SAML 2.0 Identity Provider (IdP) to verify user credentials. Google Cloud Identity can federate with Microsoft Entra ID, Okta, Ping, and other IdPs. However, Databricks interacts directly only with the Google Identity Platform APIs.
Databricks does not have access to user credentials. This architecture reduces the risks associated with storing or protecting them.
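To make the scope values concrete, the following is a minimal sketch of how a generic OpenID Connect authorization request to Google can be constructed with the openid and profile scopes. It is not the request that Databricks itself sends, which this article does not show. The client ID, redirect URI, and state value are hypothetical placeholders, and the authorization code flow (response_type=code) is an assumption made for illustration.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Google's OAuth 2.0 authorization endpoint.
GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_auth_url(client_id: str, redirect_uri: str, state: str) -> str:
    """Build an authorization request URL with the openid and profile scopes."""
    params = {
        "client_id": client_id,        # hypothetical placeholder
        "redirect_uri": redirect_uri,  # hypothetical placeholder
        "response_type": "code",       # assumption: authorization code flow
        "scope": "openid profile",     # scope values named in this section
        "state": state,                # opaque value echoed back to the caller
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_auth_url(
        client_id="example-client-id",
        redirect_uri="https://example.com/callback",
        state="random-state-value",
    )
    # Decode the query string to confirm the requested scopes.
    print(parse_qs(urlparse(url).query)["scope"])
```

In a flow like this, the user enters credentials only on the identity provider’s pages, and the requesting application receives a code or token rather than the password.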
There are three ways for a workspace user to log in to a workspace:
All users can use their workspace URL directly: Regular users, workspace admins, and account admins can use the workspace URL directly. The user is authenticated through the Databricks integration with Google’s Cloud Identity OAuth 2.0 implementation. When a user is added to the workspace, the user gets an email that includes the URL.
All users can access their workspaces through the Databricks account console: Use your Databricks username (email address) to log in to the account console, go to the Workspaces tab, find your workspace, and click Open.
Account admins can also use the Google Cloud Console to access the workspace: Account admins authenticate with Google Identity OAuth 2.0 to access the Databricks account console. The account console offers a list of available workspaces to choose from. You are redirected to the workspace login page with an authentication token. If the token is accepted, you are not prompted to log in again. On the first login, you are challenged to consent to OAuth scopes.
Next steps
Your next steps depend on whether you want to continue setting up your account organization and security or want to start building out data pipelines:
Connect your Databricks workspace to external data sources. See Connect to data sources.
Ingest your data into the workspace. See Ingest data into a Databricks lakehouse.
Build out your account organization and security. See Get started with Databricks administration.
Learn about managing access to data in your workspace. See What is Unity Catalog?.
Learn about managing access to workspace objects such as notebooks, compute, dashboards, and queries. See Access control lists.
Get help
If you have any questions about setting up Databricks and need live help, please e-mail onboarding-help@databricks.com.
If you have a Databricks support package, you can open and manage support cases with Databricks. See Learn how to use Databricks support.
If your organization does not have a Databricks support subscription, or if you are not an authorized contact for your company’s support subscription, you can get answers to many questions in Databricks Office Hours or from the Databricks Community.
If you need additional help, sign up for a live weekly demo to ask questions and practice alongside Databricks experts. Or, follow this blog series on best practices for managing and maintaining your environments.