Get started using Unity Catalog
This article provides step-by-step instructions for setting up Unity Catalog for your organization. It describes how to enable your Databricks account to use Unity Catalog and how to create your first tables in Unity Catalog.
Overview of Unity Catalog setup
This section provides a high-level overview of how to set up your Databricks account to use Unity Catalog and create your first tables. For detailed step-by-step instructions, see the sections that follow this one.
Set up your Databricks account for Unity Catalog
To enable your Databricks account to use Unity Catalog, you do the following:
Create a GCS bucket that Unity Catalog can use to store managed table data in your Google Cloud account.
Create a metastore for each region in which your organization operates. This metastore functions as the top-level container for all of your data in Unity Catalog.
Give Unity Catalog access to the GCS bucket.
As part of the metastore creation process, Databricks generates a Google Cloud service account that you use to grant access.
Assign workspaces to the metastore. Each workspace has the same view of the data that you manage in Unity Catalog.
Add users, groups, and service principals to your Databricks account.
For existing Databricks accounts, these identities are already present.
(Recommended) Transfer your metastore admin role to a group.
Set up data access for your users
To set up data access for your users, you do the following:
In a workspace, create at least one compute resource: either a cluster or a SQL warehouse.
You will use this compute resource when you run queries and commands, including grant statements on data objects that are secured in Unity Catalog.
Create at least one catalog.
Catalogs hold the schemas (databases) that in turn hold the tables that your users work with.
Create at least one schema.
For each level in the data hierarchy (catalogs, schemas, tables), you grant privileges to users, groups, or service principals. You can also grant row- or column-level privileges using dynamic views.
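The hierarchy described above can be sketched in SQL, run from a notebook or the SQL editor on Unity Catalog-enabled compute. This is a minimal sketch, not a prescribed setup. The catalog sales, schema sales.reports, table orders, view orders_redacted, and the account groups analysts and finance-admins are all hypothetical names. Creating a catalog also assumes that you hold the necessary privilege, for example as metastore admin.

```sql
-- Hypothetical names throughout: catalog sales, schema sales.reports,
-- table orders, view orders_redacted, groups analysts and finance-admins.
CREATE CATALOG IF NOT EXISTS sales;
CREATE SCHEMA IF NOT EXISTS sales.reports;
CREATE TABLE IF NOT EXISTS sales.reports.orders (
  order_id INT,
  region STRING,
  amount DECIMAL(10, 2)
);

-- Dynamic view for column-level access: only members of the account group
-- finance-admins see real amounts; everyone else sees NULL in that column.
CREATE OR REPLACE VIEW sales.reports.orders_redacted AS
SELECT
  order_id,
  region,
  CASE
    WHEN is_account_group_member('finance-admins') THEN amount
    ELSE NULL
  END AS amount
FROM sales.reports.orders;

-- Grant at each level of the hierarchy: USE CATALOG on the catalog,
-- USE SCHEMA on the schema, and SELECT on the view (not on the base table).
GRANT USE CATALOG ON CATALOG sales TO `analysts`;
GRANT USE SCHEMA ON SCHEMA sales.reports TO `analysts`;
GRANT SELECT ON sales.reports.orders_redacted TO `analysts`;
```

Because analysts receive SELECT only on the view, they cannot bypass the masking by querying the base table directly. A WHERE clause that calls the same is_account_group_member function would filter rows instead of masking a column.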
Requirements
You must be a Databricks account admin.
Your Databricks account must be on the Premium plan.
In Google Cloud, you must have the ability to create GCS buckets and assign permissions to the GCS buckets you create.
You must have at least one workspace that you want to use with Unity Catalog. See Create a workspace using the account console.
Configure a storage bucket in Google Cloud
In this step, you create the GCS bucket required by Unity Catalog to store and access managed table data in your Google Cloud account.
Log in to your Google Cloud console and create a new GCS bucket in the same region as the workspace you want to use with Unity Catalog.
Do not allow direct user access to this bucket.
Make a note of the bucket path (
Create your first metastore
To create a metastore:
Log in to the Databricks account console.
Click Create Metastore.
Enter the following:
A name for the metastore.
The region where you want to deploy the metastore.
This must be in the same region as the workspaces you want to use to access the data. Make sure that this matches the region of the GCS bucket you created earlier.
The path to the GCS bucket that you created in the previous task.
The Provide Storage Access dialog appears. It displays the system-generated Service Account Name and asks you to grant that service account two IAM roles for the GCS bucket. Keep this dialog open when you proceed to the next task.
Give the service account access to your GCS bucket and assign workspaces
In another browser tab or window, go to the Google Cloud console and open the GCS bucket that you provided in the previous step.
On the Permissions tab, click + Grant access and assign the service account the following roles:
Storage Legacy Bucket Reader
Storage Object Admin
Use the service account’s email address as the principal identifier.
Return to the Provide Storage Access dialog in the Databricks account console and click Permissions granted.
Databricks validates that the service account has the correct access to the bucket.
When the validation is successful, you can select workspaces to assign to the metastore.
To learn how to assign workspaces to metastores, see Enable a workspace for Unity Catalog.
(Recommended) Transfer the metastore admin role to a group.
The user who creates a metastore is its owner, also called the metastore admin. The metastore admin can create top-level objects in the metastore such as catalogs and can manage access to tables and other objects. Databricks recommends that you reassign the metastore admin role to a group. See (Recommended) Transfer ownership of your metastore to a group.
Add users and groups
To achieve a consistent view of users and to be able to manage data access across workspaces, Unity Catalog introduces a centrally managed identity system, also known as identity federation. This allows an administrator to control user access to workspaces from the account console and other account-level interfaces.
A Unity Catalog metastore can be shared across multiple Databricks workspaces. Unity Catalog takes advantage of Databricks account-level identity management to provide a consistent view of users, service principals, and groups across all workspaces. In this step, you create users and groups in the account console and then choose the workspaces these identities can access.
If you have an existing account and workspaces, you probably already have users and groups in your account, so you can skip the user and group creation steps.
If you have a large number of users or groups in your account, or if you prefer to manage identities outside of Databricks, you can sync users and groups from your identity provider (IdP).
To add a user and group using the account console:
Log in to the account console (requires the user to be an account admin).
Click User management.
Add a user:
Click Add User.
Enter a name and email address for the user.
Click Send Invite.
Add a group:
Click Add Group.
Enter a name for the group.
When prompted, add users to the group.
Add a user or group to a workspace, where they can perform data science, data engineering, and data analysis tasks using the data managed by Unity Catalog:
In the sidebar, click Workspaces and select a workspace.
On the Permissions tab, click Add permissions.
Search for and select the user or group, assign the permission level (workspace User or Admin), and click Save.
To get started, create a group called data-consumers. This group is used later in this walkthrough.
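After you have a Unity Catalog-enabled cluster or SQL warehouse (see Create a cluster or SQL warehouse), you can confirm account-level group membership from a notebook or the SQL editor. This is a minimal sketch that assumes the data-consumers group name used in this walkthrough.

```sql
-- Returns the identity running this query (typically the user's email address).
SELECT current_user();

-- Returns true if the current user belongs to the account-level group
-- data-consumers, and false otherwise.
SELECT is_account_group_member('data-consumers');
```

is_account_group_member checks account-level groups, which are the groups that Unity Catalog grants refer to. The older is_member function checks workspace-local groups instead.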
Create a cluster or SQL warehouse
Before you can start creating tables and assigning permissions, you need to create a compute resource to run your table-creation and permission-assignment workloads.
Tables defined in Unity Catalog are protected by fine-grained access controls. To ensure that access controls are enforced, Unity Catalog requires compute resources to conform to a secure configuration. Non-conforming compute resources cannot access tables in Unity Catalog.
Databricks provides two kinds of compute resources:
Clusters, which are used for workloads in the Data Science & Engineering and Databricks Machine Learning persona-based environments, for example, executing SQL commands in a Databricks notebook.
SQL warehouses, which are used for executing queries in Databricks SQL.
You can use either of these compute resources to work with Unity Catalog, depending on the environment you are using: SQL warehouses for Databricks SQL or clusters for the Data Science & Engineering and Databricks Machine Learning environments.
Create a cluster
To create a cluster that can access Unity Catalog:
Log in to your workspace as a workspace admin or user with permission to create clusters.
Click Create compute.
Enter a name for the cluster.
Set the Access mode to Shared.
Only Single user and Shared access modes support Unity Catalog. See What is cluster access mode?.
Set Databricks runtime version to Runtime: 11.3 LTS (Scala 2.12, Spark 3.3.0) or higher.
Click Create Cluster.
For specific configuration options, see Create a cluster.
Create a SQL warehouse
SQL warehouses support Unity Catalog by default, and there is no special configuration required.
To create a SQL warehouse:
Log in to your workspace as a workspace admin or user with permission to create clusters.
From the persona switcher, select SQL.
Click Create and select SQL Warehouse.
For specific configuration options, see Configure SQL warehouses.
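After the cluster or SQL warehouse starts, you can check that it sees your metastore by running a few read-only queries. This is a minimal sketch. The exact output depends on your metastore and on the catalogs you have access to.

```sql
-- Returns the ID of the Unity Catalog metastore assigned to this workspace.
SELECT current_metastore();

-- Lists the catalogs you can see; the automatically created main catalog
-- is expected to appear once the workspace is assigned to a metastore.
SHOW CATALOGS;
```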
Create your first table and manage permissions
Unity Catalog enables you to define access to tables declaratively using SQL or the Data Explorer UI. It is designed to follow a “define once, secure everywhere” approach, meaning that access rules will be honored from all Databricks workspaces, clusters, and SQL warehouses in your account, as long as the workspaces share the same metastore.
In this example, you’ll run a notebook that creates a table named department in the main catalog and the default schema (database). This catalog and schema are created automatically for all metastores.
You need the USE CATALOG permission on the main catalog, which all users have by default. No other permissions are required to complete this example apart from those that you grant as you run it.
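To see the grants on the main catalog before you begin, you can inspect them with SHOW GRANTS. This is a sketch; the result lists principals, privileges, and objects, and its exact layout may vary by runtime version.

```sql
-- Lists all privileges granted on the main catalog.
SHOW GRANTS ON CATALOG main;

-- Lists only the privileges held by one principal. Replace the placeholder
-- with your Databricks username, enclosed in backticks.
SHOW GRANTS `<user>@<domain>.com` ON CATALOG main;
```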
Create a notebook and attach it to the cluster you created in Create a cluster or SQL warehouse.
Select SQL as your notebook language.
Add the following commands to the notebook and run them:
GRANT USE SCHEMA, CREATE TABLE ON SCHEMA main.default TO `<user>@<domain>.com`;
Replace <user>@<domain>.com with your Databricks username. You must enclose the username in backticks.
CREATE TABLE IF NOT EXISTS main.default.department ( deptcode INT, deptname STRING, location STRING );
INSERT INTO main.default.department VALUES (10, 'FINANCE', 'EDINBURGH'), (20, 'SOFTWARE', 'PADDINGTON');
You now have a table in Unity Catalog.
Find the new table in Data Explorer.
In the sidebar, click Data, then use the schema browser (or search) to find the main catalog and the default schema, where you’ll find the department table.
Notice that you don’t need a running cluster or SQL warehouse to browse data in Data Explorer.
Grant permissions on the table.
As the original table creator, you’re the table owner, and you can grant other users permission to read or write to the table. You can even transfer ownership, but we won’t do that here.
On the table page in Data Explorer, go to the Permissions tab and click Grant.
On the Grant on dialog:
Select the users and groups you want to give permission to. In this example, we use a group called data-consumers.
Select the privileges you want to grant. For this example, assign the SELECT privilege and click Grant.
For more information about the Unity Catalog privileges and permissions model, see Manage privileges in Unity Catalog.
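You can also confirm the results of the steps above from the same notebook. This sketch reads back the new table, shows its metadata, and lists the schemas and tables in the main catalog, which is the SQL counterpart of browsing in Data Explorer.

```sql
-- Reads back the rows inserted earlier, ordered by department code.
SELECT * FROM main.default.department ORDER BY deptcode;

-- Shows the column definitions plus table metadata such as the owner and table type.
DESCRIBE TABLE EXTENDED main.default.department;

-- SQL equivalents of browsing the main catalog in Data Explorer.
SHOW SCHEMAS IN main;
SHOW TABLES IN main.default;
```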
You can also grant those permissions using the following SQL statement in a Databricks notebook or the Databricks SQL query editor:
GRANT SELECT ON main.default.department TO `data-consumers`;
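Unity Catalog checks privileges at every level of the hierarchy. To read the table, members of data-consumers need USE CATALOG on main (which all users have by default) and USE SCHEMA on main.default, in addition to SELECT on the table. Whether USE SCHEMA is already in place depends on your metastore. This sketch verifies the grant and adds the schema-level privilege. The commented-out statements, which this walkthrough does not run, show how you could revoke the grant or transfer ownership to the group.

```sql
-- Lists every privilege granted on the table, including the new SELECT grant.
SHOW GRANTS ON TABLE main.default.department;

-- Readers also need USE SCHEMA on the parent schema.
GRANT USE SCHEMA ON SCHEMA main.default TO `data-consumers`;

-- Not part of this walkthrough: undo the grant, or hand ownership to the group.
-- REVOKE SELECT ON TABLE main.default.department FROM `data-consumers`;
-- ALTER TABLE main.default.department OWNER TO `data-consumers`;
```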
Run one of the example notebooks that follow for a more detailed walkthrough that includes catalog and schema creation, a summary of available privileges, a sample query, and more.
(Optional) Link the metastore to additional workspaces
A key benefit of Unity Catalog is the ability to share a single metastore among multiple workspaces that are located in the same region. You can run different types of workloads against the same data without moving or copying data among workspaces. Each workspace can have only one Unity Catalog metastore assigned to it.
To learn how to link the metastore to additional workspaces, see Enable a workspace for Unity Catalog.
(Recommended) Sync account-level identities from your IdP
You can manage user access to Databricks by setting up provisioning from a third-party identity provider (IdP), like Okta. For complete instructions, see Sync users and groups from your identity provider.
(Recommended) Transfer ownership of your metastore to a group
(Optional) Install the Unity Catalog CLI
The Unity Catalog CLI is experimental, but it can be a convenient way to manage Unity Catalog from the command line. It is part of the Databricks CLI. To use the Unity Catalog CLI, do the following:
Optionally, create one or more connection profiles to use with the CLI.
Learn how to use the Databricks CLI in general.
Begin using the Unity Catalog CLI.
Learn more about Unity Catalog: What is Unity Catalog?