This article explains how to configure and use Unity Catalog to manage data in your Databricks workspace. It is intended primarily for workspace admins who are using Unity Catalog for the first time.
By the end of this article you will have:
A workspace that is enabled for Unity Catalog.
Compute that has access to Unity Catalog.
Users with permission to access and create objects in Unity Catalog.
You may also want to review other introductory articles:
For a quick walkthrough of how to create a table and grant permissions in Unity Catalog, see Tutorial: Create your first table and grant privileges in Unity Catalog.
For key Unity Catalog concepts and an introduction to how Unity Catalog works, see What is Unity Catalog?.
To learn how best to use Unity Catalog to meet your data governance needs, see Unity Catalog best practices.
For your users to start taking advantage of Unity Catalog, their Databricks workspaces must be enabled for Unity Catalog, which means that the workspaces are attached to a Unity Catalog metastore. Once you’ve created a metastore and attached a workspace to it, you can start granting privileges to users.
Before you begin the tasks described in this article, you should familiarize yourself with the basic Unity Catalog concepts, including metastores, admin roles, and managed storage. See What is Unity Catalog?.
You should also confirm that you meet the following requirements:
To use Unity Catalog, your workspace must be enabled for Unity Catalog, which means it must be attached to a Unity Catalog metastore:
If your account already has a Unity Catalog metastore defined for your workspace’s region, you can simply attach your workspace to the existing metastore. See Enable your workspace for Unity Catalog.
If there is no Unity Catalog metastore defined for your workspace’s region, you must create a metastore and then attach the workspace. See Create a Unity Catalog metastore.
When your workspace is enabled for Unity Catalog (attached to a metastore), go to the next step.
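One quick way to confirm the attachment is to run a couple of read-only statements. This is a minimal sketch, assuming you run it in the SQL Editor or a notebook attached to compute that can access Unity Catalog:

```sql
-- Returns the ID of the metastore attached to the current workspace.
SELECT CURRENT_METASTORE();

-- Lists the catalogs in that metastore that are visible to you.
SHOW CATALOGS;
```

The catalogs you see depend on your own privileges, so the list may differ between users.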
The user who creates the workspace is automatically added as a workspace user with the workspace admin role (that is, a user in the admins workspace-local group). As a workspace admin, you can add and invite users to the workspace, can assign the workspace admin role to other users, and can create service principals and groups.
Account admins also have the ability to add users, service principals, and groups to your workspace. They can grant the account admin and metastore admin roles.
For details, see Manage users.
It can be convenient to manage user access to Databricks by setting up provisioning from a third-party identity provider (IdP), like Okta. For complete instructions, see Sync users and groups from your identity provider.
To run Unity Catalog workloads, compute resources must comply with certain security requirements. Non-compliant compute resources cannot access data or other objects in Unity Catalog. SQL warehouses always comply with Unity Catalog requirements, but some cluster access modes do not. See Access modes.
As a workspace admin, you can restrict compute creation to admins or let users create their own SQL warehouses and clusters. You can also create cluster policies that let users create their own clusters using Unity Catalog-compliant specifications that you enforce. See Cluster access control and Create and manage compute policies.
To create objects and access them in Unity Catalog catalogs and schemas, a user must have permission to do so. This section describes the user and admin privileges granted on some workspaces by default and describes how to grant additional privileges.
If your workspace includes the automatically-provisioned main catalog, users have some privileges on that catalog by default.
If the user who manually creates a metastore adds metastore-level storage during metastore creation, a main catalog is provisioned automatically. In workspaces attached to such metastores, workspace users have the USE CATALOG privilege on the main catalog, which doesn’t grant the ability to create or select from any objects in the catalog, but is a prerequisite for working with any objects in the catalog. The user who created the metastore owns the main catalog by default and can both transfer ownership and grant access to other users.
If metastore storage is added after the metastore is created, no main catalog is provisioned.
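To see which privileges are actually in place on the main catalog, you can list its grants. This is a sketch that assumes you own the catalog or are a metastore admin; the group name `data-consumers` is a hypothetical account-level group used only for illustration:

```sql
-- List every grant on the main catalog.
SHOW GRANTS ON CATALOG main;

-- List only the grants held by one principal (hypothetical group name).
SHOW GRANTS `data-consumers` ON CATALOG main;
```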
Workspace admins have no special Unity Catalog privileges by default.
Metastore admins must exist and can create any Unity Catalog object and can take ownership of any Unity Catalog object.
For access to other objects, a privileged user must grant that access.
For example, to grant a group the ability to create new schemas in my-catalog, the catalog owner can run the following in the SQL Editor or a notebook:
GRANT CREATE SCHEMA ON CATALOG `my-catalog` TO `data-consumers`;
You can also grant and revoke privileges using Catalog Explorer.
You cannot grant privileges to the workspace-local admins group. To grant privileges to groups, they must be account-level groups.
For details about managing privileges in Unity Catalog, see Manage privileges in Unity Catalog.
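Because USE CATALOG is only a prerequisite, reading a table typically takes a chain of grants down the hierarchy. The following sketch assumes a hypothetical schema `sales` and table `orders` in my-catalog, and the hypothetical account-level group `data-consumers`:

```sql
-- Prerequisite privileges on the parent catalog and schema.
GRANT USE CATALOG ON CATALOG `my-catalog` TO `data-consumers`;
GRANT USE SCHEMA ON SCHEMA `my-catalog`.sales TO `data-consumers`;

-- Read access to a single table.
GRANT SELECT ON TABLE `my-catalog`.sales.orders TO `data-consumers`;

-- Remove the read access again when it is no longer needed.
REVOKE SELECT ON TABLE `my-catalog`.sales.orders FROM `data-consumers`;
```

Identifiers that contain hyphens, such as my-catalog, must be enclosed in backticks.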
To start using Unity Catalog, you must have at least one catalog defined. Catalogs are the primary unit of data isolation and organization in Unity Catalog. All schemas and tables live in catalogs, as do volumes, views, and models.
Most workspaces enabled for Unity Catalog have access to a pre-provisioned main catalog that you can use to get started with Unity Catalog. As you add more data and AI assets into Databricks, you can create additional catalogs to group those assets in a way that makes it easy to govern data logically. For recommendations about how best to use catalogs and schemas to organize your data and AI assets, see Unity Catalog best practices.
As a metastore admin or other user with the CREATE CATALOG privilege, you can create new catalogs in the metastore. When you do, you should:
Create managed storage for the new catalog.
Managed storage is a dedicated storage location in your Google Cloud account for managed tables and managed volumes. You can assign managed storage to the metastore, to catalogs, and to schemas. When a user creates a table, the data is stored in the storage location that is lowest in the hierarchy. For example, if a storage location is defined for the metastore and catalog but not the schema, table data is stored in the location defined for the catalog.
Databricks recommends that you assign managed storage at the catalog level, because catalogs typically represent logical units of data isolation. If you are comfortable with data in multiple catalogs sharing the same storage location, you can default to the metastore-level storage location.
Assigning managed storage to a catalog requires that you create:
A storage credential
An external location that references that storage credential.
For an introduction to these objects and instructions for creating them, see Connect to cloud object storage using Unity Catalog.
Bind the new catalog to your workspace if you want to limit access from other workspaces that share the same metastore.
Grant privileges on the catalog.
For detailed instructions, see Create and manage catalogs.
The following example shows the creation of a catalog with managed storage, followed by granting the SELECT privilege on the catalog:
CREATE CATALOG IF NOT EXISTS mycatalog
MANAGED LOCATION 'gs://depts/finance';
GRANT SELECT ON CATALOG mycatalog TO `finance-team`;
For more examples, including instructions for creating catalogs using Catalog Explorer, see Create and manage catalogs.
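The managed location used for a catalog must be covered by an external location, which in turn references a storage credential. As a rough sketch, the external location could be defined before the catalog is created. The names `finance_loc` and `finance_cred` are hypothetical, and the example assumes that the storage credential already exists and that you have the privileges needed to create external locations:

```sql
-- Register the bucket path as an external location backed by an
-- existing storage credential (both names are hypothetical).
CREATE EXTERNAL LOCATION IF NOT EXISTS finance_loc
URL 'gs://depts/finance'
WITH (STORAGE CREDENTIAL finance_cred)
COMMENT 'Storage for the finance catalog';

-- Inspect the external location and, once created, the catalog.
DESCRIBE EXTERNAL LOCATION finance_loc;
DESCRIBE CATALOG EXTENDED mycatalog;
```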
Schemas represent more granular groupings than catalogs, such as departments or projects. All tables and other Unity Catalog objects in the catalog are contained in schemas. As the owner of a new catalog, you may want to create its schemas yourself, or you can delegate that ability to other users by granting them the CREATE SCHEMA privilege on the catalog.
For detailed instructions, see Create and manage schemas (databases).
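Both approaches can be sketched as follows. The schema names `reporting` and `audit` and the path `gs://depts/finance-audit` are hypothetical, and the second statement assumes that path is already covered by an external location:

```sql
-- Create a schema yourself; its managed tables use the catalog's
-- managed storage because no schema-level location is set.
CREATE SCHEMA IF NOT EXISTS mycatalog.reporting
COMMENT 'Finance reporting tables';

-- Optionally give a schema its own managed location, which then
-- takes precedence over the catalog-level location for its tables.
CREATE SCHEMA IF NOT EXISTS mycatalog.audit
MANAGED LOCATION 'gs://depts/finance-audit';

-- Or delegate schema creation to a group instead.
GRANT USE CATALOG, CREATE SCHEMA ON CATALOG mycatalog TO `finance-team`;
```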
If your workspace was in service before it was enabled for Unity Catalog, it likely has a Hive metastore that contains data that you want to continue to use. Databricks recommends that you migrate the tables managed by the Hive metastore to the Unity Catalog metastore, but if you choose not to, you can continue to work with the Hive metastore.
The Hive metastore is represented in Unity Catalog interfaces as a catalog named hive_metastore. In order to continue working with data in your Hive metastore without having to update queries to specify the hive_metastore catalog, you can set the workspace’s default catalog to hive_metastore. See Manage the default catalog.
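The following sketch shows a few ways to work with Hive metastore data from a Unity Catalog-enabled workspace. The table name `events` is hypothetical. The final statement copies data with CREATE TABLE AS SELECT, which is only one possible migration approach and assumes you have the privileges needed to create tables in the target schema:

```sql
-- Query a Hive metastore table explicitly with a three-level name.
SELECT * FROM hive_metastore.default.events LIMIT 10;

-- Switch the catalog for the current session only, then confirm it.
USE CATALOG hive_metastore;
SELECT current_catalog();

-- Copy a table into a Unity Catalog schema.
CREATE TABLE IF NOT EXISTS main.default.events
AS SELECT * FROM hive_metastore.default.events;
```

USE CATALOG affects only the current session; it does not change the workspace's default catalog.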
Depending on when your workspace was enabled for Unity Catalog, the default catalog may already be
Although Databricks recommends that you create a separate managed storage location for each catalog in your metastore (and you can do the same for schemas), you can opt instead to create a managed location at the metastore level and use it as the default storage for multiple catalogs and schemas.
When an account admin creates a metastore, they have the option to include metastore-level storage or not. You can also add storage to a metastore after it is created.
Metastore-level storage is required only if either of the following is true:
You want to share notebooks using Databricks-to-Databricks Delta Sharing.
You use a Databricks partner product integration that relies on personal staging locations (deprecated).
For more information about the hierarchy of managed storage locations, see Data is physically separated in storage.
To learn how to add metastore-level storage to metastores that have none, see Add managed storage to an existing metastore.
Run a quick tutorial to create your first table in Unity Catalog: Tutorial: Create your first table and grant privileges in Unity Catalog
Learn more about Unity Catalog: What is Unity Catalog?
Learn best practices for using Unity Catalog: Unity Catalog best practices
Learn how to grant and revoke privileges: Manage privileges in Unity Catalog