Set up and manage Unity Catalog

This article explains how to configure and use Unity Catalog to manage data in your Databricks workspace. It is intended primarily for workspace admins who are using Unity Catalog for the first time.

By the end of this article you will have:

  • A workspace that is enabled for Unity Catalog.

  • Compute that has access to Unity Catalog.

  • Users with permission to access and create objects in Unity Catalog.

You may also want to review other introductory articles:

Overview of Unity Catalog enablement

For your users to start taking advantage of Unity Catalog, their Databricks workspaces must be enabled for Unity Catalog, which means that the workspaces are attached to a Unity Catalog metastore. Once you’ve created a metastore and attached a workspace to it, you can start granting privileges to users.

Before you begin

Before you begin the tasks described in this article, you should familiarize yourself with the basic Unity Catalog concepts, including metastores, admin roles, and managed storage. See What is Unity Catalog?.

You should also confirm that you meet the following requirements:

Step 1: Attach your workspace to a Unity Catalog metastore

To use Unity Catalog, your workspace must be enabled for Unity Catalog, which means it must be attached to a Unity Catalog metastore:

  • If your account already has a Unity Catalog metastore defined for your workspace’s region, you can simply attach your workspace to the existing metastore. See Enable your workspace for Unity Catalog.

  • If there is no Unity Catalog metastore defined for your workspace’s region, you must create a metastore and then attach the workspace. See Create a Unity Catalog metastore.

When your workspace is enabled for Unity Catalog (attached to a metastore), go to the next step.
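
Once you have Unity Catalog-compliant compute available (see Step 3), a quick check like the following can confirm the attachment. This is a minimal sketch that assumes the built-in current_metastore() function is available on your compute; the catalog names returned depend on your workspace.

```sql
-- Returns the ID of the Unity Catalog metastore that the current workspace uses.
SELECT current_metastore();

-- Lists the catalogs visible to the current user (for example, main or hive_metastore).
SHOW CATALOGS;
```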

Step 2: Add users and assign the workspace admin role

The user who creates the workspace is automatically added as a workspace user with the workspace admin role (that is, as a member of the workspace-local admins group). As a workspace admin, you can add and invite users to the workspace, assign the workspace admin role to other users, and create service principals and groups.

Account admins also have the ability to add users, service principals, and groups to your workspace. They can grant the account admin and metastore admin roles.

For details, see Manage users.

Step 3: Create clusters or SQL warehouses that users can use to run queries and create objects

To run Unity Catalog workloads, compute resources must comply with certain security requirements. Non-compliant compute resources cannot access data or other objects in Unity Catalog. SQL warehouses always comply with Unity Catalog requirements, but some cluster access modes do not. See Access modes.

As a workspace admin, you can restrict compute creation to admins or let users create their own SQL warehouses and clusters. You can also create cluster policies that let users create their own clusters using Unity Catalog-compliant specifications that you enforce. See Cluster access control and Create and manage compute policies.

Step 4: Grant privileges to users

To create objects and access them in Unity Catalog catalogs and schemas, a user must have permission to do so. This section describes the user and admin privileges granted on some workspaces by default and describes how to grant additional privileges.

Default user privileges

If your workspace includes the automatically-provisioned main catalog, users have some privileges on that catalog by default.

If the user who manually creates a metastore adds metastore-level storage during metastore creation, a main catalog is provisioned automatically. In workspaces attached to such metastores, workspace users have the USE CATALOG privilege on the main catalog. This privilege doesn’t grant the ability to create or select from any objects in the catalog, but it is a prerequisite for working with any objects in the catalog. The user who created the metastore owns the main catalog by default and can both transfer ownership and grant access to other users.

If metastore storage is added after the metastore is created, no main catalog is provisioned.
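
To see what the defaults look like in practice, the catalog owner could inspect the grants on main and, if desired, hand ownership to a group. This is a sketch only: the group name `platform-admins` is a hypothetical account-level group, not something defined in this article.

```sql
-- List the privileges currently granted on the main catalog.
SHOW GRANTS ON CATALOG main;

-- Transfer ownership of main to a group (hypothetical group name).
ALTER CATALOG main OWNER TO `platform-admins`;
```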

Default admin privileges

  • Workspace admins have no special Unity Catalog privileges by default.

  • A metastore admin must exist. Metastore admins can create any Unity Catalog object and can take ownership of any Unity Catalog object.

Grant privileges

For access to other objects, a privileged user must grant that access.

For example, to grant a group the ability to create new schemas in my-catalog, the catalog owner can run the following in the SQL Editor or a notebook:

GRANT CREATE SCHEMA ON CATALOG `my-catalog` TO `data-consumers`;

You can also grant and revoke privileges using Catalog Explorer.
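
The SQL counterpart for reviewing and revoking a privilege might look like the following sketch. It reuses the my-catalog and data-consumers names from the example above; the identifiers are wrapped in backticks because they contain hyphens.

```sql
-- Review which principals hold which privileges on the catalog.
SHOW GRANTS ON CATALOG `my-catalog`;

-- Remove the CREATE SCHEMA privilege from the group.
REVOKE CREATE SCHEMA ON CATALOG `my-catalog` FROM `data-consumers`;
```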

Important

You cannot grant privileges to the workspace-local users or admins groups. You can grant privileges only to account-level groups.

For details about managing privileges in Unity Catalog, see Manage privileges in Unity Catalog.

Step 5: Create new catalogs and schemas

To start using Unity Catalog, you must have at least one catalog defined. Catalogs are the primary unit of data isolation and organization in Unity Catalog. All schemas and tables live in catalogs, as do volumes, views, and models.

Most workspaces enabled for Unity Catalog have access to a pre-provisioned main catalog that you can use to get started with Unity Catalog. As you add more data and AI assets into Databricks, you can create additional catalogs to group those assets in a way that makes it easy to govern data logically. For recommendations about how best to use catalogs and schemas to organize your data and AI assets, see Unity Catalog best practices.

As a metastore admin or other user with the CREATE CATALOG privilege, you can create new catalogs in the metastore. When you do, you should:

  1. Create managed storage for the new catalog.

    Managed storage is a dedicated storage location in your Google Cloud account for managed tables and managed volumes. You can assign managed storage to the metastore, to catalogs, and to schemas. When a user creates a table, the data is stored in the lowest-level storage location defined in the hierarchy (schema first, then catalog, then metastore). For example, if a storage location is defined for the metastore and catalog but not the schema, table data is stored in the location defined for the catalog.

    Databricks recommends that you assign managed storage at the catalog level, because catalogs typically represent logical units of data isolation. If you are comfortable with data in multiple catalogs sharing the same storage location, you can default to the metastore-level storage location.

    Assigning managed storage to a catalog requires that you create:

    • A storage credential

    • An external location that references that storage credential.

    For an introduction to these objects and instructions for creating them, see Connect to cloud object storage using Unity Catalog.

  2. Bind the new catalog to your workspace if you want to limit access from other workspaces that share the same metastore.

    See Bind a catalog to one or more workspaces.

  3. Grant privileges on the catalog.

For detailed instructions, see Create and manage catalogs.
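
The storage prerequisites in step 1 can be sketched in SQL as follows. The sketch assumes that a storage credential named finance_cred has already been created (for example, in Catalog Explorer); finance_loc and the `catalog-creators` group are likewise hypothetical names chosen for illustration.

```sql
-- Register the bucket path as an external location that uses an existing
-- storage credential (finance_cred is an assumed, pre-existing credential).
CREATE EXTERNAL LOCATION IF NOT EXISTS finance_loc
  URL 'gs://depts/finance'
  WITH (STORAGE CREDENTIAL finance_cred);

-- Let a hypothetical group use that location as managed storage
-- and create catalogs in the metastore.
GRANT CREATE MANAGED STORAGE ON EXTERNAL LOCATION finance_loc TO `catalog-creators`;
GRANT CREATE CATALOG ON METASTORE TO `catalog-creators`;
```

Workspace bindings (step 2) are not covered by this sketch.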

Catalog creation example

The following example shows the creation of a catalog with managed storage, followed by granting the SELECT privilege on the catalog:

CREATE CATALOG IF NOT EXISTS mycatalog
  MANAGED LOCATION 'gs://depts/finance';

GRANT SELECT ON CATALOG mycatalog TO `finance-team`;

For more examples, including instructions for creating catalogs using Catalog Explorer, see Create and manage catalogs.
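
SELECT on its own does not let a group read data: members also need USE CATALOG on the catalog and USE SCHEMA on the schemas they query. A sketch of the additional grants, reusing the mycatalog and finance-team names from the example above, might look like this:

```sql
-- Allow the group to access the catalog and every schema in it.
-- Privileges granted on a catalog are inherited by the objects it contains.
GRANT USE CATALOG ON CATALOG mycatalog TO `finance-team`;
GRANT USE SCHEMA ON CATALOG mycatalog TO `finance-team`;

-- Confirm what the group now holds on the catalog.
SHOW GRANTS `finance-team` ON CATALOG mycatalog;
```

Granting USE SCHEMA at the catalog level is a convenience; to scope access more narrowly, grant it on individual schemas instead.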

Create a schema

Schemas represent more granular groupings than catalogs, such as departments or projects. All tables and other Unity Catalog objects in the catalog are contained in schemas. As the owner of a new catalog, you may want to create the schemas in the catalog yourself. Alternatively, you can delegate schema creation to other users by giving them the CREATE SCHEMA privilege on the catalog.

For detailed instructions, see Create and manage schemas (databases).
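
As an illustration, a catalog owner could create a schema with its own managed storage and then delegate work in it. The schema name reporting and the path under gs://depts/finance are hypothetical, and the sketch assumes that the path is covered by an external location you are allowed to use as managed storage.

```sql
-- Create a schema whose managed tables are stored in a schema-level location
-- (hypothetical schema name and path).
CREATE SCHEMA IF NOT EXISTS mycatalog.reporting
  MANAGED LOCATION 'gs://depts/finance/reporting';

-- Delegate schema creation in the catalog to a group.
GRANT CREATE SCHEMA ON CATALOG mycatalog TO `finance-team`;

-- Let the same group use the new schema and create tables in it.
GRANT USE SCHEMA, CREATE TABLE ON SCHEMA mycatalog.reporting TO `finance-team`;
```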

(Optional) Keep working with your Hive metastore

If your workspace was in service before it was enabled for Unity Catalog, it likely has a Hive metastore that contains data that you want to continue to use. Databricks recommends that you migrate the tables managed by the Hive metastore to the Unity Catalog metastore, but if you choose not to, you can continue to work with the Hive metastore.

The Hive metastore is represented in Unity Catalog interfaces as a catalog named hive_metastore. In order to continue working with data in your Hive metastore without having to update queries to specify the hive_metastore catalog, you can set the workspace’s default catalog to hive_metastore. See Manage the default catalog.

Depending on when your workspace was enabled for Unity Catalog, the default catalog may already be hive_metastore.
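
The following sketch shows how Hive metastore tables can be addressed from Unity Catalog-enabled compute. The table default.sales is a hypothetical example. Note that USE CATALOG changes the current catalog only for the session; it does not change the workspace default.

```sql
-- Show which catalog unqualified names currently resolve against.
SELECT current_catalog();

-- Address a Hive metastore table with a three-level name (hypothetical table).
SELECT * FROM hive_metastore.default.sales LIMIT 10;

-- Or switch the session's current catalog and keep using two-level names.
USE CATALOG hive_metastore;
SELECT * FROM default.sales LIMIT 10;
```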

(Optional) Create metastore-level storage

Although Databricks recommends that you create a separate managed storage location for each catalog in your metastore (and you can do the same for schemas), you can opt instead to create a managed location at the metastore level and use it as the default storage for multiple catalogs and schemas.

When an account admin creates a metastore, they have the option to include metastore-level storage or not. You can also add storage to a metastore after it is created.

Metastore-level storage is required only if the following are true:

For more information about the hierarchy of managed storage locations, see Data is physically separated in storage.

To learn how to add metastore-level storage to metastores that have none, see Add managed storage to an existing metastore.

Next steps