Create a Unity Catalog metastore

This article shows how to create a Unity Catalog metastore and link it to workspaces.

A metastore is the top-level container for data in Unity Catalog. Unity Catalog metastores register metadata about securable objects (such as tables, volumes, external locations, and shares) and the permissions that govern access to them. Each metastore exposes a three-level namespace (catalog.schema.table) by which data can be organized. You must have one metastore for each region in which your organization operates. To work with Unity Catalog, users must be on a workspace that is attached to a metastore in their region.
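
As a small illustration of the three-level namespace, the following Python sketch splits a fully qualified name into its catalog, schema, and table parts. The helper name parse_table_name and the sample name main.sales.orders are hypothetical, and the sketch assumes plain identifiers with no backtick quoting or embedded periods.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    """The three levels of a Unity Catalog table name."""
    catalog: str
    schema: str
    table: str


def parse_table_name(full_name: str) -> TableName:
    """Split 'catalog.schema.table' into its three parts.

    Assumes simple identifiers (no backticks, no periods inside a part).
    """
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    return TableName(*parts)


# Hypothetical example name, not taken from any real metastore.
name = parse_table_name("main.sales.orders")
print(name.catalog, name.schema, name.table)
```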

To create a metastore, you do the following:

  1. In your GCP account, optionally create a storage location for metastore-level storage of managed tables and volumes.

    For information to help you decide whether you need metastore-level storage, see (Optional) Create metastore-level storage and Data is physically separated in storage.

  2. In Databricks, create the metastore. Databricks generates a service account.

  3. Give the service account access to your GCS bucket and assign workspaces to the metastore.

Note

In addition to the approaches described in this article, you can also create a metastore by using the Databricks Terraform provider, specifically the databricks_metastore resource. To enable Unity Catalog to access the metastore, use databricks_metastore_data_access. To link workspaces to a metastore, use databricks_metastore_assignment.

Before you begin

Before you begin, you should familiarize yourself with the basic Unity Catalog concepts, including metastores and managed storage. See What is Unity Catalog?.

You should also confirm that you meet the following requirements for all setup steps:

  • You must be a Databricks account admin.

  • Your Databricks account must be on the Premium plan.

  • If you want to set up metastore-level root storage, you must have permission to create GCS buckets and assign permissions to those GCS buckets in your Google Cloud account.

Step 1 (Optional): Create the GCS bucket

In this step, which is optional, you create a GCS bucket to store managed table and volume data at the metastore level. To determine whether you need metastore-level storage, see (Optional) Create metastore-level storage.

  1. Configure a GCS bucket in Google Cloud.

    The storage bucket is where data for managed tables and volumes is stored for this metastore. All managed data is stored in this bucket unless you override the storage location at the catalog or schema level.

    When you create the bucket:

    • Create it in the same region as the workspaces you will use to access the data.

    • Use a dedicated GCS bucket for each metastore that you create.

    • Do not allow direct user access to the bucket.

  2. Make a note of the bucket path (gs://bucket-name).
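
Before you paste the bucket path into the account console, it can help to check that it has the gs://bucket-name shape. The following is a minimal Python sketch using only the standard library. The function name bucket_name_from_path is hypothetical, and the check covers only the URL shape, not Google Cloud's full bucket naming rules.

```python
from urllib.parse import urlparse


def bucket_name_from_path(path: str) -> str:
    """Return the bucket name from a 'gs://bucket-name[/prefix]' path.

    Only the scheme and the presence of a bucket name are checked.
    """
    parsed = urlparse(path)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"expected a path like gs://bucket-name, got {path!r}")
    return parsed.netloc


print(bucket_name_from_path("gs://bucket-name"))
```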

Step 2: Create the metastore and optionally generate a service account

To create a metastore:

  1. Log in to the Databricks account console.

  2. Click Catalog.

  3. Click Create Metastore.

  4. Enter the following:

    • A name for the metastore.

    • The region where you want to deploy the metastore.

      This must be in the same region as the workspaces you want to use to access the data. Make sure that this matches the region of the GCS bucket you created earlier.

    • (Optional) The path to the GCS bucket that you created in the previous task.

  5. Click Create.

    If you provided a path to a GCS bucket in the previous step, the Provide Storage Access dialog appears. It displays the system-generated Service Account Name and asks you to grant that service account two IAM roles for the GCS bucket. Keep this dialog open while you complete the next task, which is required only if you want to enable metastore-level storage.

    If you did not provide a path to a GCS bucket, you are prompted to assign workspaces to the metastore.
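
If you script this step instead of using the account console, the inputs are the same three values: a name, a region, and an optional bucket path. The Python sketch below only assembles a request path and JSON body; it sends nothing. The endpoint path and the metastore_info, name, region, and storage_root field names are assumptions based on the account-level Unity Catalog REST API and should be verified against the current API reference. The function name and sample values are hypothetical.

```python
import json
from typing import Dict, Optional, Tuple


def build_create_metastore_request(
    account_id: str,
    name: str,
    region: str,
    storage_root: Optional[str] = None,
) -> Tuple[str, Dict]:
    """Assemble (path, body) for a create-metastore call without sending it.

    Endpoint and field names are assumptions; check the API reference.
    """
    path = f"/api/2.0/accounts/{account_id}/metastores"
    info = {"name": name, "region": region}
    # Metastore-level storage is optional, so include it only when provided.
    if storage_root is not None:
        info["storage_root"] = storage_root
    return path, {"metastore_info": info}


# Hypothetical values for illustration only.
path, body = build_create_metastore_request(
    "my-account-id", "my-metastore", "us-central1", "gs://bucket-name"
)
print(path)
print(json.dumps(body, indent=2))
```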

Step 3 (Optional): Give the service account access to your GCS bucket

In this step, which is required only if you completed step 1, give the system-generated service account access to your storage bucket:

  1. In another browser tab or window, go to the Google Cloud console and open the GCS bucket that you provided in the previous step.

  2. On the Permissions tab, click + Grant access and assign the service account the following roles:

    • Storage Legacy Bucket Reader

    • Storage Object Admin

    Use the service account’s email address as the principal identifier.

  3. Return to the Provide Storage Access dialog in the Databricks account console and click Permissions granted.

    Databricks validates that the service account has the correct access to the bucket.

  4. When the validation is successful, you can select workspaces to assign to the metastore.

    To learn how to assign workspaces to metastores, see Enable a workspace for Unity Catalog.
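
The two role grants in this step can be pictured as IAM policy bindings on the bucket. The Python sketch below builds those bindings as plain dictionaries and calls no Google Cloud API. It assumes the display names map to the role IDs roles/storage.legacyBucketReader and roles/storage.objectAdmin and that service accounts use the serviceAccount: member prefix. The function name and the email address are hypothetical placeholders for the value shown in the Provide Storage Access dialog.

```python
from typing import Dict, List

# Assumed mapping from console display names to IAM role IDs.
ROLE_IDS = {
    "Storage Legacy Bucket Reader": "roles/storage.legacyBucketReader",
    "Storage Object Admin": "roles/storage.objectAdmin",
}


def bucket_bindings(service_account_email: str) -> List[Dict]:
    """Build one IAM binding per required role for the service account."""
    member = f"serviceAccount:{service_account_email}"
    return [{"role": role_id, "members": [member]} for role_id in ROLE_IDS.values()]


# Hypothetical email; use the one displayed in the dialog.
for binding in bucket_bindings("example-sa@example-project.iam.gserviceaccount.com"):
    print(binding)
```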

Step 4: Assign workspaces to the metastore

As part of Step 2: Create the metastore and optionally generate a service account, you are prompted to assign workspaces to the metastore. If you skipped that step or need to add more workspaces, do the following:

  1. As an account admin, log in to the account console.

  2. Click Catalog.

  3. Click the metastore name.

  4. Click the Workspaces tab.

  5. Click Assign to workspaces.

  6. Select one or more workspaces. You can type part of the workspace name to filter the list.

  7. Click Assign.

  8. On the confirmation dialog, click Enable.
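
Because a workspace must be attached to a metastore in its own region, it can be useful to shortlist candidate workspaces by region before you assign them. The Python sketch below shows that filter on in-memory data. The function name, the workspace names, and the name and region keys are all hypothetical and do not reflect any Databricks API response format.

```python
from typing import Dict, List


def assignable_workspaces(metastore_region: str, workspaces: List[Dict[str, str]]) -> List[str]:
    """Return the names of workspaces whose region matches the metastore's."""
    return [w["name"] for w in workspaces if w["region"] == metastore_region]


# Hypothetical inventory of workspaces.
workspaces = [
    {"name": "analytics-prod", "region": "us-central1"},
    {"name": "analytics-eu", "region": "europe-west1"},
    {"name": "ml-dev", "region": "us-central1"},
]
print(assignable_workspaces("us-central1", workspaces))
```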

Step 5: Transfer the metastore admin role to a group

The user who creates a metastore is its owner, also called the metastore admin. The metastore admin can create top-level objects in the metastore such as catalogs and can manage access to tables and other objects. Databricks recommends that you reassign the metastore admin role to a group. See Assign a metastore admin.

Add managed storage to an existing metastore

Metastore-level managed storage is optional, and you may have metastores that have none assigned. You might want to add metastore-level storage to your metastore if you prefer a data isolation model that stores data centrally for multiple workspaces. You need metastore-level storage if you want to share notebooks using Delta Sharing or if you are a Databricks partner who uses personal staging locations.

See also Managed storage.

Requirements

  • You must have at least one workspace attached to the Unity Catalog metastore.

  • Databricks permissions required:

    • To create an external location, you must be a metastore admin or a user with the CREATE EXTERNAL LOCATION and CREATE STORAGE CREDENTIAL privileges.

    • To add the storage location to the metastore definition, you must be an account admin.

  • GCP permissions required: the ability to create GCS buckets and service accounts.

Step 1: Create the storage location

Follow the instructions in Step 1 (Optional): Create the GCS bucket to create a dedicated GCS bucket in a Google Cloud account in the same region as your metastore.

Step 2: Create an external location in Unity Catalog

In this step, you create an external location in Unity Catalog that references the GCS bucket path that you just created.

  1. Create a storage credential.

    As part of storage credential creation, a Google Cloud service account is created for you, and you give that service account access to the GCS bucket that you created in Step 1: Create the storage location.

    Follow the instructions in Create a storage credential for connecting to Google Cloud Storage.

  2. Create an external location that references the storage credential that you created in the previous step and the GCS bucket that you created in Step 1: Create the storage location.

    Follow the instructions in Create an external location to connect cloud storage to Databricks.

  3. Grant yourself the CREATE MANAGED STORAGE privilege on the external location.

    1. Click the external location name to open the details pane.

    2. On the Permissions tab, click Grant.

    3. On the Grant on <external location> dialog, select yourself in the Principals field and select CREATE MANAGED STORAGE.

    4. Click Grant.
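
The same grant can also be expressed as a SQL statement. The Python sketch below only builds the statement text; running it in a notebook or the SQL editor is left to you. It assumes the standard Unity Catalog GRANT ... ON EXTERNAL LOCATION ... TO ... form with backtick-quoted identifiers. The function names, the location name, and the principal are hypothetical.

```python
def quote_identifier(name: str) -> str:
    """Wrap a name in backticks, doubling any backticks it contains."""
    return "`" + name.replace("`", "``") + "`"


def grant_create_managed_storage(location: str, principal: str) -> str:
    """Build a GRANT statement for CREATE MANAGED STORAGE on an external location."""
    return (
        "GRANT CREATE MANAGED STORAGE ON EXTERNAL LOCATION "
        f"{quote_identifier(location)} TO {quote_identifier(principal)}"
    )


# Hypothetical location name and principal.
print(grant_create_managed_storage("metastore_root", "admin@example.com"))
```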

Step 3: Add the storage location to the metastore

After you have created an external location that represents the metastore storage bucket, you can add it to the metastore.

  1. As an account admin, log in to the account console.

  2. Click Catalog.

  3. Click the metastore name.

  4. Confirm that you are the Metastore Admin.

    If you are not, click Edit and assign yourself as the metastore admin. You can unassign yourself when you are done with this procedure.

  5. On the Configuration tab, next to GCS bucket path, click Set.

  6. On the Set metastore root dialog, enter the GCS bucket path that you used to create the external location, and click Update.

    You cannot modify this path once you set it.

Delete a metastore

If you are closing your Databricks account or have another reason to delete access to data managed by your Unity Catalog metastore, you can delete the metastore.

Warning

All objects managed by the metastore become inaccessible from Databricks workspaces. This action cannot be undone.

Managed table data and metadata will be auto-deleted after 30 days. External table data in your cloud storage is not affected by metastore deletion.
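
To plan around the 30-day window, you can compute the earliest date on which managed data might be removed. The Python sketch below is simple date arithmetic. It assumes the window is counted in calendar days from the deletion date, which this article does not specify, and the function name is hypothetical.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # From the warning above; exact counting rules are an assumption.


def earliest_purge_date(deleted_on: date) -> date:
    """Return the deletion date plus the 30-day retention window."""
    return deleted_on + timedelta(days=RETENTION_DAYS)


print(earliest_purge_date(date(2024, 1, 15)))  # prints 2024-02-14
```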

To delete a metastore:

  1. As a metastore admin, log in to the account console.

  2. Click Catalog.

  3. Click the metastore name.

  4. On the Configuration tab, click the three-button menu at the far upper right and select Delete.

  5. On the confirmation dialog, enter the name of the metastore and click Delete.