Create a Unity Catalog metastore
This article shows how to create a Unity Catalog metastore and link it to workspaces.
Important
For workspaces that were enabled for Unity Catalog automatically, the instructions in this article are unnecessary. Databricks began to enable new workspaces for Unity Catalog automatically on March 6, 2024, with a rollout proceeding gradually across accounts. You must follow the instructions in this article only if you have a workspace and don’t already have a metastore in your workspace region. To determine whether a metastore already exists in your region, see Automatic enablement of Unity Catalog.
A metastore is the top-level container for data in Unity Catalog. Unity Catalog metastores register metadata about securable objects (such as tables, volumes, external locations, and shares) and the permissions that govern access to them. Each metastore exposes a three-level namespace (catalog.schema.table) by which data can be organized. You must have one metastore for each region in which your organization operates. To work with Unity Catalog, users must be on a workspace that is attached to a metastore in their region.
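To make the three-level namespace concrete, the following minimal sketch splits a fully qualified name into its catalog, schema, and table parts. The helper names and the example name main.sales.orders are hypothetical, and the sketch assumes that no part contains a dot or backtick quoting.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    """The three levels of a Unity Catalog table name."""
    catalog: str
    schema: str
    table: str

    def qualified(self) -> str:
        # Join the three levels back into catalog.schema.table form.
        return ".".join(self)


def parse_table_name(name: str) -> TableName:
    """Split 'catalog.schema.table' into its three levels.

    Assumption: parts contain no dots and are not backtick-quoted.
    """
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    return TableName(*parts)


if __name__ == "__main__":
    # Hypothetical table name, used only for illustration.
    print(parse_table_name("main.sales.orders"))
```

A two-part name such as sales.orders is rejected, which mirrors the fact that a table is always addressed through its catalog and schema.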
To create a metastore, you do the following:
In your GCP account, optionally create a storage location for metastore-level storage of managed tables and volumes.
For information to help you decide whether you need metastore-level storage, see (Optional) Create metastore-level storage and Data is physically separated in storage.
In Databricks, create the metastore. If you specify a storage location, Databricks generates a service account.
If you specified a storage location, give the service account access to your GCS bucket. Then assign workspaces to the metastore.
Note
In addition to the approaches described in this article, you can also create a metastore by using the Databricks Terraform provider, specifically the databricks_metastore resource. To enable Unity Catalog to access the metastore, use databricks_metastore_data_access. To link workspaces to a metastore, use databricks_metastore_assignment.
Before you begin
Before you begin, you should familiarize yourself with the basic Unity Catalog concepts, including metastores and managed storage. See What is Unity Catalog?.
You should also confirm that you meet the following requirements for all setup steps:
You must be a Databricks account admin.
Your Databricks account must be on the Premium plan.
If you want to set up metastore-level root storage, you must have permission to create GCS buckets and assign permissions to those GCS buckets in your Google Cloud account.
Step 1 (Optional): Create the GCS bucket
In this step, which is optional, you create a GCS bucket to store managed table and volume data at the metastore level. To determine whether you need metastore-level storage, see (Optional) Create metastore-level storage.
Configure a GCS bucket in Google Cloud.
The storage bucket is where data for managed tables and managed volumes is stored for this metastore. All managed tables and volumes are stored in this bucket unless you override the storage location at the catalog or schema level.
When you create the bucket:
Create it in the same region as the workspaces you will use to access the data.
Use a dedicated GCS bucket for each metastore that you create.
Do not allow direct user access to the bucket.
Make a note of the bucket path (gs://bucket-name).
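Before you move on, it can help to check the bucket path and region in a script. The sketch below is illustrative only: the helper names are hypothetical, the name pattern is a simplified version of the GCS bucket naming rules, and it assumes that you pass the bucket root (gs://bucket-name) rather than a path inside the bucket.

```python
import re

# Simplified pattern (assumption): 3-63 characters, lowercase letters, digits,
# dots, dashes, and underscores, starting and ending with a letter or digit.
_BUCKET_PATH = re.compile(r"^gs://([a-z0-9][a-z0-9._-]{1,61}[a-z0-9])/?$")


def bucket_name_from_path(path: str) -> str:
    """Return the bucket name from a gs://bucket-name path, or raise ValueError."""
    match = _BUCKET_PATH.match(path)
    if match is None:
        raise ValueError(f"not a GCS bucket path: {path!r}")
    return match.group(1)


def same_region(bucket_location: str, workspace_region: str) -> bool:
    """Compare regions case-insensitively, since tools may differ in letter case."""
    return bucket_location.strip().lower() == workspace_region.strip().lower()


if __name__ == "__main__":
    print(bucket_name_from_path("gs://bucket-name"))
    print(same_region("US-EAST1", "us-east1"))
```

The region comparison covers only the first item in the checklist. Dedicated use of the bucket and the absence of direct user access still have to be verified in Google Cloud.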
Step 2: Create the metastore and optionally generate a service account
To create a metastore:
Log in to the Databricks account console.
Click Catalog.
Click Create metastore.
Enter the following:
A name for the metastore.
The region where you want to deploy the metastore.
This must be in the same region as the workspaces you want to use to access the data. Make sure that this matches the region of the GCS bucket you created earlier.
(Optional) The path to the GCS bucket that you created in the previous task.
Click Create.
If you provided a path to a GCS bucket in the previous step, the Provide Storage Access dialog appears. It displays the system-generated Service Account Name and asks you to grant that service account two IAM roles for the GCS bucket. Keep this dialog open when you proceed to Step 3, which is required only if you want to enable metastore-level storage.
If you did not provide a path to a GCS bucket, you are prompted to assign workspaces to the metastore. See Step 4: Assign workspaces to the metastore or Enable a workspace for Unity Catalog.
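If you prefer to script this step rather than use the account console, the request might look like the following sketch. It only builds the HTTP request and does not send it. The host name, endpoint path, payload shape, and bearer-token authentication are assumptions based on the Databricks account REST API and should be checked against the current API reference. The function name and all argument values are hypothetical.

```python
import json
import urllib.request
from typing import Optional

# Assumption: account-level API host for Databricks on Google Cloud.
ACCOUNT_HOST = "https://accounts.gcp.databricks.com"


def build_create_metastore_request(
    account_id: str,
    token: str,
    name: str,
    region: str,
    storage_root: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request that creates a metastore."""
    info = {"name": name, "region": region}
    if storage_root is not None:
        # Optional metastore-level storage, for example the bucket from Step 1.
        info["storage_root"] = storage_root
    return urllib.request.Request(
        url=f"{ACCOUNT_HOST}/api/2.0/accounts/{account_id}/metastores",
        data=json.dumps({"metastore_info": info}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical values, used only for illustration.
    request = build_create_metastore_request(
        "my-account-id", "my-token", "primary", "us-east1", "gs://bucket-name"
    )
    print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen. Omitting storage_root corresponds to creating the metastore without metastore-level storage.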
Step 3 (Optional): Give the service account access to your GCS bucket
In this step, which is required only if you completed Step 1, you give the system-generated service account access to your storage bucket:
In another browser tab or window, go to the Google Cloud console and open the GCS bucket that you provided in the previous step.
On the Permissions tab, click + Grant access and assign the service account the following roles:
Storage Legacy Bucket Reader
Storage Object Admin
Use the service account’s email address as the principal identifier.
Return to the Provide Storage Access dialog in the Databricks account console and click Permissions granted.
Databricks validates that the service account has the correct access to the bucket.
When the validation is successful, you can select workspaces to assign to the metastore.
To learn how to assign workspaces to metastores, see the section that follows or Enable a workspace for Unity Catalog.
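The two role grants in this step correspond to IAM policy bindings on the bucket. The following sketch shows what those bindings look like as data, assuming the role IDs roles/storage.legacyBucketReader and roles/storage.objectAdmin for the two roles named above. The function name and the example email address are hypothetical. The sketch only edits a policy dictionary; applying it to the bucket (for example with the Google Cloud CLI or a client library) is left out.

```python
import copy

# Assumed IAM role IDs for "Storage Legacy Bucket Reader" and "Storage Object Admin".
REQUIRED_ROLES = (
    "roles/storage.legacyBucketReader",
    "roles/storage.objectAdmin",
)


def with_metastore_bindings(policy: dict, service_account_email: str) -> dict:
    """Return a copy of an IAM policy with both roles granted to the service account."""
    member = f"serviceAccount:{service_account_email}"
    updated = copy.deepcopy(policy)  # leave the caller's policy untouched
    bindings = updated.setdefault("bindings", [])
    for role in REQUIRED_ROLES:
        # Reuse an unconditional binding for the role if one already exists.
        binding = next(
            (b for b in bindings if b.get("role") == role and "condition" not in b),
            None,
        )
        if binding is None:
            binding = {"role": role, "members": []}
            bindings.append(binding)
        members = binding.setdefault("members", [])
        if member not in members:
            members.append(member)
    return updated


if __name__ == "__main__":
    # Hypothetical service account email, used only for illustration.
    print(with_metastore_bindings({}, "example-sa@example-project.iam.gserviceaccount.com"))
```

The function is idempotent: running it again on its own output adds nothing, which makes it safe to reuse if validation in the Provide Storage Access dialog has to be retried.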
Step 4: Assign workspaces to the metastore
As part of Step 2: Create the metastore and optionally generate a service account, you are prompted to assign workspaces to the metastore. If you skipped that step or need to add more workspaces, do the following:
As an account admin, log in to the account console.
Click Catalog.
Click the metastore name.
Click the Workspaces tab.
Click Assign to workspaces.
Select one or more workspaces. You can type part of the workspace name to filter the list.
Scroll to the bottom of the dialog and click Assign.
On the confirmation dialog, click Enable.
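Assigning many workspaces through the console can be tedious, so the same step might be scripted as in the sketch below, which builds one request per workspace without sending any of them. The host name, endpoint path, payload shape, and bearer-token authentication are assumptions based on the Databricks account REST API. Verify them against the current API reference before use. The function name and all IDs are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, List

# Assumption: account-level API host for Databricks on Google Cloud.
ACCOUNT_HOST = "https://accounts.gcp.databricks.com"


def build_assignment_requests(
    account_id: str,
    token: str,
    metastore_id: str,
    workspace_ids: Iterable[int],
) -> List[urllib.request.Request]:
    """Build (but do not send) one assignment request per workspace."""
    requests = []
    for workspace_id in workspace_ids:
        body = {
            "metastore_assignment": {
                "metastore_id": metastore_id,
                "workspace_id": workspace_id,
            }
        }
        requests.append(
            urllib.request.Request(
                url=(
                    f"{ACCOUNT_HOST}/api/2.0/accounts/{account_id}"
                    f"/workspaces/{workspace_id}/metastores/{metastore_id}"
                ),
                data=json.dumps(body).encode("utf-8"),
                method="POST",
                headers={
                    "Authorization": f"Bearer {token}",
                    "Content-Type": "application/json",
                },
            )
        )
    return requests


if __name__ == "__main__":
    # Hypothetical IDs, used only for illustration.
    for request in build_assignment_requests("my-account-id", "my-token", "ms-1", [101, 102]):
        print(request.get_method(), request.full_url)
```

Each workspace must be in the same region as the metastore, so a script like this would typically filter the workspace list by region first.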
Step 5: Transfer the metastore admin role to a group
The user who creates a metastore is its owner, also called the metastore admin. The metastore admin can create top-level objects in the metastore such as catalogs and can manage access to tables and other objects. Databricks recommends that you reassign the metastore admin role to a group. See Assign a metastore admin.
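As a rough illustration of the reassignment, the sketch below builds a request that sets the metastore owner to a group. It assumes that the workspace-level Unity Catalog REST API accepts a PATCH on the metastore with an owner field and that a bearer token is used for authentication. Confirm both against the current API reference. The function name, workspace URL, metastore ID, and group name are hypothetical, and the request is built but not sent.

```python
import json
import urllib.request


def build_owner_transfer_request(
    workspace_url: str,
    token: str,
    metastore_id: str,
    group_name: str,
) -> urllib.request.Request:
    """Build (but do not send) a request that makes a group the metastore admin."""
    base = workspace_url.rstrip("/")
    return urllib.request.Request(
        # Assumed endpoint path for updating a metastore.
        url=f"{base}/api/2.1/unity-catalog/metastores/{metastore_id}",
        data=json.dumps({"owner": group_name}).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical values, used only for illustration.
    request = build_owner_transfer_request(
        "https://example-workspace.gcp.databricks.com", "my-token", "ms-1", "metastore-admins"
    )
    print(request.get_method(), request.full_url)
```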