Best practices for data governance
This article covers best practices for data governance, organized by the architectural principles listed in the following sections.
1. Unify data management
Manage metadata for all data assets in one place
As a best practice, run the lakehouse in a single account with one Unity Catalog. The top-level container of objects in Unity Catalog is a metastore. It stores data assets (such as tables and views) and the permissions that govern access to them. Use a single metastore per cloud region and do not access metastores across regions to avoid latency issues.
The metastore provides a three-level namespace for organizing data: catalog.schema.table (or view).
Databricks recommends using catalogs to provide segregation across your organization’s information architecture. Often this means that catalogs can correspond to software development environment scope, team, or business unit.
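As a minimal sketch of this segregation, with hypothetical catalog, schema, and table names, catalogs can be created per environment and objects addressed with the full three-level name:

```sql
-- Hypothetical names: one catalog per software development environment
CREATE CATALOG IF NOT EXISTS dev_sales;
CREATE CATALOG IF NOT EXISTS prod_sales;

-- Schemas and tables live inside a catalog
CREATE SCHEMA IF NOT EXISTS prod_sales.orders;

-- Reference objects with the three-level namespace: catalog.schema.table
SELECT * FROM prod_sales.orders.line_items;
```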
2. Unify data security
Centralize access control
The Databricks Data Intelligence Platform provides methods for data access control, mechanisms that describe which groups or individuals can access what data. These policies can be extremely granular and specific, down to the individual records each user is allowed to access, or broad and expressive, such as granting all finance users access to all financial data.
Unity Catalog centralizes access controls for files, tables, and views. Each securable object in Unity Catalog has an owner. An object's owner has all privileges on the object, as well as the permission to grant privileges on the securable object to other principals. Unity Catalog lets you manage privileges and configure access control by using SQL DDL statements.
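For example, an object owner (or another privileged principal) can grant and revoke access with standard SQL DDL. The catalog, table, and group names below are hypothetical:

```sql
-- A principal needs USE CATALOG and USE SCHEMA on the parent containers
-- before table-level privileges take effect
GRANT USE CATALOG ON CATALOG prod_sales TO `finance-team`;
GRANT USE SCHEMA ON SCHEMA prod_sales.orders TO `finance-team`;

-- Grant read access on a single table
GRANT SELECT ON TABLE prod_sales.orders.line_items TO `finance-team`;

-- Revoke works symmetrically
REVOKE SELECT ON TABLE prod_sales.orders.line_items FROM `finance-team`;
```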
Unity Catalog uses dynamic views for fine-grained access controls so that you can restrict access to rows and columns to the users and groups who are authorized to query them. See Create a dynamic view.
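As a sketch, a dynamic view can combine the built-in `is_account_group_member()` function with column masking and row filtering. The catalog, table, group, and region values below are hypothetical:

```sql
-- Members of `finance-team` see all rows and the real amounts;
-- everyone else sees only US rows with the amount column redacted
CREATE VIEW prod_sales.orders.line_items_restricted AS
SELECT
  order_id,
  region,
  CASE
    WHEN is_account_group_member('finance-team') THEN amount
    ELSE NULL
  END AS amount
FROM prod_sales.orders.line_items
WHERE is_account_group_member('finance-team')
   OR region = 'US';
```

Users are then granted SELECT on the view rather than on the underlying table.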
For further information see Security, compliance & privacy - Manage identity and access using least privilege.
Configure audit logging
Databricks provides access to audit logs of activities performed by Databricks users, allowing your enterprise to monitor detailed Databricks usage patterns. There are two types of logs: workspace-level audit logs, which record workspace-level events, and account-level audit logs, which record account-level events.
Audit Unity Catalog events
Unity Catalog captures an audit log of actions performed against the metastore. This enables admins to access fine-grained details about who accessed a given dataset and what actions they performed.
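On workspaces where system tables are enabled, these events can be queried from the `system.access.audit` table. The query below is a sketch; the table name is hypothetical, and the available `request_params` keys vary by action:

```sql
-- Who read a given table recently
SELECT event_time, user_identity.email, action_name
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND action_name = 'getTable'
  AND request_params.full_name_arg = 'prod_sales.orders.line_items'
ORDER BY event_time DESC
LIMIT 100;
```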
Audit data sharing events
For secure sharing with Delta Sharing, Databricks provides audit logs to monitor Delta Sharing events, including:
When someone creates, updates, or deletes a share or a recipient.
When a recipient accesses an activation link and downloads the credential.
When a recipient accesses shares or data in shared tables.
When a recipient’s credential is rotated or expires.
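A sketch of monitoring such events from the audit log system table; the action names listed are illustrative examples, not an exhaustive set:

```sql
-- Recent Delta Sharing administration events
SELECT event_time, user_identity.email, action_name, request_params
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND action_name IN ('createShare', 'updateShare', 'deleteShare',
                      'createRecipient', 'rotateRecipientToken')
ORDER BY event_time DESC;
```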