Best practices for data governance

This article covers best practices for data governance, organized by the architectural principles in the following sections.

1. Unify data management

Manage metadata for all data assets in one place

As a best practice, run the lakehouse in a single account with one Unity Catalog. The top-level container of objects in Unity Catalog is a metastore. It stores data assets (such as tables and views) and the permissions that govern access to them. Use a single metastore per cloud region and do not access metastores across regions to avoid latency issues.

The metastore provides a three-level namespace for organizing data: catalog.schema.table (the third level can also be a view or volume).

Databricks recommends using catalogs to provide segregation across your organization’s information architecture. Often this means that catalogs correspond to a software development environment scope, a team, or a business unit.
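
As a sketch, the following Unity Catalog SQL DDL creates environment-scoped catalogs; the catalog, schema, and table names are illustrative assumptions and reappear in later examples:

    -- Create one catalog per software development environment (names are illustrative).
    CREATE CATALOG IF NOT EXISTS dev;
    CREATE CATALOG IF NOT EXISTS prod;

    -- Objects are addressed through the three-level namespace: catalog.schema.table.
    CREATE SCHEMA IF NOT EXISTS prod.sales;
    CREATE TABLE IF NOT EXISTS prod.sales.orders (
      order_id BIGINT,
      region   STRING,
      amount   DECIMAL(10, 2)
    );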

2. Unify data security

Centralize access control

The Databricks Data Intelligence Platform provides methods for data access control: mechanisms that describe which groups or individuals can access what data. These policies can be extremely granular and specific, down to defining every record that each individual has access to, or broad and expressive, such as allowing all finance users to see all financial data.

Unity Catalog centralizes access controls for files, tables, and views. Each securable object in Unity Catalog has an owner. An object’s owner has all privileges on the object, as well as the permission to grant privileges on the securable object to other principals. Unity Catalog lets you manage privileges and configure access control by using SQL DDL statements.
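
For example, a minimal sketch that grants an assumed analysts group read access to the illustrative table above, using documented Unity Catalog GRANT statements:

    -- Reading a table requires USE CATALOG and USE SCHEMA on its parents, plus SELECT.
    GRANT USE CATALOG ON CATALOG prod TO `analysts`;
    GRANT USE SCHEMA  ON SCHEMA  prod.sales TO `analysts`;
    GRANT SELECT      ON TABLE   prod.sales.orders TO `analysts`;

    -- Review what has been granted on an object.
    SHOW GRANTS ON TABLE prod.sales.orders;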

Unity Catalog uses dynamic views for fine-grained access controls so that you can restrict access to rows and columns to the users and groups who are authorized to query them. See Create a dynamic view.
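
A minimal dynamic view sketch over the illustrative orders table; is_account_group_member() is a built-in function, while the group names and the region rule are assumptions:

    -- Column masking and row filtering based on the querying user's group membership.
    CREATE OR REPLACE VIEW prod.sales.orders_redacted AS
    SELECT
      order_id,
      region,
      -- Only members of the auditors group see the raw amount.
      CASE WHEN is_account_group_member('auditors') THEN amount ELSE NULL END AS amount
    FROM prod.sales.orders
    -- Admins see all rows; everyone else sees only one region (illustrative rule).
    WHERE is_account_group_member('admins') OR region = 'US';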

For further information, see Security, compliance & privacy - Manage identity and access using least privilege.

Configure audit logging

Databricks provides access to audit logs of activities performed by Databricks users, allowing your enterprise to monitor detailed usage patterns. There are two types of logs: workspace-level audit logs, which capture workspace-level events, and account-level audit logs, which capture account-level events.

Audit Unity Catalog events

Unity Catalog captures an audit log of actions performed against the metastore. This enables admins to access fine-grained details about who accessed a given dataset and what actions they performed.
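
For example, assuming the system.access.audit system table is enabled in your account, a query along the following lines shows who performed which Unity Catalog actions (the time window is illustrative):

    -- Recent actions against Unity Catalog securables, newest first.
    SELECT
      event_time,
      user_identity.email AS actor,
      action_name,
      request_params
    FROM system.access.audit
    WHERE service_name = 'unityCatalog'
      AND event_time >= current_timestamp() - INTERVAL 7 DAYS
    ORDER BY event_time DESC;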

Audit data sharing events

For secure sharing with Delta Sharing, Databricks provides audit logs to monitor Delta Sharing events (a sample audit query follows this list), including:

  • When someone creates, modifies, updates, or deletes a share or a recipient.

  • When a recipient accesses an activation link and downloads the credential.

  • When a recipient accesses shares or data in shared tables.

  • When a recipient’s credential is rotated or expires.
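
Building on the audit query above, the same system table can be narrowed to sharing-related activity; the action_name values below are illustrative assumptions, since the exact names vary by event type:

    -- Recent share and recipient lifecycle events (action names are assumptions).
    SELECT event_time, user_identity.email AS actor, action_name
    FROM system.access.audit
    WHERE service_name = 'unityCatalog'
      AND action_name IN ('createShare', 'updateShare', 'deleteShare',
                          'createRecipient', 'rotateRecipientToken')
    ORDER BY event_time DESC;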

3. Manage data quality

The Databricks Data Intelligence Platform provides robust data quality management with built-in quality controls, testing, monitoring, and enforcement to ensure accurate and useful data is available for downstream BI, analytics, and machine learning workloads.

See Reliability - Manage data quality.
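
One built-in enforcement mechanism is Delta table constraints; a minimal sketch against the illustrative orders table (the constraint name and rule are assumptions):

    -- NOT NULL and CHECK constraints are enforced on write; violating writes fail.
    ALTER TABLE prod.sales.orders ALTER COLUMN amount SET NOT NULL;
    ALTER TABLE prod.sales.orders ADD CONSTRAINT positive_amount CHECK (amount > 0);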

4. Share data securely and in real time

Use the open Delta Sharing protocol for sharing data with partners

Delta Sharing provides an open solution for securely sharing live data from your lakehouse to any computing platform. Recipients do not need to be on the Databricks platform, on the same cloud, or on any cloud at all. Delta Sharing is natively integrated with Unity Catalog, enabling organizations to centrally manage and audit shared data across the enterprise and confidently share data assets while meeting security and compliance requirements.

Data providers can share live data from where it resides in their cloud storage without replicating or moving it to another system. This approach reduces the operational costs of data sharing because data providers don’t have to replicate data multiple times across clouds, geographies, or data platforms to each of their data consumers.
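
As a provider-side sketch using documented Unity Catalog sharing DDL (the share and recipient names are assumptions):

    -- Create a share and add an existing table; the data is not copied or moved.
    CREATE SHARE IF NOT EXISTS sales_share;
    ALTER SHARE sales_share ADD TABLE prod.sales.orders;

    -- For open sharing, creating a recipient generates an activation link through
    -- which the recipient downloads a credential file.
    CREATE RECIPIENT IF NOT EXISTS partner_acme;
    GRANT SELECT ON SHARE sales_share TO RECIPIENT partner_acme;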

Use Databricks-to-Databricks Delta Sharing between Databricks users

If you want to share data with users who don’t have access to your Unity Catalog metastore, you can use Databricks-to-Databricks Delta Sharing, as long as the recipients have access to a Databricks workspace that is enabled for Unity Catalog. Databricks-to-Databricks sharing lets you share data with users in other Databricks accounts, across cloud regions, and across cloud providers. It’s also a secure way to share data across different Unity Catalog metastores in your own Databricks account.
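
For Databricks-to-Databricks sharing, the recipient is keyed to the sharing identifier of the consumer’s Unity Catalog metastore, and the consumer mounts the share as a catalog; all identifiers and names below are illustrative assumptions:

    -- Provider side: create a recipient from the consumer's metastore sharing identifier.
    CREATE RECIPIENT IF NOT EXISTS partner_d2d
      USING ID 'aws:us-west-2:19a84bee-54bc-43a2-87ab-023d0ec16013';  -- illustrative ID
    GRANT SELECT ON SHARE sales_share TO RECIPIENT partner_d2d;

    -- Consumer side: mount the share as a catalog and query with three-level names.
    CREATE CATALOG IF NOT EXISTS shared_sales USING SHARE provider_name.sales_share;
    SELECT * FROM shared_sales.sales.orders;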