The Databricks data governance model lets you programmatically grant, deny, and revoke access to your data from Spark SQL. This model lets you control access to securable objects like catalogs, databases, tables, views, and functions. It also allows for fine-grained access control (to a particular subset of a table, for example) by setting privileges on derived views created from arbitrary queries. The Databricks SQL query analyzer enforces these access control policies at runtime on Databricks clusters with table access control enabled and all SQL endpoints.
This article describes the privileges, objects, and ownership rules that make up the Databricks data governance model. It also describes how to grant, deny, and revoke object privileges.
The requirements for managing object privileges depends on your environment:
This section describes the Databricks data governance model. Access to securable data objects is governed by privileges.
The securable objects are:
CATALOG: controls access to the entire data catalog.
DATABASE: controls access to a database.
TABLE: controls access to a managed or external table.
VIEW: controls access to SQL views.
FUNCTION: controls access to a named function.
ANONYMOUS FUNCTION: controls access to anonymous or temporary functions.
ANONYMOUS FUNCTIONobjects are not supported in Databricks SQL.
ANY FILE: controls access to the underlying filesystem.
Users granted access to
ANY FILEcan bypass the restrictions put on the catalog, databases, tables, and views by reading from the filesystem directly.
SELECT: gives read access to an object.
CREATE: gives ability to create an object (for example, a table in a database).
MODIFY: gives ability to add, delete, and modify data to or from an object.
USAGE: does not give any abilities, but is an additional requirement to perform any action on a database object.
READ_METADATA: gives ability to view an object and its metadata.
CREATE_NAMED_FUNCTION: gives ability to create a named UDF in an existing catalog or database.
MODIFY_CLASSPATH: gives ability to add files to the Spark class path.
ALL PRIVILEGES: gives all privileges (is translated into all the above privileges).
MODIFY_CLASSPATH privileges are not supported in Databricks SQL.
To perform an action on a database object, a user must have the
USAGE privilege on that database in addition to the privilege to perform that action. Any one of the following satisfy the
- Be an admin
- Have the
USAGEprivilege on the database or be in a group that has the
USAGEprivilege on the database
- Have the
USAGEprivilege on the
CATALOGor be in a group that has the
- Be the owner of the database or be in a group that owns the database
Even the owner of an object inside a database must have the
USAGE privilege in order to use it.
As an example, an administrator could define a
finance group and an
accounting database for them to use.
To set up a database that only the finance team can use and share, an admin would do the following:
CREATE DATABASE accounting; GRANT USAGE ON DATABASE accounting TO finance; GRANT CREATE ON DATABASE accounting TO finance;
With these privileges, members of the
finance group can create tables and views in the
but can’t share those tables or views with any principal that does not have
USAGE on the
When table access control is enabled on the workspace and on all clusters, SQL objects in Databricks are hierarchical and privileges are inherited downward. This means that granting or denying a privilege on the
CATALOG automatically grants or denies the privilege to all databases in the catalog. Similarly, privileges granted on a
DATABASE object are inherited by all objects in that database. This pattern is true for all securable objects.
If you deny a user privileges on a table, the user can’t see the table by attempting to list all tables in the database. If you deny a user privileges on a database, the user can’t see that the database exists by attempting to list all databases in the catalog.
When table access control is enabled on a cluster or SQL endpoint, a user who creates a database, table, view, or function becomes its owner. The owner is granted all privileges and can grant privileges to other users.
Groups may own objects, in which case all members of that group are considered owners.
Ownership determines whether or not you can grant privileges on derived objects to other users.
For example, suppose user A owns table T and grants user B
SELECT privilege on table T. Even
though user B can select from table T, user B cannot grant
SELECT privilege on table T to user C,
because user A is still the owner of the underlying table T. Furthermore, user B cannot circumvent
this restriction simply by creating a view V on table T and granting privileges on that view to
user C. When Databricks checks for privileges for user C to access view V, it also checks that
the owner of V and underlying table T are the same. If the owners are not the same, user C must
SELECT privileges on underlying table T.
When table access control is disabled on a cluster, no owner is registered when a database, table, view,
or function is created. To test if an object has an owner, run
SHOW GRANT ON <object-name>.
If you do not see an entry with
ActionType OWN, the object does not have an owner.
Either the owner of an object or an administrator can transfer ownership of an object using the
ALTER <object> OWNER TO `<user-name>@<user-domain>.com` command:
ALTER DATABASE <database-name> OWNER TO `<user-name>@<user-domain>.com` ALTER TABLE <table-name> OWNER TO `group_name` ALTER VIEW <view-name> OWNER TO `<user-name>@<user-domain>.com`
Administrators and owners can grant privileges to users and groups.
Each user is uniquely identified by their username in Databricks (which typically maps to their email address).
All users are implicitly a part of the “All Users” group, represented as
users in SQL.
You must enclose user specifications in backticks (
` ` ), not single quotes (
In Databricks, admin users can manage all object privileges, effectively have all privileges granted on all securables, and can change the owner of any object. Owners of an object can perform any action on that object, can grant privileges on that object to other principals, and can transfer ownership of the object to another principal. The only limit to an owner’s privileges is for objects within a database; to interact with an object in a database the user must also have
USAGE on that database.
The following table maps SQL operations to the privileges required to perform that operation.
- Any place where a privilege on a table, view, or function is required,
USAGEis also required on the database it’s in.
- In any place where a table is referenced in a command, a path could also be referenced. In those instances
MODIFYis required on
ANY FILEinstead of
USAGEon the database and another privilege on the table.
- Object ownership is represented here as the
|CREATE BLOOMFILTER INDEX||
|CREATE FUNCTION (External)||Either
|CREATE FUNCTION (SQL)||Either
|DROP BLOOMFILTER INDEX||
|FSCK REPAIR TABLE||
When you use table access control,
DROP TABLE statements are case sensitive. If a table name is lower case and the
DROP TABLE references the table name using mixed or upper case, the
DROP TABLE statement will fail.
You use the
SHOW GRANT operations to manage object privileges.
An owner or an administrator of an object can perform
SHOW GRANToperations. However, an administrator cannot deny privileges to or revoke privileges from an owner.
A principal that’s not an owner or administrator can perform an operation only if the required privilege has been granted.
To grant, deny, or revoke a privilege for all users, specify the keyword
TO. For example,
GRANT SELECT ON ANY FILE TO users
GRANT SELECT ON DATABASE <database-name> TO `<user>@<domain-name>` GRANT SELECT ON ANONYMOUS FUNCTION TO `<user>@<domain-name>` GRANT SELECT ON ANY FILE TO `<user>@<domain-name>` SHOW GRANT `<user>@<domain-name>` ON DATABASE <database-name> DENY SELECT ON <table-name> TO `<user>@<domain-name>` REVOKE ALL PRIVILEGES ON DATABASE default FROM `<user>@<domain-name>` REVOKE SELECT ON <table-name> FROM `<user>@<domain-name>` GRANT SELECT ON ANY FILE TO users
Databricks includes two user functions that allow you to express column- and row-level permissions dynamically in the body of a view definition.
current_user(): return the current user name.
is_member(): determine if the current user is a member of a specific Databricks group.
Available in Databricks Runtime 7.3 LTS and above. However, to use these functions in Databricks Runtime 7.3 LTS, you must set the Spark config
Consider the following example, which combines both functions to determine if a user has the appropriate group membership:
-- Return: true if the user is a member and false if they are not SELECT current_user as user, -- Check to see if the current user is a member of the "Managers" group. is_member("Managers") as admin
Allowing administrators to set fine granularity privileges for multiple users and groups within a single view is both expressive and powerful, while saving on administration overhead.
Through dynamic views it’s easy to limit what columns a specific group or user can see. Consider the following example where only users who belong to the
auditors group are able to see email addresses from the
sales_raw table. At analysis time Spark replaces the
CASE statement with either the literal
'REDACTED' or the column
-- Alias the field 'email' to itself (as 'email') to prevent the -- permission logic from showing up directly in the column name results. CREATE VIEW sales_redacted AS SELECT user_id, CASE WHEN is_member('auditors') THEN email ELSE 'REDACTED' END AS email, country, product, total FROM sales_raw
Using dynamic views you can specify permissions down to the row or field level. Consider the following example, where only users who belong to the
managers group are able to see transaction amounts (
total column) greater than $1,000,000.00:
CREATE VIEW sales_redacted AS SELECT user_id, country, product, total FROM sales_raw WHERE CASE WHEN is_member('managers') THEN TRUE ELSE total <= 1000000 END;
As shown in the preceding examples, you can implement column-level masking to prevent users from seeing specific column data unless they are in the correct group. Because these views are standard Spark SQL, you can do more advanced types of masking with more complex SQL expressions. The following example lets all users perform analysis on email domains, but lets members of the
auditors group see users’ full email addresses.
-- The regexp_extract function takes an email address such as -- email@example.com and extracts 'example', allowing -- analysts to query the domain name CREATE VIEW sales_redacted AS SELECT user_id, region, CASE WHEN is_member('auditors') THEN email ELSE regexp_extract(email, '^.*@(.*)$', 1) END FROM sales_raw
Specify the keyword
FROM. For example:
GRANT SELECT ON TABLE database.table TO users
This error can occur because you created that object on a cluster or SQL endpoint without table access control enabled. When table access control is disabled on a cluster or SQL endpoint, owners are not registered when a database, table, or view is created. An admin must assign an owner to the object using the following command:
ALTER [DATABASE | TABLE | VIEW] <object-name> OWNER TO `<user-name>@<user-domain>.com`;
Privileges on global and local temporary views are not supported. Local temporary
views are visible only within the same session, and views created in the
global_temp database are
visible to all users sharing a cluster or SQL endpoint. However, privileges on the underlying tables and views
referenced by any temporary views are enforced.
A grant, deny, or revoke statement can be applied to only one object at a time. The recommended way
to organize and grant privileges on multiple tables to a principal is via databases. Granting a
SELECT privilege on a database implicitly grants that principal
SELECT privileges on
all tables and views in that database. For example, if a database D has tables t1 and t2, and an
admin issues the following
GRANT USAGE, SELECT ON DATABASE D TO `<user>@<domain-name>`
<user>@<domain-name> can select from tables t1 and t2, as well as any tables and views created in database D in the future.
SELECT privilege to the database and then deny
SELECT privilege for the specific table you want to restrict access to.
GRANT USAGE, SELECT ON DATABASE D TO `<user>@<domain-name>` DENY SELECT ON TABLE D.T TO `<user>@<domain-name>`
<user>@<domain-name> can select from all tables in D except D.T.
A user has
SELECT privileges on a view of table T, but when that user tries to
SELECT from that view, they get the error
User does not have privilege SELECT on table.
This common error can occur for one of the following reasons:
- Table T has no registered owner because it was created using a cluster or SQL endpoint for which table access control is disabled.
- The grantor of the
SELECTprivilege on a view of table T is not the owner of table T or the user does not also have select
SELECTprivilege on table T.
Suppose there is a table T owned by A. A owns view V1 on T and B owns view V2 on T.
- A user can select on V1 when A has granted
SELECTprivileges on view V1.
- A user can select on V2 when A has granted
SELECTprivileges on table T and B has granted
SELECTprivileges on V2.
As described in the Object ownership section, these conditions ensure that only the owner of an object can grant other users access to that object.
On clusters with table access control enabled you can use only the Spark SQL and Python DataFrame APIs. The RDD API is disallowed for security reasons, since Databricks does not have the ability to inspect and authorize code within an RDD.