Identity best practices
This article provides an opinionated perspective on how to best configure identity in Databricks. It includes a guide on how to migrate to identity federation, which enables you to manage all of your users, groups, and service principals in the Databricks account.
For an overview of the Databricks identity model, see Databricks identities.
For information on how to securely access Databricks APIs, see Manage personal access token permissions.
Configure users, service principals, and groups
There are three types of Databricks identity:
Users: User identities recognized by Databricks and represented by email addresses.
Service principals: Identities for use with jobs, automated tools, and systems such as scripts, apps, and CI/CD platforms.
Groups: Groups simplify identity management, making it easier to assign access to workspaces, data, and other securable objects.
Databricks recommends creating service principals to run production jobs or modify production data. If all processes that act on production data run using service principals, interactive users do not need any write, delete, or modify privileges in production. This eliminates the risk of a user overwriting production data by accident.
It is best practice to assign access to workspaces and access-control policies in Unity Catalog to groups, instead of to users individually. All Databricks identities can be assigned as members of groups, and members inherit permissions that are assigned to their group.
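As a rough illustration of why group-based assignment scales better than per-user grants, the sketch below models membership inheritance in plain Python. It does not call any Databricks API. The `Group` class, the `effective_permissions` helper, and the permission labels are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical stand-in for a Databricks group (not a real SDK class)."""
    name: str
    members: set[str] = field(default_factory=set)          # users or service principals
    child_groups: list["Group"] = field(default_factory=list)  # groups can also be members
    permissions: set[str] = field(default_factory=set)      # illustrative labels only


def effective_permissions(principal: str, groups: list[Group]) -> set[str]:
    """Collect every permission the principal inherits through group membership."""

    def contains(group: Group) -> bool:
        # A principal belongs to a group directly or through a nested member group.
        return principal in group.members or any(contains(g) for g in group.child_groups)

    granted: set[str] = set()
    for group in groups:
        if contains(group):
            granted |= group.permissions
    return granted


# Example: alice is a direct member of one group, which is itself a member of another.
engineers = Group("data-engineers", members={"alice@example.com"},
                  permissions={"workspace-access"})
all_staff = Group("all-staff", child_groups=[engineers],
                  permissions={"read-sales-data"})

print(effective_permissions("alice@example.com", [engineers, all_staff]))
```

Granting or revoking access then means editing one group's permissions or membership, rather than touching each user individually.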
The following are the administrative roles that can manage Databricks identities:
Account admins can add users, service principals, and groups to the account and assign them admin roles. They can give users access to workspaces, as long as those workspaces use identity federation.
Workspace admins can add users and service principals to the Databricks account. They can also add groups to the Databricks account if their workspaces are enabled for identity federation. Workspace admins can grant users, service principals, and groups access to their workspaces.
Group managers can manage group membership. They can also assign other users the group manager role.
Service principal managers can manage roles on a service principal.
Databricks recommends that there be a limited number of account admins per account and workspace admins in each workspace.
Sync users and groups from your identity provider to your Databricks account
Databricks recommends using SCIM provisioning to sync users and groups automatically from your identity provider to your Databricks account. SCIM streamlines onboarding a new employee or team by using your identity provider to create users and groups in Databricks and give them the proper level of access. When a user leaves your organization or no longer needs access to Databricks, admins can remove the user from your identity provider and that user is deactivated in Databricks. This ensures a consistent offboarding process and prevents unauthorized users from accessing sensitive data.
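To make the onboarding and offboarding flow more concrete, the sketch below builds the kind of request bodies a SCIM 2.0 client typically sends. The schema URNs come from the SCIM standard (RFC 7643 and RFC 7644). The function names are hypothetical, and the exact attributes that Databricks accepts are not covered in this article, so treat the payloads as a generic SCIM example rather than the definitive Databricks format.

```python
import json

# Schema URNs defined by the SCIM 2.0 standard (RFC 7643 and RFC 7644).
SCIM_USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"
SCIM_PATCH_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def build_create_user(email: str, display_name: str) -> dict:
    """Body an identity provider might POST to a SCIM Users endpoint to onboard a user."""
    return {
        "schemas": [SCIM_USER_SCHEMA],
        "userName": email,
        "displayName": display_name,
        "active": True,
    }


def build_deactivate_user() -> dict:
    """Body an identity provider might PATCH to a SCIM user resource to offboard a user."""
    return {
        "schemas": [SCIM_PATCH_SCHEMA],
        "Operations": [{"op": "replace", "path": "active", "value": False}],
    }


# Example payloads (the email address and name are placeholders).
print(json.dumps(build_create_user("new.hire@example.com", "New Hire"), indent=2))
print(json.dumps(build_deactivate_user(), indent=2))
```

In practice the identity provider's SCIM connector generates these requests for you. The sketch only shows what travels over the wire when a user is created or deactivated.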
You should aim to synchronize all of the users and groups in your identity provider to the account console rather than individual workspaces. This way, you only need to configure one SCIM provisioning application to keep all identities consistent across all workspaces in the account. See Enable all identity provider users to access Databricks.
Important
If you already have SCIM connectors that sync identities directly to your workspaces, you must disable those SCIM connectors when the account-level SCIM connector is enabled. See Upgrade to identity federation.
If you have under 10,000 users in your identity provider, Databricks recommends assigning a group in your identity provider that contains all of the users to the account-level SCIM application. Specific users, groups and service principals can then be assigned from the account to specific workspaces within Databricks using identity federation.
Enable identity federation
Identity federation enables you to configure users, service principals, and groups in the account console, and then assign those identities access to specific workspaces. This simplifies Databricks administration and data governance.
Important
Databricks began to enable new workspaces for identity federation and Unity Catalog automatically on March 6, 2024, with a rollout proceeding gradually across accounts. If your workspace is enabled for identity federation by default, it cannot be disabled. For more information, see Automatic enablement of Unity Catalog.
With identity federation, you configure Databricks users, service principals, and groups once in the account console, rather than repeating configuration separately in each workspace. This both reduces friction in onboarding a new team to Databricks and enables you to maintain one SCIM provisioning application with your identity provider to the Databricks account, instead of a separate SCIM provisioning application for each workspace. Once users, service principals, and groups are added to the account, you can assign them permissions on workspaces. You can only assign account-level identities access to workspaces that are enabled for identity federation.
To enable a workspace for identity federation, see How do admins enable identity federation on a workspace?. When the assignment is complete, identity federation is marked as Enabled on the workspace’s Configuration tab in the account console.
Identity federation is enabled at the workspace level, and you can have a combination of identity-federated and non-identity-federated workspaces. For those workspaces that are not enabled for identity federation, workspace admins manage their workspace users, service principals, and groups entirely within the scope of the workspace (the legacy model). They cannot use the account console or account-level APIs to assign users from the account to these workspaces, but they can use any of the workspace-level interfaces. Whenever a new user or service principal is added to a workspace using workspace-level interfaces, that user or service principal is synchronized to the account level. This enables you to have one consistent set of users and service principals in your account.
However, when a group is added to a non-identity federated workspace using workspace-level interfaces, that group is a workspace-local group and is not added to the account. You should aim to use account groups rather than workspace-local groups. Workspace-local groups cannot be granted access-control policies in Unity Catalog or permissions to other workspaces.
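The difference in behavior between users, service principals, and groups on a non-identity-federated workspace can be summarized with a small in-memory model. This is only a sketch of the rules described above. The `Account` and `NonFederatedWorkspace` classes and their methods are hypothetical and do not correspond to any Databricks SDK types.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical model of the account-level identity store."""
    users: set[str] = field(default_factory=set)
    service_principals: set[str] = field(default_factory=set)
    groups: set[str] = field(default_factory=set)


@dataclass
class NonFederatedWorkspace:
    """Hypothetical model of a workspace that is not enabled for identity federation."""
    account: Account
    users: set[str] = field(default_factory=set)
    service_principals: set[str] = field(default_factory=set)
    local_groups: set[str] = field(default_factory=set)

    def add_user(self, email: str) -> None:
        self.users.add(email)
        self.account.users.add(email)  # users are synchronized to the account level

    def add_service_principal(self, name: str) -> None:
        self.service_principals.add(name)
        self.account.service_principals.add(name)  # so are service principals

    def add_group(self, name: str) -> None:
        # Groups created here stay workspace-local and are not added to the account.
        self.local_groups.add(name)


account = Account()
workspace = NonFederatedWorkspace(account)
workspace.add_user("alice@example.com")
workspace.add_service_principal("nightly-etl")
workspace.add_group("analysts")

print(account.users, account.service_principals, account.groups)
```

After these calls the account contains the user and the service principal but no groups, which is why workspace-local groups cannot be used for Unity Catalog access control or in other workspaces.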
Upgrade to identity federation
If you are enabling identity federation on an existing workspace, do the following:
Migrate workspace-level SCIM provisioning to the account level
If you have workspace-level SCIM provisioning set up for your workspace, you should set up account-level SCIM provisioning and turn off the workspace-level SCIM provisioner. Workspace-level SCIM will continue to create and update workspace-local groups. Databricks recommends using account groups instead of workspace-local groups to take advantage of centralized workspace assignment and data access management using Unity Catalog. Workspace-level SCIM also does not recognize account groups that are assigned to your identity-federated workspace, and workspace-level SCIM API calls will fail if they involve account groups. For more information about how to disable workspace-level SCIM, see Migrate workspace-level SCIM provisioning to the account level.
Convert workspace-local groups to account groups
Databricks recommends converting your existing workspace-local groups to account groups. See Migrate workspace-local groups to account groups for instructions.
Assign groups workspace permissions
Now that identity federation is enabled on your workspace, you can assign the users, service principals, and groups in your account permissions on that workspace. Databricks recommends that you assign groups permissions to workspaces instead of assigning workspace permissions to users individually. All Databricks identities can be assigned as members of groups, and members inherit permissions that are assigned to their group.
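Assigning an account group to a workspace can also be scripted. The sketch below only constructs the HTTP request and does not send it. The endpoint path and the `{"permissions": [...]}` body reflect an assumption about the account-level workspace assignment API that this article does not confirm, so verify them against the current API reference before use. The host, IDs, token, and the `build_workspace_assignment` helper are placeholders invented for this example.

```python
import json
from urllib.request import Request


def build_workspace_assignment(
    host: str,
    account_id: str,
    workspace_id: str,
    principal_id: str,
    permission: str = "USER",
    token: str = "<token>",
) -> Request:
    """Build (but do not send) a request that assigns a principal to a workspace.

    The URL layout and body shape are assumptions, not taken from this article.
    """
    url = (
        f"{host}/api/2.0/accounts/{account_id}/workspaces/{workspace_id}"
        f"/permissionassignments/principals/{principal_id}"
    )
    body = json.dumps({"permissions": [permission]}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values: replace with your account console host and real IDs.
request = build_workspace_assignment(
    host="https://accounts.example.com",
    account_id="my-account-id",
    workspace_id="1234567890",
    principal_id="987654321",  # ID of an account group, in line with the recommendation above
)
print(request.get_method(), request.full_url)
```

Passing a group's ID as the principal keeps workspace access tied to group membership, so adding or removing a member in the identity provider is enough to change who can use the workspace.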
Learn more
Manage users, service principals, and groups: learn more about the Databricks identity model.
Sync users and groups from your identity provider: get started using SCIM provisioning.
Unity Catalog best practices: learn how to best configure Unity Catalog.