Identity best practices

This article provides an opinionated perspective on how to best configure identity in Databricks. It includes a guide on how to migrate to identity federation, which enables you to manage all of your users, groups, and service principals in the Databricks account.

For an overview of the Databricks identity model, see Databricks identities.

For information on how to securely access Databricks APIs, see Secure API authentication.

Configure users, service principals, and groups

There are three types of Databricks identity:

  • Users: User identities recognized by Databricks and represented by email addresses.

  • Service principals: Identities for use with jobs, automated tools, and systems such as scripts, apps, and CI/CD platforms.

  • Groups: Groups simplify identity management, making it easier to assign access to workspaces, data, and other securable objects.

Databricks recommends creating service principals to run production jobs or modify production data. If all processes that act on production data run using service principals, interactive users do not need any write, delete, or modify privileges in production. This eliminates the risk of a user overwriting production data by accident.
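One way to check that production follows this pattern is to periodically review who holds write-like privileges on production objects. The following is a minimal sketch under stated assumptions: the `Grant` record, the `principal_type` labels, and the privilege strings in `WRITE_PRIVILEGES` are hypothetical placeholders, not an exhaustive list of Unity Catalog privileges or the shape of any Databricks API response.

```python
from dataclasses import dataclass

# Hypothetical, illustrative privilege labels that count as write access.
# This is not an exhaustive list of Unity Catalog privileges.
WRITE_PRIVILEGES = {"MODIFY", "DELETE", "WRITE"}


@dataclass(frozen=True)
class Grant:
    principal: str       # user email or service principal application ID
    principal_type: str  # hypothetical labels: "user" or "service_principal"
    privilege: str       # e.g. "SELECT" or "MODIFY"


def interactive_write_grants(grants):
    """Return the production grants that give an interactive user write access."""
    return [
        g for g in grants
        if g.principal_type == "user" and g.privilege in WRITE_PRIVILEGES
    ]


production_grants = [
    Grant("etl-job-sp", "service_principal", "MODIFY"),
    Grant("alice@example.com", "user", "SELECT"),
    Grant("bob@example.com", "user", "MODIFY"),
]

for g in interactive_write_grants(production_grants):
    print(f"Review: {g.principal} has {g.privilege} on production data")
```

Only the user grant is flagged; the service principal that runs the production job keeps its write access.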

It is best practice to assign access to workspaces and access-control policies in Unity Catalog to groups, instead of to users individually. All Databricks identities can be assigned as members of groups, and members inherit permissions that are assigned to their group.
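Because groups are themselves identities, a group can be a member of another group, and a principal's effective permissions are the union of its own grants and the grants on every group it belongs to, directly or transitively. The sketch below models that inheritance in plain Python; the group names and permission strings are illustrative assumptions, and it is not how Databricks evaluates permissions internally.

```python
from collections import deque


def groups_of(principal, group_members):
    """Return every group the principal belongs to, directly or via nested groups."""
    found = set()
    queue = deque([principal])
    while queue:
        member = queue.popleft()
        for group, members in group_members.items():
            if member in members and group not in found:
                found.add(group)
                queue.append(group)  # a group can itself be a member of another group
    return found


def effective_permissions(principal, group_members, grants):
    """Union of the principal's own grants and the grants on all of its groups."""
    perms = set(grants.get(principal, set()))
    for group in groups_of(principal, group_members):
        perms |= grants.get(group, set())
    return perms


# Hypothetical groups: "data-engineers" is nested inside "analytics".
group_members = {
    "data-engineers": {"alice@example.com"},
    "analytics": {"data-engineers", "bob@example.com"},
}
# Illustrative permission labels granted only to groups, never to users.
grants = {
    "analytics": {"USE CATALOG main"},
    "data-engineers": {"MODIFY main.sales"},
}
```

Here Alice receives both permissions (one directly through `data-engineers`, one through the nested membership in `analytics`), while Bob receives only the `analytics` permission, even though no permission was granted to either user individually.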

The following are the administrative roles that can manage Databricks identities:

  • Account admins can add users, service principals, and groups to the account and assign them admin roles. They can give users access to workspaces, as long as those workspaces use identity federation.

  • Workspace admins can add users and service principals to the Databricks account. They can also add groups to the Databricks account if their workspaces are enabled for identity federation. Workspace admins can grant users, service principals, and groups access to their workspaces.

  • Group managers can manage group membership. They can also assign other users the group manager role.

  • Service principal managers can manage roles on a service principal.
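The rules in the list above about who can add identities to the account and who can grant workspace access can be encoded as a small lookup. This is a sketch of one reading of those bullets only; the role and identity-type strings are hypothetical labels, and the functions do not query any Databricks permission API.

```python
def can_add_to_account(role, identity_type, workspace_federated=False):
    """Whether a role may add an identity of this type to the account.

    identity_type is "user", "service_principal", or "group" (hypothetical labels).
    workspace_federated refers to the workspace admin's own workspace.
    """
    if role == "account_admin":
        return identity_type in {"user", "service_principal", "group"}
    if role == "workspace_admin":
        if identity_type == "group":
            return workspace_federated  # groups only from identity-federated workspaces
        return identity_type in {"user", "service_principal"}
    # Group managers and service principal managers have no account-add rights
    # listed above; they manage membership and roles instead.
    return False


def can_grant_workspace_access(role, workspace_federated):
    """Whether a role may give identities access to a workspace."""
    if role == "account_admin":
        return workspace_federated  # only for identity-federated workspaces
    return role == "workspace_admin"  # within their own workspace
```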

Databricks recommends limiting the number of account admins in each account and the number of workspace admins in each workspace.

Sync users and groups from your identity provider to your Databricks account

Databricks recommends using SCIM provisioning to sync users and groups automatically from your identity provider to your Databricks account. SCIM streamlines onboarding a new employee or team by using your identity provider to create users and groups in Databricks and give them the proper level of access. When a user leaves your organization or no longer needs access to Databricks, admins can deactivate the user in your identity provider, and that user is also removed from Databricks. This ensures a consistent offboarding process and prevents unauthorized users from accessing sensitive data.
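Your identity provider's SCIM connector performs this reconciliation for you. The sketch below only illustrates the onboarding and offboarding logic as a set difference between the two systems; `plan_scim_sync` and its inputs are hypothetical. A real connector typically manages only the identities it provisioned, which this sketch does not track.

```python
def plan_scim_sync(idp_users, account_users):
    """Diff the identity provider against the Databricks account (hypothetical helper).

    idp_users: set of emails assigned to the SCIM application in the identity provider
    account_users: set of emails currently provisioned in the Databricks account
    """
    return {
        "create": sorted(idp_users - account_users),       # onboarding
        "deprovision": sorted(account_users - idp_users),  # offboarding
    }


plan = plan_scim_sync(
    idp_users={"alice@example.com", "new.hire@example.com"},
    account_users={"alice@example.com", "former.employee@example.com"},
)
print(plan)
```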

Aim to synchronize all users and groups that need access to Databricks to the account rather than to individual workspaces. This way, you only need to configure one SCIM provisioning application to keep identities consistent across all workspaces in the account.
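To make the difference concrete, an account-level SCIM application targets a single endpoint for the whole account, and the resources it sends follow the SCIM 2.0 core schema (RFC 7643). The sketch below builds such an endpoint URL and a minimal user payload without sending anything. The host and path are assumptions based on the AWS account console and should be confirmed against the Account API reference for your cloud; the account ID is a placeholder.

```python
import json

# Assumption: AWS account console host. Azure and Google Cloud accounts use
# different hosts; confirm the host and path in the Account API reference.
ACCOUNT_HOST = "https://accounts.cloud.databricks.com"


def account_scim_users_url(account_id, host=ACCOUNT_HOST):
    """One account-level SCIM Users endpoint serves every workspace in the account."""
    return f"{host}/api/2.0/accounts/{account_id}/scim/v2/Users"


def scim_user(email, display_name, active=True):
    """Minimal SCIM 2.0 User resource using the RFC 7643 core schema."""
    return {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
        "userName": email,
        "displayName": display_name,
        "emails": [{"value": email, "primary": True}],
        "active": active,  # an identity provider can set this to False on offboarding
    }


print(account_scim_users_url("00000000-0000-0000-0000-000000000000"))
print(json.dumps(scim_user("alice@example.com", "Alice"), indent=2))
```

In practice the identity provider's SCIM application builds and sends these requests; you configure it with the account-level endpoint once instead of one endpoint per workspace.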

Important

If you already have SCIM connectors that sync identities directly to your workspaces, you must disable those SCIM connectors when the account-level SCIM connector is enabled. See Upgrade to identity federation.

[Figure: Account-level SCIM diagram]

If you have fewer than 10,000 users in your identity provider, Databricks recommends assigning a single group that contains all of those users to the account-level SCIM application in your identity provider. Specific users, groups, and service principals can then be assigned from the account to specific workspaces within Databricks using identity federation.
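The recommendation can be read as two small rules: check the user count against the 10,000 threshold, and note that the SCIM application provisions exactly the members of the groups assigned to it. The sketch below models both; the group name `databricks-all-users` and the helper functions are hypothetical, and nested identity-provider groups are not expanded.

```python
# Threshold from the recommendation above.
SINGLE_GROUP_USER_LIMIT = 10_000


def recommend_single_group(total_idp_users):
    """True when one all-users group should be assigned to the SCIM application."""
    return total_idp_users < SINGLE_GROUP_USER_LIMIT


def provisioned_users(idp_groups, scim_app_groups):
    """Users the SCIM application sends to the account, given the groups assigned to it."""
    users = set()
    for group in scim_app_groups:
        users |= idp_groups.get(group, set())
    return users


# Hypothetical identity provider groups.
idp_groups = {
    "databricks-all-users": {"alice@example.com", "bob@example.com", "carol@example.com"},
    "finance": {"bob@example.com"},
}
```

With `databricks-all-users` as the only group assigned to the SCIM application, all three users reach the account, and workspace access is then decided in Databricks rather than by which groups the identity provider syncs.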

Enable identity federation

Identity federation enables you to configure users, service principals, and groups in the account console, and then assign those identities access to specific workspaces. This simplifies Databricks administration and data governance.

With identity federation, you configure Databricks users, service principals, and groups once in the account console, rather than repeating configuration separately in each workspace. This both reduces friction in onboarding a new team to Databricks and enables you to maintain one SCIM provisioning application with your identity provider to the Databricks account, instead of a separate SCIM provisioning application for each workspace. Once users, service principals, and groups are added to the account, you can assign them permissions on workspaces. You can only assign account-level identities access to workspaces that are enabled for identity federation.
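The "configure once, assign many" pattern and the restriction in the last sentence can be modeled as follows. This is a conceptual sketch: `Workspace`, `assign`, and `NotFederatedError` are hypothetical names, not the Databricks SDK or REST API.

```python
from dataclasses import dataclass, field


class NotFederatedError(Exception):
    """Raised when assigning an account identity to a non-federated workspace (hypothetical)."""


@dataclass
class Workspace:
    name: str
    identity_federation: bool
    assignments: set = field(default_factory=set)


def assign(workspace, principal):
    """Assign an account-level identity to a workspace (hypothetical model)."""
    if not workspace.identity_federation:
        raise NotFederatedError(f"{workspace.name} is not enabled for identity federation")
    workspace.assignments.add(principal)


# Configure the group once in the account, then assign it to several workspaces.
account_groups = {"data-engineers"}
prod = Workspace("prod", identity_federation=True)
dev = Workspace("dev", identity_federation=True)
legacy = Workspace("legacy", identity_federation=False)

for ws in (prod, dev):
    for group in account_groups:
        assign(ws, group)

try:
    assign(legacy, "data-engineers")
except NotFederatedError as err:
    print(err)  # legacy is not enabled for identity federation
```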

[Figure: Account-level identity diagram]

Identity federation is enabled on a workspace by assigning the workspace to a Unity Catalog metastore. When the metastore assignment is complete, identity federation is marked as Enabled on the workspace's Configuration tab in the account console. For details, see How do admins enable identity federation on a workspace?

Identity federation is enabled per workspace, so you can have a combination of identity federated and non-identity federated workspaces. In workspaces that are not enabled for identity federation, workspace admins manage their workspace users, service principals, and groups entirely within the scope of the workspace (the legacy model). They cannot use the account console or account-level APIs to assign account users to these workspaces, but they can use any of the workspace-level interfaces. Whenever a new user or service principal is added to a workspace through a workspace-level interface, that user or service principal is synchronized to the account. This keeps one consistent set of users and service principals in your account.

However, when a group is added to a non-identity federated workspace using workspace-level interfaces, that group is a workspace-local group and is not added to the account. You should aim to use account groups rather than workspace-local groups. Workspace-local groups cannot be granted access-control policies in Unity Catalog or permissions to other workspaces.
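The asymmetry between users and groups in a non-identity federated workspace can be modeled as below: users and service principals propagate to the account, while groups stay workspace-local and therefore cannot receive Unity Catalog grants. The classes and `can_grant_unity_catalog` are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    users: set = field(default_factory=set)
    service_principals: set = field(default_factory=set)
    groups: set = field(default_factory=set)  # account groups


@dataclass
class NonFederatedWorkspace:
    account: Account
    members: set = field(default_factory=set)
    local_groups: set = field(default_factory=set)

    def add_user(self, email):
        self.members.add(email)
        self.account.users.add(email)  # synchronized to the account

    def add_service_principal(self, application_id):
        self.members.add(application_id)
        self.account.service_principals.add(application_id)  # also synchronized

    def add_group(self, name):
        self.local_groups.add(name)  # workspace-local; never added to the account


def can_grant_unity_catalog(account, group):
    """Only account groups can receive Unity Catalog access-control policies."""
    return group in account.groups


account = Account(groups={"data-engineers"})
ws = NonFederatedWorkspace(account)
ws.add_user("alice@example.com")
ws.add_group("legacy-analysts")
```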

Upgrade to identity federation

If you are enabling identity federation on an existing workspace, do the following:

  1. Migrate workspace-level SCIM provisioning to the account level

    If your workspace has workspace-level SCIM provisioning set up, set up account-level SCIM provisioning and turn off the workspace-level SCIM provisioner. Workspace-level SCIM continues to create and update workspace-local groups. Databricks recommends using account groups instead of workspace-local groups to take advantage of centralized workspace assignment and data access management using Unity Catalog. Workspace-level SCIM also does not recognize account groups that are assigned to your identity federated workspace, and workspace-level SCIM API calls fail if they involve account groups. For more information about how to disable workspace-level SCIM, see Migrate workspace-level SCIM provisioning to the account level.

  2. Convert workspace-local groups to account groups

    Databricks recommends converting your existing workspace-local groups to account groups. See Migrate workspace-local groups to account groups for instructions.
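The two upgrade steps can be turned into a per-workspace checklist. This sketch only orders the tasks described above; `federation_upgrade_tasks` is a hypothetical helper, and the actual conversion procedure is described in the linked migration articles.

```python
def federation_upgrade_tasks(workspace_scim_on, account_scim_on, local_groups):
    """Ordered checklist for enabling identity federation on an existing workspace."""
    tasks = []
    # Step 1: move SCIM provisioning from the workspace to the account.
    if workspace_scim_on:
        if not account_scim_on:
            tasks.append("Set up account-level SCIM provisioning")
        tasks.append("Turn off workspace-level SCIM provisioning")
    # Step 2: convert each workspace-local group to an account group.
    for group in sorted(local_groups):
        tasks.append(f"Convert workspace-local group '{group}' to an account group")
    return tasks


for task in federation_upgrade_tasks(True, False, {"marketing", "analysts"}):
    print("-", task)
```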

Assign groups workspace permissions

Now that identity federation is enabled on your workspace, you can assign the users, service principals, and groups in your account permissions on that workspace. Databricks recommends that you assign groups permissions to workspaces instead of assigning workspace permissions to users individually. All Databricks identities can be assigned as members of groups, and members inherit permissions that are assigned to their group.
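The effect of assigning a group rather than individuals can be seen by computing who ends up with workspace access and which principals were assigned one by one. The sketch below is a hypothetical model (`workspace_access` is not a Databricks API) and does not expand nested groups.

```python
def workspace_access(assigned, group_members):
    """Return (principals with access, principals assigned individually).

    assigned: identities assigned to the workspace (groups or individual principals)
    group_members: account group name -> set of direct members
    Nested groups are not expanded in this sketch.
    """
    access, individual = set(), set()
    for principal in assigned:
        if principal in group_members:
            access |= group_members[principal]  # members inherit the group's permission
        else:
            access.add(principal)
            individual.add(principal)  # candidate to move into a group
    return access, individual


group_members = {"data-engineers": {"alice@example.com", "bob@example.com"}}
access, individual = workspace_access({"data-engineers", "carol@example.com"}, group_members)
```

Everyone in `data-engineers` gains access through a single assignment, while `carol@example.com` shows up as an individual assignment that could be replaced by adding her to an appropriate group.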

[Figure: Add workspace permissions]

Learn more