Create a storage credential for connecting to Google Cloud Storage

This article describes how to create a storage credential in Unity Catalog to connect to Google Cloud Storage.

To manage access to the underlying cloud storage that holds tables and volumes, Unity Catalog uses the following object types:

  • Storage credentials encapsulate a long-term cloud credential that provides access to cloud storage.

  • External locations contain a reference to a storage credential and a cloud storage path.

For more information, see Connect to cloud object storage using Unity Catalog.

Unity Catalog supports two cloud storage options for Databricks on Google Cloud: Google Cloud Storage (GCS) buckets and Cloudflare R2 buckets. Cloudflare R2 is intended primarily for Delta Sharing use cases in which you want to avoid data egress fees. GCS is appropriate for most other use cases. This article focuses on creating storage credentials for GCS. For Cloudflare R2, see Create a storage credential for connecting to Cloudflare R2.

To create a storage credential for access to a GCS bucket, you give Unity Catalog the ability to read and write to the bucket by assigning IAM roles on that bucket to a Databricks-generated Google Cloud service account.

Requirements

In Databricks:

  • A Databricks workspace enabled for Unity Catalog.

  • CREATE STORAGE CREDENTIAL privilege on the Unity Catalog metastore attached to the workspace. Account admins and metastore admins have this privilege by default.

In your Google Cloud account:

  • A GCS bucket in the same region as the workspaces from which you want to access the data.

  • Permission to modify the access policy for that bucket.

Generate a Google Cloud service account using Catalog Explorer

  1. Log in to your Unity Catalog-enabled Databricks workspace as a user who has the CREATE STORAGE CREDENTIAL privilege on the metastore.

    The metastore admin and account admin roles both include this privilege.

  2. In the sidebar, click Catalog.

  3. At the top of the Catalog pane, click the Add icon and select Add a storage credential from the menu.

    This option does not appear if you don’t have the CREATE STORAGE CREDENTIAL privilege.

    Alternatively, from the Quick access page, click the External data > button, go to the Storage Credentials tab, and select Create credential.

  4. On the Create a new storage credential dialog, select a Credential Type of Google Cloud Storage.

  5. Enter a Storage credential name and an optional comment.

  6. (Optional) If you want users to have read-only access to the external locations that use this storage credential, select Read only. For more information, see Mark a storage credential as read-only.

  7. Click Save.

    Databricks creates the storage credential and generates a Google Cloud service account.

  8. On the Storage credential created dialog, make a note of the service account ID, which is in the form of an email address, and click Done.

  9. (Optional) Bind the storage credential to specific workspaces.

    By default, any privileged user can use the storage credential on any workspace attached to the metastore. If you want to allow access only from specific workspaces, go to the Workspaces tab and assign workspaces. See (Optional) Assign a storage credential to specific workspaces.
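If you prefer to script credential creation, the Databricks CLI's storage-credentials command group offers an alternative to Catalog Explorer. The following is a sketch, not a definitive recipe: the databricks_gcp_service_account field follows the Unity Catalog storage credentials API for a Databricks-managed service account, and you should verify the payload shape against your CLI version. Placeholders follow the conventions used elsewhere in this article.

```shell
# Sketch: create a GCS storage credential from the CLI.
# Databricks generates the Google Cloud service account for you.
databricks storage-credentials create --json '{
  "name": "<my-storage-credential>",
  "comment": "Credential for a GCS bucket",
  "databricks_gcp_service_account": {}
}' --profile <profile-name>

# The response includes the generated service account email.
# You can also look it up later:
databricks storage-credentials get <my-storage-credential> --profile <profile-name>
```

As in the Catalog Explorer flow, note the service account email from the response; you need it when granting IAM roles on the bucket.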

Configure permissions for the service account

  1. Go to the Google Cloud console and open the GCS bucket that you want to access from your Databricks workspace.

    The bucket should be in the same region as your Databricks workspace.

  2. On the Permissions tab, click + Grant access and assign the service account the following roles:

    • Storage Legacy Bucket Reader

    • Storage Object Admin

    Use the service account’s email address as the principal identifier.
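If you script bucket permissions rather than using the console, the same two grants can be made with the gcloud CLI. A sketch with placeholder values; the role IDs below are the IAM identifiers that correspond to the console role names above.

```shell
# Placeholders: replace with your bucket name and the Databricks-generated
# service account email you noted earlier.
BUCKET=gs://<bucket-name>
SA=serviceAccount:<service-account-email>

# Storage Legacy Bucket Reader
gcloud storage buckets add-iam-policy-binding "$BUCKET" \
  --member="$SA" --role=roles/storage.legacyBucketReader

# Storage Object Admin
gcloud storage buckets add-iam-policy-binding "$BUCKET" \
  --member="$SA" --role=roles/storage.objectAdmin
```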

You can now create an external location that references this storage credential.

(Optional) Assign a storage credential to specific workspaces

Preview

This feature is in Public Preview.

By default, a storage credential is accessible from all of the workspaces in the metastore. This means that if a user has been granted a privilege (such as CREATE EXTERNAL LOCATION) on that storage credential, they can exercise that privilege from any workspace attached to the metastore. If you use workspaces to isolate user data access, you may want to allow access to a storage credential only from specific workspaces. This feature is known as workspace binding or storage credential isolation.

A typical use case for binding a storage credential to specific workspaces is the scenario in which a cloud admin configures a storage credential using a production cloud account credential, and you want to ensure that Databricks users use this credential to create external locations only in the production workspace.

For more information about workspace binding, see (Optional) Assign an external location to specific workspaces and Limit catalog access to specific workspaces.

Note

Workspace bindings are referenced when privileges against storage credentials are exercised. For example, if a user creates an external location using a storage credential, the workspace binding on the storage credential is checked only when the external location is created. After the external location is created, it will function independently of the workspace bindings configured on the storage credential.

Bind a storage credential to one or more workspaces

To assign a storage credential to specific workspaces, you can use Catalog Explorer or the Databricks CLI.

Permissions required: Metastore admin or storage credential owner.

Note

Metastore admins can see all storage credentials in a metastore using Catalog Explorer—and storage credential owners can see all storage credentials that they own in a metastore—regardless of whether the storage credential is assigned to the current workspace. Storage credentials that are not assigned to the workspace appear grayed out.

  1. Log in to a workspace that is linked to the metastore.

  2. In the sidebar, click Catalog.

  3. At the top of the Catalog pane, click the gear icon and select Storage Credentials.

    Alternatively, from the Quick access page, click the External data > button and go to the Storage Credentials tab.

  4. Select the storage credential and go to the Workspaces tab.

  5. On the Workspaces tab, clear the All workspaces have access checkbox.

    If your storage credential is already bound to one or more workspaces, this checkbox is already cleared.

  6. Click Assign to workspaces and enter or find the workspaces you want to assign.

To revoke access, go to the Workspaces tab, select the workspace, and click Revoke. To allow access from all workspaces, select the All workspaces have access checkbox.

Assigning a storage credential to a workspace with the Databricks CLI takes two steps, each using a different command group.

In the following examples, replace <profile-name> with the name of your Databricks authentication configuration profile. It should include the value of a personal access token, in addition to the workspace instance name and workspace ID of the workspace where you generated the personal access token. See Databricks personal access token authentication.
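For reference, a minimal profile in your ~/.databrickscfg file looks roughly like this (all values are placeholders):

```ini
[<profile-name>]
host  = https://<workspace-instance-name>
token = <personal-access-token>
```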

  1. Use the storage-credentials command group’s update command to set the storage credential’s isolation mode to ISOLATED:

    databricks storage-credentials update <my-storage-credential> \
    --isolation-mode ISOLATED \
    --profile <profile-name>
    

    The default isolation mode is OPEN, which makes the storage credential accessible from all workspaces attached to the metastore.

  2. Use the workspace-bindings command group’s update-bindings command to assign the workspaces to the storage credential:

    databricks workspace-bindings update-bindings storage-credential <my-storage-credential> \
    --json '{
      "add": [{"workspace_id": <workspace-id>}...],
      "remove": [{"workspace_id": <workspace-id>}...]
    }' --profile <profile-name>
    

    Use the "add" and "remove" properties to add or remove workspace bindings.

    Note

    Read-only binding (BINDING_TYPE_READ_ONLY) is not available for storage credentials, so there is no need to set binding_type when binding a storage credential.
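When binding several workspaces in one call, it can help to assemble the JSON payload before passing it to update-bindings. A minimal sketch; the workspace IDs here are made up:

```shell
# Build an "add" payload for two hypothetical workspace IDs.
payload=$(printf '{"add": [{"workspace_id": %s}, {"workspace_id": %s}]}' \
  1234567890 2345678901)
echo "$payload"

# Then pass it to update-bindings (placeholders as elsewhere in this article):
# databricks workspace-bindings update-bindings storage-credential <my-storage-credential> \
#   --json "$payload" --profile <profile-name>
```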

To list all workspace assignments for a storage credential, use the workspace-bindings command group’s get-bindings command:

databricks workspace-bindings get-bindings storage-credential <my-storage-credential> \
--profile <profile-name>

Unbind a storage credential from a workspace

Instructions for revoking workspace access to a storage credential using Catalog Explorer or the workspace-bindings CLI command group are included in Bind a storage credential to one or more workspaces.

Next steps

You can view, update, delete, and grant other users permission to use storage credentials. See Manage storage credentials.

You can define external locations using storage credentials. See Create an external location.