Create a storage credential for connecting to Google Cloud Storage
This article describes how to create a storage credential in Unity Catalog to connect to Google Cloud Storage.
To manage access to the underlying cloud storage that holds tables and volumes, Unity Catalog uses the following object types:
Storage credentials encapsulate a long-term cloud credential that provides access to cloud storage.
External locations contain a reference to a storage credential and a cloud storage path.
For more information, see Connect to cloud object storage using Unity Catalog.
Unity Catalog supports two cloud storage options for Databricks on Google Cloud: Google Cloud Storage (GCS) buckets and Cloudflare R2 buckets. Cloudflare R2 is intended primarily for Delta Sharing use cases in which you want to avoid data egress fees. GCS is appropriate for most other use cases. This article focuses on creating storage credentials for GCS. For Cloudflare R2, see Create a storage credential for connecting to Cloudflare R2.
To create a storage credential for access to a GCS bucket, you give Unity Catalog the ability to read and write to the bucket by assigning IAM roles on that bucket to a Databricks-generated Google Cloud service account.
Requirements
In Databricks:
A Databricks workspace enabled for Unity Catalog.
The CREATE STORAGE CREDENTIAL privilege on the Unity Catalog metastore attached to the workspace. Account admins and metastore admins have this privilege by default.
In your Google Cloud account:
A GCS bucket in the same region as the workspaces you want to access the data from.
Permission to modify the access policy for that bucket.
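Before you start, you can check the region requirement by comparing the bucket's location with your workspace's region. The following is a minimal sketch, not an official tool. It assumes the gcloud CLI is installed and authenticated, and that gcloud storage buckets describe reports the bucket location in upper case (for example, US-CENTRAL1). For that reason the comparison ignores case. The helper names bucket_location and same_region, and the bucket and region in the usage comment, are hypothetical. A multi-region bucket (for example, US) never matches a single workspace region in this check.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking that a GCS bucket is in the same region
# as the Databricks workspace that will read it.

# Print the bucket's location as reported by gcloud (for example US-CENTRAL1).
# Assumes gcloud is installed and authenticated.
bucket_location() {
  gcloud storage buckets describe "gs://$1" --format="value(location)"
}

# Succeed (exit status 0) if the two region names match, ignoring case.
same_region() {
  local a b
  a="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  b="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
  [ "$a" = "$b" ]
}

# Example (placeholder bucket name and region):
#   if same_region "$(bucket_location my-bucket)" us-central1; then
#     echo "Bucket and workspace are in the same region"
#   else
#     echo "Bucket is not in the workspace's region" >&2
#   fi
```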
Generate a Google Cloud service account using Catalog Explorer
Log in to your Unity Catalog-enabled Databricks workspace as a user who has the CREATE STORAGE CREDENTIAL privilege on the metastore. The metastore admin and account admin roles both include this privilege.
In the sidebar, click Catalog.
At the top of the Catalog pane, click the Add icon and select Add a storage credential from the menu.
This option does not appear if you don’t have the CREATE STORAGE CREDENTIAL privilege.
Alternatively, from the Quick access page, click the External data > button, go to the Storage Credentials tab, and select Create credential.
On the Create a new storage credential dialog, select a Credential Type of Google Cloud Storage.
Enter a Storage credential name and an optional comment.
(Optional) If you want users to have read-only access to the external locations that use this storage credential, select Read only. For more information, see Mark a storage credential as read-only.
Click Save.
Databricks creates the storage credential and generates a Google Cloud service account.
On the Storage credential created dialog, make a note of the service account ID, which is in the form of an email address, and click Done.
(Optional) Bind the storage credential to specific workspaces.
By default, any privileged user can use the storage credential on any workspace attached to the metastore. If you want to allow access only from specific workspaces, go to the Workspaces tab and assign workspaces. See (Optional) Assign a storage credential to specific workspaces.
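If you prefer to script credential creation, the Unity Catalog REST API can create the same kind of credential. The sketch below rests on several assumptions. It uses the POST /api/2.1/unity-catalog/storage-credentials endpoint, and a request body in which an empty databricks_gcp_service_account object asks Databricks to generate the service account. It also expects the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables to hold your workspace URL (including https://) and a personal access token. The helper names are hypothetical, so check the REST API reference for your workspace before relying on this.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for creating a GCS storage credential through the
# Unity Catalog REST API instead of Catalog Explorer.
# Assumes DATABRICKS_HOST (e.g. https://<workspace-url>) and DATABRICKS_TOKEN are set.

# Build the JSON request body. The empty databricks_gcp_service_account object
# asks Databricks to generate a Google Cloud service account.
# The name is inserted as-is, so it must not contain double quotes.
gcs_credential_payload() {
  local name="$1" read_only="${2:-false}"
  printf '{"name": "%s", "read_only": %s, "databricks_gcp_service_account": {}}' \
    "$name" "$read_only"
}

# Send the create request; the JSON response is printed to stdout.
# Usage: create_gcs_credential <name> [true|false]
create_gcs_credential() {
  curl --silent --show-error --fail -X POST \
    "${DATABRICKS_HOST}/api/2.1/unity-catalog/storage-credentials" \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(gcs_credential_payload "$1" "${2:-false}")"
}
```

The response is expected to contain the generated service account's email address under databricks_gcp_service_account.email. Note it down, just as you would note the service account ID shown in the Storage credential created dialog.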
Configure permissions for the service account
Go to the Google Cloud console and open the GCS bucket that you want to access from your Databricks workspace.
The bucket should be in the same region as your Databricks workspace.
On the Permissions tab, click + Grant access and assign the service account the following roles:
Storage Legacy Bucket Reader
Storage Object Admin
Use the service account’s email address as the principal identifier.
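You can also make the same role assignments from the command line. This sketch assumes the gcloud CLI is installed and authenticated. It calls gcloud storage buckets add-iam-policy-binding with the role IDs roles/storage.legacyBucketReader and roles/storage.objectAdmin, which correspond to Storage Legacy Bucket Reader and Storage Object Admin. The run and grant_bucket_roles helpers are hypothetical. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: grant the Databricks-generated service account the two
# bucket roles listed above. Assumes gcloud is installed and authenticated.

# Print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: grant_bucket_roles <bucket-name> <service-account-email>
grant_bucket_roles() {
  local bucket="$1" sa_email="$2" role
  # Storage Legacy Bucket Reader and Storage Object Admin, by role ID.
  for role in roles/storage.legacyBucketReader roles/storage.objectAdmin; do
    run gcloud storage buckets add-iam-policy-binding "gs://${bucket}" \
      --member="serviceAccount:${sa_email}" \
      --role="$role"
  done
}
```

Run it first with DRY_RUN=1 to review the two commands, then run it without DRY_RUN to apply the bindings.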
You can now create an external location that references this storage credential.
(Optional) Assign a storage credential to specific workspaces
Preview
This feature is in Public Preview.
By default, a storage credential is accessible from all of the workspaces in the metastore. This means that if a user has been granted a privilege (such as CREATE EXTERNAL LOCATION) on that storage credential, they can exercise that privilege from any workspace attached to the metastore. If you use workspaces to isolate user data access, you may want to allow access to a storage credential only from specific workspaces. This feature is known as workspace binding or storage credential isolation.
A typical use case for binding a storage credential to specific workspaces is the scenario in which a cloud admin configures a storage credential using a production cloud account credential, and you want to ensure that Databricks users use this credential to create external locations only in the production workspace.
For more information about workspace binding, see (Optional) Assign an external location to specific workspaces and Limit catalog access to specific workspaces.
Note
Workspace bindings are checked when privileges on storage credentials are exercised. For example, if a user creates an external location using a storage credential, the workspace binding on the storage credential is checked only when the external location is created. After the external location is created, it functions independently of the workspace bindings configured on the storage credential.
Bind a storage credential to one or more workspaces
To assign a storage credential to specific workspaces, you can use Catalog Explorer or the Databricks CLI.
Permissions required: Metastore admin or storage credential owner.
Note
In Catalog Explorer, metastore admins can see all storage credentials in a metastore, and storage credential owners can see all storage credentials that they own in a metastore, regardless of whether the storage credential is assigned to the current workspace. Storage credentials that are not assigned to the workspace appear grayed out.
Log in to a workspace that is linked to the metastore.
In the sidebar, click Catalog.
At the top of the Catalog pane, click the gear icon and select Storage Credentials.
Alternatively, from the Quick access page, click the External data > button and go to the Storage Credentials tab.
Select the storage credential and go to the Workspaces tab.
On the Workspaces tab, clear the All workspaces have access checkbox.
If your storage credential is already bound to one or more workspaces, this checkbox is already cleared.
Click Assign to workspaces and enter or find the workspaces you want to assign.
To revoke access, go to the Workspaces tab, select the workspace, and click Revoke. To allow access from all workspaces, select the All workspaces have access checkbox.
Alternatively, you can use the Databricks CLI. Assigning a storage credential to specific workspaces with the CLI takes two steps, each using a different command group.
In the following examples, replace <profile-name> with the name of your Databricks authentication configuration profile. It should include the value of a personal access token, in addition to the workspace instance name and workspace ID of the workspace where you generated the personal access token. See Databricks personal access token authentication.
Use the storage-credentials command group’s update command to set the storage credential’s isolation mode to ISOLATED:
databricks storage-credentials update <my-storage-credential> \
--isolation-mode ISOLATED \
--profile <profile-name>
The default isolation mode is OPEN, which makes the storage credential available to all workspaces attached to the metastore.
Use the workspace-bindings command group’s update-bindings command to assign the workspaces to the storage credential:
databricks workspace-bindings update-bindings storage-credential <my-storage-credential> \
--json '{
  "add": [{"workspace_id": <workspace-id>}...],
  "remove": [{"workspace_id": <workspace-id>}...]
}' \
--profile <profile-name>
Use the "add" and "remove" properties to add or remove workspace bindings.
Note
Read-only binding (BINDING_TYPE_READ_ONLY) is not available for storage credentials, so you don’t need to set binding_type for a storage credential binding.
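The two CLI steps can be combined in a small script. The following is a hypothetical sketch. The helpers ids_to_json, bindings_payload, and bind_storage_credential are not part of the Databricks CLI, and workspace IDs are passed as space-separated strings. The script builds the --json body for update-bindings from the lists of workspace IDs to add and remove, then runs the storage-credentials update and workspace-bindings update-bindings commands in order. It assumes that an empty "remove" list is accepted.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: bind a storage credential to specific workspaces
# using the two Databricks CLI steps described above.

# Turn a space-separated list of workspace IDs into a JSON array such as
# [{"workspace_id": 111}, {"workspace_id": 222}]. An empty list gives [].
ids_to_json() {
  local out="" id
  for id in $1; do  # unquoted on purpose: split the list on spaces
    out="${out:+$out, }{\"workspace_id\": $id}"
  done
  printf '[%s]' "$out"
}

# Build the --json body for `databricks workspace-bindings update-bindings`.
bindings_payload() {
  printf '{"add": %s, "remove": %s}' "$(ids_to_json "$1")" "$(ids_to_json "$2")"
}

# Usage: bind_storage_credential <credential> <profile> "<ids to add>" ["<ids to remove>"]
bind_storage_credential() {
  local credential="$1" profile="$2" add_ids="$3" remove_ids="${4:-}"
  # Step 1: change the isolation mode from the default OPEN to ISOLATED.
  databricks storage-credentials update "$credential" \
    --isolation-mode ISOLATED \
    --profile "$profile" || return
  # Step 2: add (and optionally remove) workspace bindings.
  databricks workspace-bindings update-bindings storage-credential "$credential" \
    --json "$(bindings_payload "$add_ids" "$remove_ids")" \
    --profile "$profile"
}
```

If the first command fails, the function returns without changing any bindings, so the credential is never left with bindings applied but isolation mode unchanged.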
To list all workspace assignments for a storage credential, use the workspace-bindings command group’s get-bindings command:
databricks workspace-bindings get-bindings storage-credential <my-storage-credential> \
--profile <profile-name>
Unbind a storage credential from a workspace
Instructions for revoking workspace access to a storage credential using Catalog Explorer or the workspace-bindings CLI command group are included in Bind a storage credential to one or more workspaces.
Next steps
You can view, update, delete, and grant other users permission to use storage credentials. See Manage storage credentials.
You can define external locations using storage credentials. See Create an external location to connect cloud storage to Databricks.