Mounting cloud object storage on Databricks
Important
Mounts are a legacy access pattern. Databricks recommends using Unity Catalog for managing all data access. See Connect to cloud object storage using Unity Catalog.
Databricks enables users to mount cloud object storage to the Databricks File System (DBFS) to simplify data access patterns for users that are unfamiliar with cloud concepts. Mounted data does not work with Unity Catalog, and Databricks recommends migrating away from using mounts and instead managing data governance with Unity Catalog.
How does Databricks mount cloud object storage?
Databricks mounts create a link between a workspace and cloud object storage, which enables you to interact with cloud object storage using familiar file paths relative to the Databricks file system. Mounts work by creating a local alias under the /mnt directory that stores the following information:
Location of the cloud object storage.
Driver specifications to connect to the storage account or container.
Security credentials required to access the data.
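For example, once a mount exists you can list all configured mounts and browse a mounted location with standard dbutils.fs commands. The following is a minimal sketch; the mount point /mnt/my-mount is a placeholder for an existing mount in your workspace:
# List every mount in the workspace: each entry shows the local mount
# point under /mnt and the cloud object storage URI it points to.
for mount in dbutils.fs.mounts():
    print(mount.mountPoint, "->", mount.source)

# Browse a mounted location like any other DBFS path.
# /mnt/my-mount is a placeholder for an existing mount.
display(dbutils.fs.ls("/mnt/my-mount"))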
What is the syntax for mounting storage?
The source specifies the URI of the object storage (and can optionally encode security credentials). The mount_point specifies the local path in the /mnt directory. Some object storage sources support an optional encryption_type argument. For some access patterns you can pass additional configuration specifications as a dictionary to extra_configs.
Note
Databricks recommends setting mount-specific Spark and Hadoop configuration as options using extra_configs. This ensures that configurations are tied to the mount rather than to the cluster or session.
dbutils.fs.mount(
source: str,
mount_point: str,
encryption_type: Optional[str] = "",
extra_configs: Optional[dict[str, str]] = None
)
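For example, a call that passes all arguments by keyword might look like the following sketch; the bucket name, mount point, and project ID are placeholders:
# Mount a GCS bucket and tie its project configuration to the mount
# by passing it through extra_configs (all values below are placeholders).
dbutils.fs.mount(
  source = "gs://my-example-bucket",
  mount_point = "/mnt/my-example-mount",
  extra_configs = {"fs.gs.project.id": "my-example-project"}
)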
Check with your workspace and cloud administrators before configuring or altering data mounts, as improper configuration can provide unsecured access to all users in your workspace.
Note
In addition to the approaches described in this article, you can automate mounting a bucket with the Databricks Terraform provider and databricks_mount.
Unmount a mount point
To unmount a mount point, use the following command:
dbutils.fs.unmount("/mnt/<mount-name>")
Warning
To avoid errors, never modify a mount point while other jobs are reading or writing to it. After modifying a mount, always run dbutils.fs.refreshMounts()
on all other running clusters to propagate any mount updates. See refreshMounts command (dbutils.fs.refreshMounts).
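For example, after remounting storage you can refresh the mount cache on each other running cluster with a single command:
# Refresh this cluster's mount cache so it picks up the updated mount metadata.
dbutils.fs.refreshMounts()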
Access a GCS bucket through DBFS
To work with DBFS mounts, your bucket name must not contain an underscore. To write to a GCS bucket, you must provide a Google Cloud project ID for the bucket.
You must use the service account email address when configuring security for your cluster.
You can mount a bucket to DBFS (see What is DBFS?). The mount is a pointer to a GCS location, so the data is never synced locally.
The following examples show the basic syntax for mounting a GCS bucket in Python and Scala:
Python
bucket_name = "my-gcs-bucket"
mount_name = "my-mount"
dbutils.fs.mount(
f"gs://{bucket_name}",
f"/mnt/databricks/{mount_name}",
extra_configs = {"fs.gs.project.id": "my-project-id"}
)
Scala
val bucket_name = "my-gcs-bucket"
val mount_name = "my-mount"
dbutils.fs.mount(
s"gs://${bucket_name}",
s"/mnt/databricks/${mount_name}",
extraConfigs=Map("fs.gs.project.id" -> "my-project-id")
)
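Once the bucket is mounted, you can read from it through the mount path with standard Spark APIs. A minimal Python sketch follows; the file name sales.csv is a placeholder:
# Read a CSV file from the mounted GCS bucket through its DBFS mount path.
# The file name is a placeholder for illustration.
df = spark.read.format("csv").option("header", "true").load("/mnt/databricks/my-mount/sales.csv")
display(df)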