Mounting cloud object storage on Databricks

Databricks enables users to mount cloud object storage to the Databricks File System (DBFS) to simplify data access patterns for users that are unfamiliar with cloud concepts. Mounted data does not work with Unity Catalog, and Databricks recommends migrating away from using mounts and managing data governance with Unity Catalog.

How does Databricks mount cloud object storage?

Databricks mounts create a link between a workspace and cloud object storage, which enables you to interact with cloud object storage using familiar file paths relative to the Databricks file system. Mounts work by creating a local alias under the /mnt directory that stores the following information:

  • Location of the cloud object storage.

  • Driver specifications to connect to the storage account or container.

  • Security credentials required to access the data.

What is the syntax for mounting storage?

The source specifies the URI of the object storage (and can optionally encode security credentials). The mount_point specifies the local path in the /mnt directory. Some object storage sources support an optional encryption_type argument. For some access patterns you can pass additional configuration specifications as a dictionary to extra_configs.

Note

Databricks recommends setting mount-specific Spark and Hadoop configuration as options using extra_configs. This ensures that configurations are tied to the mount rather than the cluster or session.

dbutils.fs.mount(
  source: str,
  mount_point: str,
  encryption_type: Optional[str] = "",
  extra_configs: Optional[dict[str:str]] = None
)

Check with your workspace and cloud administrators before configuring or altering data mounts, as improper configuration can provide unsecured access to all users in your workspace.

Note

In addition to the approaches described in this article, you can automate mounting a bucket with the Databricks Terraform provider and databricks_mount.

Unmount a mount point

To unmount a mount point, use the following command:

dbutils.fs.unmount("/mnt/<mount-name>")

Warning

To avoid errors, never modify a mount point while other jobs are reading or writing to it. After modifying a mount, always run dbutils.fs.refreshMounts() on all other running clusters to propagate any mount updates. See refreshMounts command (dbutils.fs.refreshMounts).

Access a GCS bucket through DBFS

To work with DBFS mounts, your bucket name must not contain an underscore. To write to a GCS bucket, you must povide a Google Cloud projectId for the bucket.

You must use the service account email address when configuring security for your cluster.

You can mount a bucket to What is the Databricks File System (DBFS)?. The mount is a pointer to a GCS location, so the data is never synced locally.

The follow example shows the basic syntax for mounting a GCS bucket:

bucket_name = "my-gcs-bucket"
mount_name = "my-mount"
dbutils.fs.mount(
  f"gs://{bucket_name}",
  f"/mnt/databricks/{mount_name}",
  extra_configs = {"fs.gs.project.id": "my-project-id"}
)
val bucket_name = "my-gcs-bucket"
val mount_name = "my-mount"
dbutils.fs.mount(
  s"gs://${bucket_name}",
  s"/mnt/databricks/${mount_name}",
  extraConfigs=Map("fs.gs.project.id" -> "my-project-id")
)