Create and work with volumes

This article introduces volumes, which are Unity Catalog objects that enable governance over non-tabular datasets. It also describes how to create, manage, and work with volumes.

For details on uploading and managing files in volumes, see Upload files to a Unity Catalog volume and File management operations for Unity Catalog volumes.

Note

When you work with volumes, you must use a SQL warehouse or a cluster running Databricks Runtime 13.3 LTS or above, unless you are using Databricks UIs such as Catalog Explorer.

What are Unity Catalog volumes?

Volumes are Unity Catalog objects that represent a logical volume of storage in a cloud object storage location. Volumes provide capabilities for accessing, storing, governing, and organizing files. While tables provide governance over tabular datasets, volumes add governance over non-tabular datasets. You can use volumes to store and access files in any format, including structured, semi-structured, and unstructured data.

Important

You cannot use volumes as a location for tables. Volumes are intended for path-based data access only. Use tables for storing tabular data with Unity Catalog.

What is a managed volume?

A managed volume is a Unity Catalog-governed storage volume created within the default storage location of the containing schema. Managed volumes allow the creation of governed storage for working with files without the overhead of external locations and storage credentials. You do not need to specify a location when creating a managed volume, and all file access for data in managed volumes is through paths managed by Unity Catalog. See What path is used for accessing files in a volume?.

When you delete a managed volume, the files stored in this volume are also deleted from your cloud tenant within 30 days.

What is an external volume?

An external volume is a Unity Catalog-governed storage volume registered against a directory within an external location using Unity Catalog-governed storage credentials. External volumes allow you to add Unity Catalog data governance to existing cloud object storage directories. Some use cases for external volumes include the following:

  • Adding governance to data files without migration.

  • Governing files produced by other systems that must be ingested or accessed by Databricks.

  • Governing data produced by Databricks that must be accessed directly from cloud object storage by other systems.

External volumes must be directories within external locations governed by Unity Catalog storage credentials. Unity Catalog does not manage the lifecycle or layout of the files in external volumes. When you drop an external volume, Unity Catalog does not delete the underlying data.

Note

When you define a volume, cloud URI access to data under the volume path is governed by the permissions of the volume.

What path is used for accessing files in a volume?

The path to access volumes is the same whether you use Apache Spark, SQL, Python, or other languages and libraries. This differs from legacy access patterns for files in object storage bound to a Databricks workspace.

The path to access files in volumes uses the following format:

/Volumes/<catalog>/<schema>/<volume>/<path>/<file-name>

Databricks also supports an optional dbfs:/ scheme when working with Apache Spark, so the following path also works:

dbfs:/Volumes/<catalog>/<schema>/<volume>/<path>/<file-name>

The sequence /<catalog>/<schema>/<volume> in the path corresponds to the three Unity Catalog object names associated with the file. These path elements are read-only and not directly writable by users, meaning it is not possible to create or delete these directories using filesystem operations. They are automatically managed and kept in sync with the corresponding Unity Catalog objects.
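
For example, the following Python sketch reads the same file through Apache Spark and through the local file API. The catalog, schema, volume, and file names are hypothetical placeholders, and the file must already exist in the volume.

# Hypothetical example: catalog "main", schema "default", volume "my_volume",
# with a file data.csv already uploaded to the volume.
path = "/Volumes/main/default/my_volume/data.csv"

# Spark accepts the volume path with or without the optional dbfs:/ scheme.
df = spark.read.option("header", "true").csv("dbfs:" + path)
df.show()

# The same path works with standard Python file APIs on Unity Catalog-enabled compute.
with open(path) as f:
    print(f.readline())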

Note

You can also access data in external volumes using cloud storage URIs.

What are the privileges for volumes?

Volumes use the same basic privilege model as tables, but where privileges for tables focus on querying and manipulating rows in a table, privileges for volumes focus on working with files. As such, volumes introduce the following privileges:

  • READ VOLUME

  • WRITE VOLUME

See Unity Catalog privileges and securable objects.
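
For example, the following sketch grants these privileges from a notebook. The volume and group names are hypothetical.

# Hypothetical volume and group names.
# READ VOLUME allows listing and reading files; WRITE VOLUME additionally allows
# adding, overwriting, and deleting files.
spark.sql("GRANT READ VOLUME ON VOLUME main.default.my_volume TO `data-readers`")
spark.sql("GRANT WRITE VOLUME ON VOLUME main.default.my_volume TO `data-engineers`")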

Who can manage volume privileges?

You must have owner privileges on a volume to manage volume privileges or drop the volume. Each object in Unity Catalog can have only one principal assigned as its owner. Ownership does not cascade (that is, the owner of a catalog does not automatically become the owner of all objects in that catalog), but the privileges associated with ownership apply to all objects contained within an object.

This means that for Unity Catalog volumes, the following principals can manage volume privileges:

  • The owner of the parent catalog.

  • The owner of the parent schema.

  • The owner of the volume.

While each object can only have a single owner, Databricks recommends assigning ownership for most objects to a group rather than an individual user. Initial ownership for any object is assigned to the user that creates that object. See Manage Unity Catalog object ownership.

Create a managed volume

You must have the following permissions to create a managed volume:

  • Schema: USE SCHEMA, CREATE VOLUME

  • Catalog: USE CATALOG

To create a managed volume, use the following syntax:

CREATE VOLUME <catalog>.<schema>.<volume-name>;
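
For example, the following Python sketch creates a managed volume and writes a first file through the /Volumes path. The catalog, schema, and volume names are hypothetical.

# Hypothetical names; requires USE CATALOG, USE SCHEMA, and CREATE VOLUME.
spark.sql("CREATE VOLUME IF NOT EXISTS main.default.landing")

# Managed volumes are immediately accessible through Unity Catalog-managed paths.
dbutils.fs.put("/Volumes/main/default/landing/hello.txt", "hello volume", True)
print(dbutils.fs.head("/Volumes/main/default/landing/hello.txt"))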

To create a managed volume in Catalog Explorer:

  1. In your Databricks workspace, click Catalog icon Catalog.

  2. Search or browse for the schema that you want to add the volume to and select it.

  3. Click the Create Volume button. (You must have sufficient privileges.)

  4. Enter a name for the volume.

  5. Provide a comment (optional).

  6. Click Create.

Create an external volume

You must have the following permissions to create an external volume:

  • External location: CREATE EXTERNAL VOLUME

  • Schema: USE SCHEMA, CREATE VOLUME

  • Catalog: USE CATALOG

To create an external volume, specify a path within an external location using the following syntax:

CREATE EXTERNAL VOLUME <catalog>.<schema>.<external-volume-name>
LOCATION 'gs://<external-location-bucket-path>/<directory>';
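
As a sketch, the same statement can be run from a notebook and the result verified by listing the new volume. The bucket path and volume names are hypothetical, and the external location must already exist.

# Hypothetical names; requires CREATE EXTERNAL VOLUME on the external location.
spark.sql("""
    CREATE EXTERNAL VOLUME main.default.raw_files
    LOCATION 'gs://my-external-location-bucket/raw-files'
""")

# Existing files in the directory are now governed by and visible through the volume.
display(dbutils.fs.ls("/Volumes/main/default/raw_files"))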

To create an external volume in Catalog Explorer:

  1. In your Databricks workspace, click Catalog icon Catalog.

  2. Search or browse for the schema that you want to add the volume to and select it.

  3. Click the Create Volume button. (You must have sufficient privileges.)

  4. Enter a name for the volume.

  5. Choose an external location in which to create the volume.

  6. Edit the path to reflect the sub-directory where you want to create the volume.

  7. Provide a comment (optional).

  8. Click Create.

Drop a volume

Only users with owner privileges can drop a volume. See Who can manage volume privileges?.

Use the following syntax to drop a volume:

DROP VOLUME IF EXISTS <volume-name>;

When you drop a managed volume, Databricks deletes the underlying data within 30 days. When you drop an external volume, you remove the volume from Unity Catalog but the underlying data remains unchanged in the external location.

Read files in a volume

You must have the following permissions to view the contents of a volume or access files that are stored on volumes:

  • Volume: READ VOLUME

  • Schema: USE SCHEMA

  • Catalog: USE CATALOG

You interact with the contents of volumes using paths. See What path is used for accessing files in a volume?.
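
For example, the following Python sketch lists a volume's contents and reads files with Spark. The volume and directory names are hypothetical.

# Hypothetical volume; requires READ VOLUME on the volume plus USE SCHEMA and USE CATALOG.
volume_dir = "/Volumes/main/default/my_volume"

# List the files governed by the volume.
for entry in dbutils.fs.ls(volume_dir):
    print(entry.name, entry.size)

# Read a directory of JSON files in the volume with Spark.
df = spark.read.json(f"{volume_dir}/events/")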

Create, delete, and perform other file management operations on a volume

You must have the following permissions to perform file management operations on files that are stored on volumes:

  • Volume: READ VOLUME, WRITE VOLUME

  • Schema: USE SCHEMA

  • Catalog: USE CATALOG

You can perform file management operations on volumes from Databricks UIs such as Catalog Explorer, or programmatically, for example with the dbutils.fs utilities.

For full details on programmatically interacting with files on volumes, see Work with files in Unity Catalog volumes.
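
For example, the following Python sketch shows common file management operations with the dbutils.fs utilities. The paths are hypothetical; these operations require READ VOLUME and WRITE VOLUME on the volume.

base = "/Volumes/main/default/my_volume"  # hypothetical volume

dbutils.fs.mkdirs(f"{base}/staging")                                    # create a directory
dbutils.fs.cp(f"{base}/raw/data.csv", f"{base}/staging/data.csv")       # copy a file within the volume
dbutils.fs.mv(f"{base}/staging/data.csv", f"{base}/archive/data.csv")   # move or rename a file
dbutils.fs.rm(f"{base}/staging", True)                                  # remove a directory recursively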

Example notebook: Create and work with volumes

The following notebook demonstrates the basic SQL syntax to create and interact with Unity Catalog volumes.

Tutorial: Unity Catalog volumes notebook

Open notebook in new tab

Reserved paths for volumes

The following paths are reserved for accessing volumes:

  • dbfs:/Volumes

  • /Volumes

Note

Likely misspellings of these paths, such as /volumes, /Volume, and /volume, are also reserved for Apache Spark APIs and dbutils, whether or not they are prefixed with dbfs:/. The path /dbfs/Volumes is also reserved but cannot be used to access volumes.

Volumes are only supported on Databricks Runtime 13.3 LTS and above. In Databricks Runtime 12.2 LTS and below, operations against /Volumes paths might succeed, but can write data to ephemeral storage disks attached to compute clusters rather than persisting data to Unity Catalog volumes as expected.

Important

If you have pre-existing data stored in a reserved path on the DBFS root, you can file a support ticket to gain temporary access to this data to move it to another location.

Limitations

You must use Unity Catalog-enabled compute to interact with Unity Catalog volumes. Volumes do not support all workloads.

Note

Volumes do not support dbutils.fs commands distributed to executors.

The following limitations apply:

In Databricks Runtime 14.3 LTS and above:

  • On compute configured with single user access mode, you cannot access volumes from threads and subprocesses in Scala.

In Databricks Runtime 14.2 and below:

  • On compute configured with shared access mode, you can’t use UDFs to access volumes.

    • Both Python and Scala have access to FUSE from the driver but not from executors.

    • Scala code that performs I/O operations can run on the driver but not on the executors.

  • On compute configured with single user access mode, there is no support for FUSE in Scala, for Scala I/O code that accesses data using volume paths, or for Scala UDFs. Python UDFs are supported in single user access mode.

On all supported Databricks Runtime versions:

  • Unity Catalog UDFs do not support accessing volume file paths.

  • You cannot access volumes from RDDs.

  • You cannot use spark-submit with JARs stored in a volume.

  • You cannot define dependencies on other libraries accessed via volume paths inside a wheel or JAR file.

  • You cannot list Unity Catalog objects using the /Volumes/<catalog-name> or /Volumes/<catalog-name>/<schema-name> patterns. You must use a fully-qualified path that includes a volume name.

  • The DBFS endpoint for the REST API does not support volume paths.

  • Volumes are excluded from global search results in the Databricks workspace.

  • You cannot specify volumes as the destination for cluster log delivery.

  • %sh mv is not supported for moving files between volumes. Use dbutils.fs.mv or %sh cp instead.

  • You cannot create a custom Hadoop file system with volumes, meaning the following is not supported:

    import org.apache.hadoop.fs.Path

    // Not supported: instantiating a Hadoop FileSystem against a volume path
    val path = new Path("dbfs:/Volumes/main/default/test-volume/file.txt")
    val fs = path.getFileSystem(sc.hadoopConfiguration)
    fs.listStatus(path)