How do paths work for data managed by Unity Catalog?
This article explains restrictions around path overlaps in Unity Catalog, details path-based access patterns for data files in Unity Catalog objects, and describes how Unity Catalog manages paths for tables and volumes.
Note
Volumes are only supported on Databricks Runtime 13.3 LTS and above. In Databricks Runtime 12.2 LTS and below, operations against /Volumes paths might succeed, but they can only write data to ephemeral storage disks attached to compute clusters rather than persisting data to Unity Catalog volumes as expected.
Paths for Unity Catalog objects cannot overlap
Unity Catalog enforces data governance by preventing managed directories of data from overlapping. Unity Catalog enforces the following rules:
External locations cannot overlap other external locations.
Tables and volumes store data files in external locations or the metastore root location.
Volumes cannot overlap other volumes.
Tables cannot overlap other tables.
Tables and volumes cannot overlap each other.
Managed storage locations cannot overlap each other. See Specify a managed storage location in Unity Catalog.
External volumes cannot overlap managed storage locations.
External tables cannot overlap managed storage locations.
These rules mean that the following restrictions exist in Unity Catalog:
You cannot define an external location within another external location.
You cannot define a volume within another volume.
You cannot define a table within another table.
You cannot define a table on any data files or directories within a volume.
You cannot define a volume on a directory within a table.
Note
You can always use path-based access to read or write data files in volumes, including Delta Lake files. However, you cannot register these data files as tables in the Unity Catalog metastore.
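The overlap rules above amount to a prefix check on directory paths: two paths overlap when they are identical or when one is nested inside the other. The following is a minimal sketch of that idea, not how Unity Catalog implements the check. The function names `paths_overlap` and `_split_uri` and the bucket names are hypothetical and used only for illustration.

```python
from urllib.parse import urlparse


def _split_uri(uri):
    """Split a cloud URI into (scheme, container, tuple of path segments)."""
    parsed = urlparse(uri)
    segments = tuple(s for s in parsed.path.split("/") if s)
    return parsed.scheme, parsed.netloc, segments


def paths_overlap(a, b):
    """Return True if the two paths are equal or one is nested in the other."""
    scheme_a, container_a, seg_a = _split_uri(a)
    scheme_b, container_b, seg_b = _split_uri(b)
    # Paths in different schemes or containers can never overlap.
    if (scheme_a, container_a) != (scheme_b, container_b):
        return False
    # Compare whole segments so that "sales" does not match "sales_2024".
    shorter = min(len(seg_a), len(seg_b))
    return seg_a[:shorter] == seg_b[:shorter]


# Hypothetical paths: the second is nested inside the first, so they overlap.
print(paths_overlap("gs://my-bucket/data", "gs://my-bucket/data/sales"))
# Sibling directories that share a name prefix do not overlap.
print(paths_overlap("gs://my-bucket/data/sales", "gs://my-bucket/data/sales_2024"))
```

Comparing whole path segments rather than raw string prefixes avoids treating sibling directories with similar names as nested.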
Paths for managed tables and managed volumes are fully managed by Unity Catalog
When you create a managed table or a managed volume, Unity Catalog creates a new directory in the Unity Catalog-configured storage location associated with the containing schema. The name of this directory is randomly generated to avoid any potential collision with other directories already present.
This behavior differs from how Hive metastore creates managed tables. Databricks recommends always interacting with Unity Catalog managed tables using table names and Unity Catalog managed volumes using volume paths.
Paths for external tables and external volumes are governed by Unity Catalog
When you create an external table or an external volume, you specify a path within an external location governed by Unity Catalog.
Important
Databricks recommends never creating an external volume or external table at the root of an external location. Instead, create external volumes and external tables in sub-directories within an external location. These recommendations should help avoid accidentally overlapping paths. See Paths for Unity Catalog objects cannot overlap.
For ease of use, Databricks recommends interacting with Unity Catalog external tables using table names and Unity Catalog external volumes using volume paths.
Alternatively, users with sufficient privileges on the corresponding Unity Catalog object can access data from an external table or external volume using the fully qualified cloud object storage path.
Important
Unity Catalog manages all privileges for access using cloud URIs to data associated with external tables or external volumes. These privileges override any privileges associated with external locations. See Unity Catalog privileges and securable objects.
How can you access data in Unity Catalog?
Unity Catalog objects provide access to data through object identifiers, volume paths, or cloud URIs. You can use these values to access data associated with volumes and tables.
Unity Catalog tables are accessed using a three-tier identifier with the following pattern:
<catalog_name>.<schema_name>.<table_name>
What are volume file paths in Unity Catalog?
Volumes provide a file path to access data files with the following pattern:
/Volumes/<catalog_name>/<schema_name>/<volume_name>/<path_to_file>
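As an illustration of the pattern above, the following sketch assembles a volume file path from its components. The helper name `volume_file_path` and the catalog, schema, and volume names (`main`, `default`, `raw`) are hypothetical placeholders, not values defined by Unity Catalog.

```python
def volume_file_path(catalog, schema, volume, path_to_file=""):
    """Build a /Volumes path from its components.

    path_to_file is relative to the volume root; leading and trailing
    slashes are ignored so the result has no doubled separators.
    """
    parts = ["/Volumes", catalog, schema, volume]
    relative = path_to_file.strip("/")
    if relative:
        parts.append(relative)
    return "/".join(parts)


# Hypothetical names used purely for illustration.
print(volume_file_path("main", "default", "raw", "2024/events.json"))
```

With the placeholder values shown, the call produces /Volumes/main/default/raw/2024/events.json. Omitting `path_to_file` returns the root directory of the volume.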
Cloud URIs require users to provide the driver, storage container identifier, and full path to the target files, as in the following example:
gs://<bucket_name>/<path>
The following table shows the access methods allowed for Unity Catalog objects:
| Object | Object identifier | File path | Cloud URI |
|---|---|---|---|
| External location | no | no | yes |
| Managed table | yes | no | no |
| External table | yes | no | yes |
| Managed volume | no | yes | no |
| External volume | no | yes | yes |
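The table can also be expressed as a small lookup structure, which may be handy when validating how a script intends to reach a given object. This is a sketch that only encodes the table above. The names `ACCESS_METHODS` and `allowed_methods` are hypothetical and not part of any Databricks API.

```python
# Access methods per object type, transcribed from the table above.
ACCESS_METHODS = {
    "external location": {"object identifier": False, "file path": False, "cloud URI": True},
    "managed table": {"object identifier": True, "file path": False, "cloud URI": False},
    "external table": {"object identifier": True, "file path": False, "cloud URI": True},
    "managed volume": {"object identifier": False, "file path": True, "cloud URI": False},
    "external volume": {"object identifier": False, "file path": True, "cloud URI": True},
}


def allowed_methods(object_type):
    """Return the sorted list of access methods allowed for an object type."""
    methods = ACCESS_METHODS[object_type.lower()]
    return sorted(name for name, allowed in methods.items() if allowed)


print(allowed_methods("External table"))
print(allowed_methods("Managed volume"))
```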
Note
Unity Catalog volumes use three-tier object identifiers with the following pattern for management commands (such as CREATE VOLUME and DROP VOLUME):
<catalog_name>.<schema_name>.<volume_name>
To actually work with files in volumes, you must use path-based access.
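Because the identifier and the file path share the same three components, one can be derived from the other. The sketch below shows that mapping under a simplifying assumption: none of the names contain dots or backtick quoting. The function names and the example identifier `main.default.raw` are hypothetical.

```python
def volume_identifier_to_path(identifier):
    """Convert <catalog>.<schema>.<volume> into the volume's root path.

    Assumes plain names without dots or backtick quoting.
    """
    parts = identifier.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected <catalog_name>.<schema_name>.<volume_name>")
    return "/Volumes/" + "/".join(parts)


def volume_path_to_identifier(path):
    """Recover the three-tier identifier from any path inside a volume."""
    segments = [s for s in path.split("/") if s]
    if len(segments) < 4 or segments[0] != "Volumes":
        raise ValueError("expected a path under /Volumes/<catalog>/<schema>/<volume>")
    return ".".join(segments[1:4])


# Hypothetical identifier used for a management command such as DROP VOLUME.
print(volume_identifier_to_path("main.default.raw"))
# The same volume, recovered from the path of a file stored inside it.
print(volume_path_to_identifier("/Volumes/main/default/raw/2024/events.json"))
```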