Create an external location for data in DBFS root

This article shows how to configure an external location in Unity Catalog to govern access to your DBFS root storage location. Although Databricks recommends against storing data in DBFS root storage, your workspace might do so because of legacy practices.

External locations are Unity Catalog securable objects that associate storage credentials with cloud object storage containers. External locations are used to define managed storage locations for managed tables and volumes, and to govern access to the storage locations that contain external tables and external volumes.
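Conceptually, an external location ties together a name, a cloud storage URL, and a storage credential. As a rough SQL sketch (the location name, bucket path, and credential name here are all hypothetical, and the statement assumes a storage credential already exists):

```sql
-- Hypothetical names; substitute your own storage path and credential.
CREATE EXTERNAL LOCATION my_external_location
URL 's3://my-bucket/some/path'
WITH (STORAGE CREDENTIAL my_storage_credential)
COMMENT 'Governs access to this storage path';
```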

You must create an external location if your workspace-local, legacy Databricks Hive metastore stores data in the DBFS root and you want to federate your legacy Hive metastore so that your team can work with your Hive metastore tables using Unity Catalog. See Hive metastore federation: enable Unity Catalog to govern tables registered in a Hive metastore and Enable Hive metastore federation for a legacy workspace Hive metastore.

Before you begin

To create an external location for the DBFS root, you must have a storage credential defined in Unity Catalog that grants access to the DBFS root’s cloud storage location. If you don’t already have one, the system can create one for you when you create the external location.

Permissions requirements:

  • You must have the CREATE STORAGE CREDENTIAL and CREATE EXTERNAL LOCATION privileges on the metastore. Metastore admins have these privileges by default.

    Note

    If a storage credential for the DBFS root’s storage location already exists, then the user who creates the external location does not need CREATE STORAGE CREDENTIAL, but does need CREATE EXTERNAL LOCATION on both the storage credential and the metastore.

  • You must be a workspace admin to have the system create the storage credential for you during external location creation.

    You do not have to be a workspace admin if a storage credential that gives access to the DBFS root storage location already exists and you have CREATE EXTERNAL LOCATION on both the storage credential and the metastore.
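If you are unsure whether a principal holds these privileges, you can check with SHOW GRANTS in a SQL editor (the principal shown here is hypothetical):

```sql
-- Check which privileges a principal holds on the metastore.
SHOW GRANTS `user@example.com` ON METASTORE;
```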

Create the external location

You can use Catalog Explorer to create an external location for the DBFS root.

  1. In the sidebar, click Catalog.

  2. Click External data >, then click Create external location.

  3. Enter an External location name.

  4. Under URL, click Copy from DBFS mount and select Copy from DBFS root.

    The URL and subpath fields are populated with the cloud storage path to the DBFS root.

    Important

    When you create an external location for the DBFS root, you must use the subpath to the DBFS root location, not the path to the entire bucket. The subpath is pre-populated with user/hive/warehouse, which is a default storage location for Hive metastore tables. If you want more fine-grained access control to the data in DBFS root, you can create separate external locations for sub-paths within DBFS root.

  5. Select a storage credential that grants access to the DBFS root cloud storage location or, if none has been defined, click + Create new storage credential.

    To create the storage credential, select a Credential Type of DBFS Root. A storage credential is created automatically when you save the external location.

  6. (Optional) Add a comment.

  7. (Optional) Click Advanced options and enable Fallback mode.

    Fallback mode is intended for legacy workload migration scenarios. See Enable fallback mode on external locations.

  8. Click Create.

  9. Go to the Permissions tab to grant permission to use the external location.

    1. Click Grant.

    2. In the Principals field, select users, groups, or service principals, and select the privilege you want to grant.

    3. Click Grant.

  10. (Optional) Set the workspaces that can access this external location.

    By default, users on any workspace attached to this Unity Catalog metastore can be granted access to the data in this location. You can limit that access to specific workspaces. Databricks recommends limiting access to the workspace that contains the DBFS root.

    See Bind an external location to one or more workspaces.
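The permission grants in step 9 can also be issued in SQL once the external location exists. A minimal sketch, assuming a location named dbfs_root_location and a hypothetical group; adjust the privilege to match what your principals need:

```sql
-- Allow a group to read files governed by the external location.
GRANT READ FILES ON EXTERNAL LOCATION dbfs_root_location TO `data-readers`;

-- Review the grants on the location.
SHOW GRANTS ON EXTERNAL LOCATION dbfs_root_location;
```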