Create clusters & SQL warehouses with Unity Catalog access

This article shows how to create a Databricks cluster or SQL warehouse that can access data in Unity Catalog.

SQL warehouses run Databricks SQL workloads, such as queries, dashboards, and visualizations. As long as your workspace is attached to a Unity Catalog metastore, SQL warehouses can access Unity Catalog data and run Unity Catalog-specific commands by default.

Clusters are used to run workloads in the Data Science & Engineering and Databricks Machine Learning persona-based environments, using notebooks or automated jobs. To create a cluster that can access Unity Catalog, the workspace you are creating the cluster in must be attached to a Unity Catalog metastore and must use a Unity-Catalog-capable access mode (shared or single user).

Depending on the environment you are working in, you can access data in Unity Catalog using either of these compute resources: SQL warehouses for Databricks SQL, or clusters for the Data Science & Engineering and Databricks Machine Learning environments.
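For example, from a notebook attached to a Unity-Catalog-enabled cluster, you can address data through Unity Catalog's three-level namespace (catalog.schema.table). The sketch below uses hypothetical names (main.sales.orders); spark is the SparkSession that Databricks notebooks provide.

```python
# `spark` is the SparkSession preconfigured in Databricks notebooks.
# `main.sales.orders` is a hypothetical catalog.schema.table name.
df = spark.sql("SELECT * FROM main.sales.orders LIMIT 10")
df.show()

# The same fully qualified name works with the DataFrame API.
orders = spark.table("main.sales.orders")
print(orders.columns)
```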

Note

For information about cluster configuration UI changes that are available in preview, see Create a cluster.

What is cluster access mode?

When you create a cluster in Databricks, you must select an access mode appropriate for the type of workload you want to run on the cluster. Unity Catalog enforces security using specific cluster access modes. If a cluster is not configured with one of the Unity-Catalog-capable access modes (shared or single user), it can't access data in Unity Catalog.

The following table lists all available access modes:

| Access Mode | Visible to user | UC Support | Supported Languages | Notes |
| --- | --- | --- | --- | --- |
| Single User | Always | Yes | Python, SQL, Scala, R | Can be assigned to and used by a single user. To read from a view, you must have SELECT on all referenced tables and views. Dynamic views are not supported. |
| Shared | Always (Premium plan required) | Yes | Python (on Databricks Runtime 11.3 LTS and above), SQL | Can be used by multiple users, with data isolation among users. See Shared access mode limitations. |
| No Isolation Shared | Admins can hide this cluster type by enforcing user isolation in the admin settings page. | No | Python, SQL, Scala, R | There is a related account-level setting for No Isolation Shared clusters. |
| Custom | Hidden (for all new clusters) | No | Python, SQL, Scala, R | This option is shown only if you have existing clusters without a specified access mode. |

You can upgrade an existing cluster to meet the requirements of Unity Catalog by setting its cluster access mode to Single User or Shared. For additional access mode limitations for Structured Streaming on Unity Catalog, see Structured Streaming support.

Important

Setting the access mode is not supported in the Clusters API.

Shared access mode limitations

  • Init scripts are not supported.

  • Cluster libraries are not supported on Databricks Runtime 13.0 and below.

  • Cluster-scoped Python libraries are supported on Databricks Runtime 13.1 and above. This includes Python wheels that are uploaded as workspace files, but not libraries that are referenced using DBFS file paths, including libraries uploaded to the DBFS root. Non-Python libraries are not supported. See Cluster libraries.

  • Spark-submit jobs are not supported.

  • Databricks Runtime ML is not supported.

  • Scala, R, RDD APIs, and clients that read data directly from cloud storage, such as DBUtils, are not supported.

  • User-defined functions (UDFs) are not supported, including UDAFs, UDTFs, Pandas on Spark (applyInPandas and mapInPandas), and Hive UDFs. For an example of one affected pattern, see the sketch after this list.

  • Commands run on cluster nodes as a low-privilege user that is forbidden from accessing sensitive parts of the filesystem or from creating network connections to ports other than 80 and 443.

Attempts to get around these restrictions will fail. These restrictions are in place so that users can't use the cluster to access data they don't have privileges to read.
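To help recognize workloads that hit the UDF restriction, here is a minimal sketch of one listed pattern, applyInPandas, with hypothetical data. On a single user cluster (Databricks Runtime 11.3 LTS or above) it runs; on a shared access mode cluster the applyInPandas call is not supported.

```python
import pandas as pd

# A grouped-map pandas function: subtract each group's mean from its values.
def subtract_mean(pdf: pd.DataFrame) -> pd.DataFrame:
    return pdf.assign(value=pdf["value"] - pdf["value"].mean())

# `spark` is the SparkSession preconfigured in Databricks notebooks;
# the data and column names are hypothetical.
df = spark.createDataFrame([("a", 1.0), ("a", 3.0), ("b", 5.0)], ["key", "value"])

# applyInPandas is one of the patterns listed above as unsupported
# on shared access mode clusters.
df.groupBy("key").applyInPandas(subtract_mean, schema="key string, value double").show()
```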

Note

  • For many use cases, alternative features can be used instead of init scripts to configure your cluster.

  • If your workloads require init scripts, cluster libraries, JARs, or user-defined functions, you might be eligible to use those features in a private preview. To learn more about the terms and conditions of the private preview and request access, sign up here.


Create a cluster that can access Unity Catalog

A cluster is designed for running workloads such as notebooks and automated jobs.

To create a cluster that can access Unity Catalog, the workspace must be attached to a Unity Catalog metastore.

Databricks Runtime requirements

Unity Catalog requires clusters that run Databricks Runtime 11.3 LTS or above.
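As a quick check from a notebook, the sketch below reads the runtime version from the cluster's environment. The DATABRICKS_RUNTIME_VERSION environment variable is an assumption here; confirm it on your own runtime, or simply check the Databricks Runtime version in the cluster configuration UI.

```python
import os

# DATABRICKS_RUNTIME_VERSION (e.g. "11.3") is assumed to be set on cluster
# nodes; verify the variable on your runtime.
version = os.environ.get("DATABRICKS_RUNTIME_VERSION", "")
print(f"Databricks Runtime version: {version or 'not detected'}")

try:
    major, minor = (int(part) for part in version.split(".")[:2])
    print("Meets the Unity Catalog minimum (11.3 LTS):", (major, minor) >= (11, 3))
except ValueError:
    print("Could not parse the version; check the cluster configuration UI instead.")
```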

Steps

To create a cluster:

  1. In the sidebar, use the persona switcher to select either Data Science & Engineering or Machine Learning.

  2. In the sidebar, click New > Cluster.

  3. Choose the access mode you want to use.

    For clusters that run on standard Databricks Runtime versions, select either Single user or Shared access mode to connect to Unity Catalog. If you use Databricks Runtime for Machine Learning, you must select Single user access mode to connect to Unity Catalog. See What is cluster access mode?

  4. Select a Databricks Runtime version of 11.3 LTS or above.

  5. Complete your cluster configuration and click Create Cluster.

When the cluster is available, it will be able to run workloads that use Unity Catalog.
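As a quick verification, the following sketch queries Unity Catalog from a notebook attached to the new cluster. current_metastore() is a Unity Catalog SQL function; on a cluster without Unity Catalog access, these calls fail.

```python
# `spark` is the SparkSession preconfigured in Databricks notebooks.
# Show the ID of the Unity Catalog metastore this cluster is attached to.
spark.sql("SELECT current_metastore() AS metastore").show(truncate=False)

# List the catalogs you can see through Unity Catalog.
spark.sql("SHOW CATALOGS").show()
```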

Create a SQL warehouse that can access Unity Catalog

A SQL warehouse is required to run workloads in Databricks SQL, such as queries, dashboards, and visualizations. By default, all SQL warehouses can connect to Unity Catalog. See Configure SQL warehouses for specific configuration options.
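If you want to create a warehouse programmatically, the following is a minimal sketch against the SQL Warehouses REST API (POST /api/2.0/sql/warehouses). The DATABRICKS_HOST and DATABRICKS_TOKEN environment variables and the warehouse name are placeholders; verify the endpoint and fields against the SQL Warehouses API reference for your workspace.

```python
import os

import requests

# Placeholders: set DATABRICKS_HOST (e.g. "https://<workspace-url>") and
# DATABRICKS_TOKEN (a personal access token) before running.
host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

resp = requests.post(
    f"{host}/api/2.0/sql/warehouses",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "name": "uc-demo-warehouse",  # hypothetical warehouse name
        "cluster_size": "2X-Small",
        "max_num_clusters": 1,
        "auto_stop_mins": 30,
    },
)
resp.raise_for_status()
print("Created warehouse with ID:", resp.json().get("id"))
```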