Allowlist libraries and init scripts on shared compute

Preview

This feature is in Public Preview.

In Databricks Runtime 13.3 LTS and above, you can add libraries and init scripts to the allowlist in Unity Catalog. This allows users to use these artifacts on compute configured with shared access mode.

You can allowlist a directory or filepath before that directory or file exists. See Upload files to a Unity Catalog volume.

Note

You must be a metastore admin or have the MANAGE ALLOWLIST privilege to modify the allowlist. See MANAGE ALLOWLIST.

How to add items to the allowlist

You can add items to the allowlist with Catalog Explorer or the REST API.
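For the REST API route, the request can be sketched as follows. This is a minimal sketch that only builds the request and does not send it. The endpoint path, the artifact type names, the artifact_matchers body shape, and the PREFIX_MATCH match type are assumptions based on the Unity Catalog artifact allowlists API and should be verified against the current API reference. The helper name build_allowlist_request, the workspace URL, and the volume path are hypothetical.

```python
import json

# Assumed artifact type names; verify against the API reference.
ARTIFACT_TYPES = {"INIT_SCRIPT", "LIBRARY_JAR", "LIBRARY_MAVEN"}


def build_allowlist_request(host, artifact_type, artifacts):
    """Return (method, url, body) for setting an allowlist (hypothetical helper)."""
    if artifact_type not in ARTIFACT_TYPES:
        raise ValueError(f"unknown artifact type: {artifact_type}")
    # Assumed endpoint path; each artifact type has its own allowlist.
    url = (
        f"{host.rstrip('/')}"
        f"/api/2.1/unity-catalog/artifact-allowlists/{artifact_type}"
    )
    body = {
        "artifact_matchers": [
            {"artifact": a, "match_type": "PREFIX_MATCH"} for a in artifacts
        ]
    }
    return "PUT", url, json.dumps(body)


method, url, body = build_allowlist_request(
    "https://example.gcp.databricks.com",  # hypothetical workspace URL
    "INIT_SCRIPT",
    ["/Volumes/main/default/scripts/"],  # hypothetical volume path
)
print(method, url)
print(body)
```

Sending the request would also require an authorization header for a principal that is a metastore admin or holds the MANAGE ALLOWLIST privilege. If the endpoint replaces the whole list for an artifact type, as a PUT usually does, include existing entries in the request as well as new ones.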

To open the dialog for adding items to the allowlist in Catalog Explorer, do the following:

  1. In your Databricks workspace, click Catalog.

  2. Click the gear icon to open the metastore details and permissions UI.

  3. Select Allowed JARs/Init Scripts.

  4. Click Add.

Important

This option only displays for sufficiently privileged users. If you cannot access the allowlist UI, contact your metastore admin for assistance in allowlisting libraries and init scripts.

Add an init script to the allowlist

Complete the following steps in the allowlist dialog to add an init script to the allowlist:

  1. For Type, select Init Script.

  2. For Source Type, select Volume or the object storage protocol.

  3. Specify the source path to add to the allowlist. See How are permissions on paths enforced in the allowlist?.

Add a JAR to the allowlist

Complete the following steps in the allowlist dialog to add a JAR to the allowlist:

  1. For Type, select JAR.

  2. For Source Type, select Volume or the object storage protocol.

  3. Specify the source path to add to the allowlist. See How are permissions on paths enforced in the allowlist?.

Add Maven coordinates to the allowlist

Complete the following steps in the allowlist dialog to add Maven coordinates to the allowlist:

  1. For Type, select Maven.

  2. For Source Type, select Coordinates.

  3. Enter coordinates in the following format: groupId:artifactId:version.

    • You can include all versions of a library by allowlisting the following format: groupId:artifactId.

    • You can include all artifacts in a group by allowlisting the following format: groupId.
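The three coordinate granularities above can be illustrated with a small model. This is a sketch of the matching behavior as described here, not Databricks' implementation. It assumes matching is done per coordinate component; the function name maven_allowed and the example coordinates are hypothetical.

```python
def maven_allowed(coordinate, allowlist):
    """Return True if a groupId:artifactId:version coordinate matches an entry."""
    parts = coordinate.split(":")
    if len(parts) != 3:
        raise ValueError("expected groupId:artifactId:version")
    # Entries that would cover this coordinate, from broadest to narrowest:
    # groupId, groupId:artifactId, groupId:artifactId:version
    candidates = {":".join(parts[:n]) for n in (1, 2, 3)}
    return any(entry in candidates for entry in allowlist)


# Hypothetical allowlist entries, one per granularity.
allowlist = [
    "org.apache.commons",   # every artifact in the group
    "com.example:lib",      # every version of one library
    "io.acme:tool:1.2.0",   # exactly one version
]

print(maven_allowed("org.apache.commons:commons-lang3:3.12.0", allowlist))
print(maven_allowed("com.example:lib:2.0", allowlist))
print(maven_allowed("io.acme:tool:1.3.0", allowlist))
```

In this model the first two checks succeed because a broader entry covers them, while the third fails because only version 1.2.0 of that artifact is listed.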

How are permissions on paths enforced in the allowlist?

You can use the allowlist to grant access to JARs or init scripts stored in Unity Catalog volumes and object storage. If you add a path for a directory rather than a file, allowlist permissions propagate to contained files and directories.

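Prefix matching is used for all artifacts stored in Unity Catalog volumes or object storage. To prevent prefix matching at a given directory level, include a trailing slash (/). For example, /Volumes/prod-libraries/ allowlists only the files and directories inside that directory, whereas without the trailing slash the entry would also match sibling paths that merely begin with /Volumes/prod-libraries.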
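The effect of the trailing slash can be seen in a minimal sketch that models prefix matching as a plain string-prefix test. The function name path_allowed and the file paths are hypothetical, and the actual enforcement logic may differ in detail.

```python
def path_allowed(path, allowlist):
    """Return True if any allowlist entry is a string prefix of the path."""
    return any(path.startswith(entry) for entry in allowlist)


# Without a trailing slash, a sibling directory with the same prefix matches.
no_slash = ["/Volumes/prod-libraries"]
print(path_allowed("/Volumes/prod-libraries/app.jar", no_slash))
print(path_allowed("/Volumes/prod-libraries-staging/app.jar", no_slash))

# With a trailing slash, only content inside the directory matches.
with_slash = ["/Volumes/prod-libraries/"]
print(path_allowed("/Volumes/prod-libraries/app.jar", with_slash))
print(path_allowed("/Volumes/prod-libraries-staging/app.jar", with_slash))
```

Only the last check fails: the trailing slash stops the entry from matching the prod-libraries-staging directory.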

You can define permissions at the following levels:

  1. The base path for the volume or storage container.

  2. A directory nested at any depth from the base path.

  3. A single file.

Adding a path to the allowlist only means that the path can be used for init scripts or JAR installation. It does not grant access to the data: Databricks still checks for permissions to access data in the specified location.

The principal used must have READ VOLUME permissions on the specified volume. See READ VOLUME.

In single user access mode, the identity of the assigned principal (a user or service principal) is used.

In shared access mode or no-isolation shared access mode, the identity of the library installer is used.

Note

No-isolation shared access mode does not support volumes, but uses the same identity assignment as shared access mode.
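The identity rules above can be summarized in a short sketch. The access mode labels and the function name permission_check_identity are hypothetical and chosen only for illustration; they are not Databricks configuration values.

```python
def permission_check_identity(access_mode, assigned_principal, library_installer):
    """Return the identity whose permissions are checked for a given access mode."""
    if access_mode == "single_user":
        # Single user access mode: the assigned user or service principal.
        return assigned_principal
    if access_mode in ("shared", "no_isolation_shared"):
        # Shared and no-isolation shared access modes: the library installer.
        return library_installer
    raise ValueError(f"unknown access mode: {access_mode}")


print(permission_check_identity("single_user", "alice", "bob"))
print(permission_check_identity("shared", "alice", "bob"))
```

For artifacts stored in a volume, the returned identity is the one that needs READ VOLUME on that volume.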

Databricks recommends configuring all object storage privileges related to init scripts and libraries with read-only permissions. Users with write permissions on these locations can potentially modify code in library files or init scripts.

Databricks recommends using Google Cloud service accounts to manage access to JARs or init scripts stored in GCS. Create a Google Cloud service account with the Storage Object Viewer role for your desired bucket and attach it to a cluster. See Access GCS buckets using Google Cloud service accounts on clusters.

Note

Allowlist permissions for JARs and init scripts are managed separately. If you use the same location to store both types of objects, you must add the location to the allowlist for each.