Install libraries from object storage

This article walks you through the steps required to install libraries from cloud object storage on Databricks.

Note

This article refers to cloud object storage as a general concept, and assumes that you are directly interacting with data stored in object storage using URIs. Databricks recommends using Unity Catalog volumes to configure access to files in cloud object storage. See Create and work with volumes.

You can store custom JAR and Python wheel (.whl) libraries in cloud object storage instead of storing them in the DBFS root. See Cluster-scoped libraries for full library compatibility details.

Important

Libraries can be installed from DBFS when using Databricks Runtime 14.3 LTS and below. However, any workspace user can modify library files stored in DBFS. To improve the security of libraries in a Databricks workspace, storing library files in the DBFS root is deprecated and disabled by default in Databricks Runtime 15.1 and above. See Storing libraries in DBFS root is deprecated and disabled by default.

Instead, Databricks recommends uploading all libraries, including Python libraries, JAR files, and Spark connectors, to workspace files or Unity Catalog volumes, or using library package repositories. If your workload does not support these patterns, you can also use libraries stored in cloud object storage.

Load libraries to object storage

You can load libraries to object storage the same way you load other files. You must have the appropriate permissions in your cloud provider to create new object storage containers or to load files into cloud object storage.
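As an illustration of the upload step, the following is a minimal sketch that assumes the library is stored in GCS and that the google-cloud-storage Python client is installed and authenticated with Application Default Credentials. The function names, bucket name, and prefix are hypothetical and are not part of any Databricks API.

```python
from pathlib import Path


def library_uri(bucket_name: str, prefix: str, local_path: str) -> str:
    """Build the gs:// URI that a library file will have after upload."""
    # Keep only the file name and place it under the (optional) prefix.
    object_name = f"{prefix.strip('/')}/{Path(local_path).name}".lstrip("/")
    return f"gs://{bucket_name}/{object_name}"


def upload_library(bucket_name: str, prefix: str, local_path: str) -> str:
    """Upload a local wheel or JAR file to GCS and return its gs:// URI."""
    # Imported lazily so that library_uri() works without the client installed.
    from google.cloud import storage

    uri = library_uri(bucket_name, prefix, local_path)
    object_name = uri[len(f"gs://{bucket_name}/"):]
    client = storage.Client()  # assumes Application Default Credentials
    client.bucket(bucket_name).blob(object_name).upload_from_filename(local_path)
    return uri
```

For example, `upload_library("bucket-name", "path/to", "dist/library.whl")` would upload the file and return `gs://bucket-name/path/to/library.whl`, which is the URI form used later when installing the library.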

Grant read-only permissions to object storage

Databricks recommends granting only read-only permissions for all access related to library installation.

Databricks allows you to assign security permissions to individual clusters that govern access to data in cloud object storage. You can extend these permissions to include read-only access to the cloud object storage location that contains your libraries.

Note

In Databricks Runtime 12.2 LTS and below, you cannot load JAR libraries when using clusters with shared access modes. In Databricks Runtime 13.3 LTS and above, you must add JAR libraries to the Unity Catalog allowlist. See Allowlist libraries and init scripts on shared compute.

Databricks recommends using Google Cloud service accounts to manage access to libraries stored in GCS. Create a Google Cloud service account with the Storage Object Viewer role for your desired bucket and attach it to a cluster. See Access GCS buckets using Google Cloud service accounts on clusters.
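The role assignment described above can be sketched with the google-cloud-storage Python client. This is one possible approach under stated assumptions, not the only way to grant the role: it assumes the service account already exists, that the caller is allowed to change the bucket's IAM policy, and that `roles/storage.objectViewer` is the role ID behind Storage Object Viewer. The function names and the example email address are hypothetical.

```python
VIEWER_ROLE = "roles/storage.objectViewer"  # role ID for Storage Object Viewer


def viewer_binding(service_account_email: str) -> dict:
    """Build a read-only IAM binding for a single service account."""
    return {
        "role": VIEWER_ROLE,
        "members": {f"serviceAccount:{service_account_email}"},
    }


def grant_read_only(bucket_name: str, service_account_email: str) -> None:
    """Add the read-only binding to the bucket's IAM policy."""
    # Imported lazily so that viewer_binding() works without the client installed.
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)
    policy = bucket.get_iam_policy(requested_policy_version=3)
    policy.bindings.append(viewer_binding(service_account_email))
    bucket.set_iam_policy(policy)
```

Calling `grant_read_only("bucket-name", "libs-reader@my-project.iam.gserviceaccount.com")` would add the binding. Attaching the service account to a cluster is a separate step that is configured on the cluster itself.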

Install libraries to clusters

To install a library stored in cloud object storage to a cluster, complete the following steps:

  1. Select a cluster from the list in the clusters UI.

  2. Select the Libraries tab.

  3. Select the File path/GCS option.

  4. Provide the full URI path to the library object (for example, gs://bucket-name/path/to/library.whl).

  5. Click Install.

You can also install libraries using the REST API or CLI.
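As a rough sketch of the REST API route, the example below posts to the Libraries API install endpoint (`/api/2.0/libraries/install`) using only the Python standard library. The workspace URL, token, and cluster ID are placeholders that you supply, and the helper names are hypothetical. Check the request format against the API reference for your workspace before relying on it.

```python
import json
import urllib.request


def install_payload(cluster_id: str, library_uri: str) -> dict:
    """Build the request body for installing one wheel or JAR on a cluster."""
    if library_uri.endswith(".whl"):
        kind = "whl"
    elif library_uri.endswith(".jar"):
        kind = "jar"
    else:
        raise ValueError(f"Unsupported library type: {library_uri}")
    return {"cluster_id": cluster_id, "libraries": [{kind: library_uri}]}


def install_library(host: str, token: str, cluster_id: str, library_uri: str) -> int:
    """Send the install request and return the HTTP status code."""
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/libraries/install",
        data=json.dumps(install_payload(cluster_id, library_uri)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request) as response:
        return response.status
```

The request only schedules the installation. The Libraries tab in the clusters UI shows whether the library was installed successfully.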

Install libraries to notebooks

You can use %pip to install custom Python wheel files stored in object storage. Libraries installed this way are scoped to the notebook-isolated SparkSession. To use this method, you must either store libraries in publicly readable object storage or use a pre-signed URL.

See Notebook-scoped Python libraries.
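One way to obtain a pre-signed URL for a wheel in GCS is sketched below. It assumes the google-cloud-storage Python client and credentials that are able to sign URLs (for example, a service account key). The helper names and the 15-minute default expiry are illustrative choices, not Databricks recommendations.

```python
from datetime import timedelta


def split_gcs_uri(uri: str) -> tuple:
    """Split gs://bucket/path/to/object into (bucket, object name)."""
    if not uri.startswith("gs://"):
        raise ValueError(f"Not a GCS URI: {uri}")
    bucket_name, _, object_name = uri[len("gs://"):].partition("/")
    if not bucket_name or not object_name:
        raise ValueError(f"URI must include a bucket and an object path: {uri}")
    return bucket_name, object_name


def signed_wheel_url(uri: str, minutes: int = 15) -> str:
    """Return a time-limited HTTPS URL that grants read access to the wheel."""
    # Imported lazily so that split_gcs_uri() works without the client installed.
    from google.cloud import storage

    bucket_name, object_name = split_gcs_uri(uri)
    blob = storage.Client().bucket(bucket_name).blob(object_name)
    return blob.generate_signed_url(
        version="v4",
        expiration=timedelta(minutes=minutes),
        method="GET",
    )
```

You would then pass the resulting URL to `%pip install` in a notebook cell. Because the URL expires, generate it shortly before running the install.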

Note

JAR libraries cannot be installed at the notebook level. You must install JAR libraries at the cluster level.