Cluster libraries
Cluster libraries can be used by all notebooks and jobs running on a cluster. This article details using the Install library UI in the Databricks workspace.
You can install libraries to a cluster using the following approaches:
Install a library for use with a specific cluster only.
Install a library with the REST API. See the Libraries API.
Install a library with Databricks CLI. See What is the Databricks CLI?.
Install a library using Terraform. See Databricks Terraform provider and databricks_library.
Install a library using an init script that runs at cluster creation time. See Install a library with an init script.
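As an illustration of the REST API approach, the following is a minimal sketch that builds, but does not send, an install request. It assumes the Libraries API install endpoint is POST /api/2.0/libraries/install and that it accepts a cluster ID plus a list of library objects. The helper name build_install_request, the host, the token, and the cluster ID are hypothetical placeholders, not values from this article.

```python
import json
import urllib.request


def build_install_request(host, token, cluster_id, libraries):
    """Build (but do not send) a POST to the Libraries API install endpoint."""
    payload = {"cluster_id": cluster_id, "libraries": libraries}
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/libraries/install",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical placeholder values; replace them with your own workspace details.
request = build_install_request(
    host="https://example.cloud.databricks.com",
    token="<personal-access-token>",
    cluster_id="<cluster-id>",
    libraries=[{"pypi": {"package": "astropy"}}],
)

# To actually send the request, you would call: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it possible to inspect the payload before anything is changed on the cluster.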
Install a library on a cluster
To install a library on a cluster:
Click Compute in the sidebar.
Click a cluster name.
Click the Libraries tab.
Click Install New.
The Install library dialog displays.
Select one of the Library Source options, complete the instructions that appear, and then click Install.
Important
Libraries can be installed from DBFS when using Databricks Runtime 14.3 LTS and below. However, any workspace user can modify library files stored in DBFS. To improve the security of libraries in a Databricks workspace, storing library files in the DBFS root is deprecated and disabled by default in Databricks Runtime 15.1 and above. See Storing libraries in DBFS root is deprecated and disabled by default.
Instead, Databricks recommends uploading all libraries, including Python libraries, JAR files, and Spark connectors, to workspace files or Unity Catalog volumes, or using library package repositories. If your workload does not support these patterns, you can also use libraries stored in cloud object storage.
Not all cluster access modes support all library configurations. See Cluster-scoped libraries.
| Library source | Instructions |
|---|---|
| Workspace | Select a workspace file or upload a Whl, zipped wheelhouse, JAR, ZIP, tar, or requirements.txt file. See Install libraries from workspace files. |
| Volumes | Select a Whl, JAR, or requirements.txt file from a volume. See Install libraries from a volume. |
| File Path/GCS | Select the library type and provide the full URI to the library object (for example: |
| PyPI | Enter a PyPI package name. See PyPI package. |
| Maven | Specify a Maven coordinate. See Maven or Spark package. |
| CRAN | Enter the name of a package. See CRAN package. |
| DBFS (Not recommended) | Load a JAR or Whl file to the DBFS root. This is not recommended, as files stored in DBFS can be modified by any workspace user. |
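If you script installs instead of using the dialog, each Library Source option roughly corresponds to one library object in a Libraries API request. The sketch below shows one possible mapping. It assumes the API field names pypi, maven, cran, whl, jar, and requirements. The function library_spec and the example path are hypothetical, and the sketch covers only some of the file types the UI accepts.

```python
def library_spec(source, value, file_type=None):
    """Return a library object for one Library Source choice.

    `file_type` ("whl", "jar", or "requirements") is only used for file-based
    sources, where `value` is a path or URI to the library file.
    """
    # Package-repository sources: (API key, field holding the package identifier).
    package_sources = {
        "PyPI": ("pypi", "package"),
        "Maven": ("maven", "coordinates"),
        "CRAN": ("cran", "package"),
    }
    if source in package_sources:
        kind, field = package_sources[source]
        return {kind: {field: value}}

    # File-based sources: the API key is the file type, the value is the path.
    # Which file types each source accepts differs; see the table above.
    if source in {"Workspace", "Volumes", "File Path/GCS"}:
        if file_type not in {"whl", "jar", "requirements"}:
            raise ValueError(f"unsupported file type: {file_type!r}")
        return {file_type: value}

    raise ValueError(f"unknown library source: {source!r}")


# Hypothetical examples.
pypi_lib = library_spec("PyPI", "astropy")
volume_lib = library_spec(
    "Volumes", "/Volumes/main/default/libs/requirements.txt", "requirements"
)
```

DBFS is deliberately left out of the mapping because storing libraries in the DBFS root is not recommended.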
When you install a library on a cluster, a notebook already attached to that cluster will not immediately see the new library. You must first detach and then reattach the notebook to the cluster.
Note
If a library takes more than 2 hours to install, it is marked as failed.
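When automating installs, you may want to wait until every library reaches a final state before attaching notebooks or starting work. The following is a minimal polling sketch under stated assumptions: fetch_statuses is a hypothetical callable that you supply, returning a dict of library name to status string, and the status names INSTALLED, FAILED, and SKIPPED are assumed to be the terminal states. The default timeout mirrors the 2-hour limit described above.

```python
import time

# Assumed terminal status names; anything else is treated as still in progress.
TERMINAL_STATUSES = {"INSTALLED", "FAILED", "SKIPPED"}


def wait_for_libraries(fetch_statuses, timeout_s=2 * 60 * 60, poll_s=30,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until every library is in a terminal state or the timeout expires.

    `fetch_statuses` is a caller-supplied function returning {name: status}.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        statuses = fetch_statuses()
        if all(status in TERMINAL_STATUSES for status in statuses.values()):
            return statuses
        if clock() >= deadline:
            raise TimeoutError(f"libraries still installing: {statuses}")
        sleep(poll_s)
```

A caller would typically check the returned dict for FAILED entries rather than assume that a terminal state means success.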
Install a library with an init script
If your library requires custom configuration, you may not be able to install it using the workspace or cluster library interface. Instead, you can install the library using an init script.
Here is an example of an init script that uses pip to install Python libraries on a Databricks Runtime cluster at cluster initialization.
#!/bin/bash
# Install astropy with the pip bundled in the cluster's Python environment.
/databricks/python/bin/pip install astropy
Uninstall a library from a cluster
Note
When you uninstall a library from a cluster, the library is removed only when you restart the cluster. Until you restart the cluster, the status of the uninstalled library appears as Uninstall pending restart.
To uninstall a library using the cluster UI:
Click Compute in the sidebar.
Click a cluster name.
Click the Libraries tab.
Select the checkbox next to the library you want to uninstall, click Uninstall, and then click Confirm. The Status changes to Uninstall pending restart.
Click Restart and Confirm to uninstall the library. The library is removed from the cluster’s Libraries tab.
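The same two-step flow, uninstall followed by restart, can be sketched for scripted use. The example below only describes the calls in order and does not send them. It assumes the endpoint paths /api/2.0/libraries/uninstall and /api/2.1/clusters/restart. The function name and the cluster ID are hypothetical placeholders.

```python
def uninstall_then_restart_calls(cluster_id, libraries):
    """Return the (method, path, body) calls, in order, for uninstall + restart."""
    return [
        # Step 1: mark the libraries for removal (takes effect only after a restart).
        ("POST", "/api/2.0/libraries/uninstall",
         {"cluster_id": cluster_id, "libraries": libraries}),
        # Step 2: restart the cluster so the libraries are actually removed.
        ("POST", "/api/2.1/clusters/restart", {"cluster_id": cluster_id}),
    ]


# Hypothetical placeholder cluster ID.
calls = uninstall_then_restart_calls(
    "<cluster-id>", [{"pypi": {"package": "astropy"}}]
)
```

Returning the calls as data keeps the order explicit: sending only the first call leaves the library in the Uninstall pending restart state.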
View the libraries installed on a cluster
Click Compute in the sidebar.
Click the cluster name.
Click the Libraries tab. For each library, the tab displays the name and version, type, install status, and, if uploaded, the source file.
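A comparable view can be produced from a script. The sketch below flattens a status response into (type, name, status) rows. It assumes a response shaped like the one returned by the Libraries API cluster-status endpoint, with a library_statuses list whose entries hold a library object and a status string. The sample data and the function name are illustrative and hypothetical.

```python
def summarize_library_statuses(response):
    """Flatten a cluster-status style response into (type, name, status) rows."""
    rows = []
    for entry in response.get("library_statuses", []):
        for kind, spec in entry.get("library", {}).items():
            if isinstance(spec, dict):
                # Package sources nest the identifier under a field name.
                name = spec.get("package") or spec.get("coordinates") or ""
            else:
                # File sources (whl, jar, requirements) store the path directly.
                name = spec
            rows.append((kind, name, entry.get("status", "UNKNOWN")))
    return rows


# Illustrative sample data, not real output.
sample = {
    "cluster_id": "<cluster-id>",
    "library_statuses": [
        {"library": {"pypi": {"package": "astropy"}}, "status": "INSTALLED"},
        {"library": {"jar": "/Volumes/main/default/libs/example.jar"},
         "status": "UNINSTALL_ON_RESTART"},
    ],
}
rows = summarize_library_statuses(sample)
```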
Update a cluster-installed library
To update a cluster-installed library, uninstall the old version of the library and install a new version.
Note
requirements.txt files do not require uninstalling and restarting. If you have modified the contents of a requirements.txt file, you can reinstall it to apply the changes.
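The update rules above can be expressed as a small planning step: compare what is installed with what you want, uninstall what is no longer wanted, and install what is new, while treating requirements.txt entries as reinstall-only. The following is a sketch under those assumptions. The function plan_update, the version pins, and the file path are hypothetical, and library objects are assumed to be plain dicts such as {"pypi": {"package": "astropy==6.0"}}.

```python
def plan_update(installed, desired):
    """Return (to_uninstall, to_install) lists of library objects.

    A requirements entry that appears in both lists is reinstalled without
    being uninstalled first, matching the note about requirements.txt files.
    """
    to_uninstall = [lib for lib in installed if lib not in desired]
    to_install = [
        lib for lib in desired
        if lib not in installed or "requirements" in lib
    ]
    return to_uninstall, to_install


# Hypothetical example: bump a pinned PyPI package and refresh a requirements file.
installed = [
    {"pypi": {"package": "astropy==5.3"}},
    {"requirements": "/Workspace/Shared/requirements.txt"},
]
desired = [
    {"pypi": {"package": "astropy==6.0"}},
    {"requirements": "/Workspace/Shared/requirements.txt"},
]
to_uninstall, to_install = plan_update(installed, desired)
```

If to_uninstall is not empty, the cluster still needs a restart before the old versions are actually removed.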