Libraries
Note
The CLI is unavailable on Databricks on Google Cloud as of this release.
To make third-party or custom code available to notebooks and jobs running on your clusters, you can install a library. Libraries can be written in Python, Java, Scala, and R. You can upload Java, Scala, and Python libraries and point to external packages in PyPI, Maven, and CRAN repositories.
This article focuses on performing library tasks in the workspace UI. You can also manage libraries using the Libraries CLI or the Libraries API 2.0.
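For example, a cluster library can be installed programmatically with a single POST to the Libraries API 2.0. A minimal sketch, assuming a workspace URL, personal access token, and cluster ID (all placeholders below):

```python
# Minimal sketch: install a PyPI package as a cluster library via the
# Libraries API 2.0. Workspace URL, token, cluster ID, and the version
# pin are placeholders.
import requests

resp = requests.post(
    "https://<databricks-instance>/api/2.0/libraries/install",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={
        "cluster_id": "<cluster-id>",
        "libraries": [{"pypi": {"package": "simplejson==3.8.0"}}],
    },
)
resp.raise_for_status()
```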
Tip
Databricks includes many common libraries in Databricks Runtime. To see which libraries are included in Databricks Runtime, look at the System Environment subsection of the Databricks Runtime release notes for your Databricks Runtime version.
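For example, you can check the version of a preinstalled library directly from a notebook:

```python
# pandas ships with Databricks Runtime; no installation is needed
# to check which version the runtime provides.
import pandas as pd
print(pd.__version__)
```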
Important
Databricks does not invoke Python `atexit` functions when your notebook or job completes processing. If you use a Python library that registers `atexit` handlers, you must ensure your code calls required functions before exiting.
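A minimal sketch of the workaround, using an illustrative `flush_buffers` cleanup function:

```python
import atexit

def flush_buffers():
    # Cleanup work that would normally run at interpreter exit,
    # e.g. flushing metrics or closing connections.
    print("flushing buffered results")

# Handlers registered like this are NOT invoked when a Databricks
# notebook or job finishes, so do not rely on them alone.
atexit.register(flush_buffers)

# Instead, call the cleanup function explicitly as the last step
# of the notebook or job.
flush_buffers()
```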
Installing Python eggs is deprecated and will be removed in a future Databricks Runtime release. Use Python wheels or install packages from PyPI instead.
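For example, a wheel uploaded to DBFS can be installed from a notebook with %pip (the path is a placeholder):

```python
# Install a wheel instead of a deprecated egg; the DBFS path is a placeholder.
%pip install /dbfs/libraries/mypackage-1.0-py3-none-any.whl
```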
You can install libraries in three modes: workspace, cluster, and notebook-scoped.
Workspace libraries serve as a local repository from which you create cluster-installed libraries. A workspace library might be custom code created by your organization, or might be a particular version of an open-source library that your organization has standardized on.
Cluster libraries can be used by all notebooks running on a cluster. You can install a cluster library directly from a public repository such as PyPI or Maven, or create one from a previously installed workspace library.
Notebook-scoped libraries, available for Python and R, allow you to install libraries and create an environment scoped to a notebook session. These libraries do not affect other notebooks running on the same cluster. Notebook-scoped libraries do not persist and must be re-installed for each session. Use notebook-scoped libraries when you need a custom environment for a specific notebook.
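For example, pinning a version for one notebook with %pip (the package and version are illustrative):

```python
# Cell 1: notebook-scoped install. Running %pip resets the Python state
# for this notebook only, so place installs at the top of the notebook.
%pip install matplotlib==3.5.1
```

```python
# Cell 2: the pinned version is visible only in this notebook's session
# and must be reinstalled after the notebook detaches from the cluster.
import matplotlib
print(matplotlib.__version__)
```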
Python environment management
The following table provides an overview of options you can use to install Python libraries in Databricks.
Note
Notebook-scoped libraries using the %pip magic command are enabled by default in all supported Databricks Runtime and Databricks Runtime ML versions. See Requirements for details.
Notebook-scoped libraries with the library utility are deprecated.
| Python package source | Notebook-scoped libraries with %pip | Notebook-scoped libraries with the library utility (deprecated) | Cluster libraries with the UI, CLI, or API | Job libraries with Jobs API |
| --- | --- | --- | --- | --- |
| PyPI | Use `%pip install`. | Use `dbutils.library.installPyPI`. | Select PyPI as the source. | Add a new `pypi` object to the job libraries and specify the `package` field. |
| Private PyPI mirror, such as Nexus or Artifactory | Use `%pip install` with the `--index-url` option. Secret management is available. | Use `dbutils.library.installPyPI` and specify the `repo` argument. | Not supported. | Not supported. |
| VCS, such as GitHub, with raw source | Use `%pip install` and specify the repository URL as the package name. | Not supported. | Select PyPI as the source and specify the repository URL as the package name. | Add a new `pypi` object to the job libraries and specify the repository URL as the `package` field. |
| Private VCS with raw source | Use `%pip install` and specify the repository URL with basic authentication as the package name. Secret management is available. | Not supported. | Not supported. | Not supported. |
| DBFS | Use `%pip install`. | Use `dbutils.library.install(dbfs_path)`. | Select DBFS/GCS as the source. | Add a new `egg` or `whl` object to the job libraries and specify the DBFS path as the `package` field. |
| GCS | Use `%pip install` together with a pre-signed URL. | Use `dbutils.library.install(gs_path)`. | Select DBFS/GCS as the source. | Add a new `egg` or `whl` object to the job libraries and specify the GCS path as the `package` field. |
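For example, the private-mirror and VCS rows above correspond to notebook cells like the following (the index URL and repository are placeholders; read credentials from a secret rather than hard-coding them):

```python
# Install from a private PyPI mirror such as Nexus or Artifactory.
# The index URL is a placeholder for your mirror's simple index.
%pip install my-package --index-url https://nexus.example.com/repository/pypi/simple
```

```python
# Install raw source from a VCS such as GitHub.
# The repository URL is a placeholder.
%pip install git+https://github.com/example-org/example-package.git
```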