The CLI is unavailable on Databricks on Google Cloud as of this release.
Notebook-scoped libraries let you create, modify, save, reuse, and share custom Python environments that are specific to a notebook. When you install a notebook-scoped library, only the current notebook and any jobs associated with that notebook have access to that library. Other notebooks attached to the same cluster are not affected.
Notebook-scoped libraries do not persist across sessions. You must reinstall notebook-scoped libraries at the beginning of each session, or whenever the notebook is detached from a cluster.
There are two methods for installing notebook-scoped libraries:

Use the %pip magic command in a notebook. Databricks recommends using this approach for new workloads. This article describes how to use these magic commands.

On Databricks Runtime 10.5 and below, you can use the Databricks library utility. The library utility is supported only on Databricks Runtime, not Databricks Runtime ML. See Library utility (dbutils.library).
On a High Concurrency cluster running Databricks Runtime 7.3 LTS ML or Databricks Runtime 7.4 ML, notebook-scoped libraries are not compatible with table access control. An alternative is to use Library utility (dbutils.library) on a Databricks Runtime cluster, or to upgrade your cluster to Databricks Runtime 7.5 ML.
Using notebook-scoped libraries might result in more traffic to the driver node as it works to keep the environment consistent across executor nodes.
When you use a cluster with 100 or more nodes, the driver node must meet a minimum instance type requirement. For larger clusters, use a larger driver node.
You should place all %pip commands at the beginning of the notebook. The notebook state is reset after any %pip command that modifies the environment. If you create Python methods or variables in a notebook, and then use %pip commands in a later cell, the methods or variables are lost.
Upgrading, modifying, or uninstalling core Python packages (such as IPython) with %pip may cause some features to stop working as expected. For example, IPython 7.21 and above are incompatible with Databricks Runtime 8.1 and below. If you experience such problems, reset the environment by detaching and re-attaching the notebook or by restarting the cluster.
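For example, a variable defined before a %pip cell does not survive it. The cell sequence below is a hypothetical illustration:

```
# Cell 1
x = 42

# Cell 2 -- resets the notebook state
%pip install matplotlib

# Cell 3 -- fails with NameError, because x was lost in the reset
print(x)
```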
The %pip command is equivalent to the pip command and supports the same API. The following sections show examples of how you can use %pip commands to manage your environment. For more information on installing Python packages with pip, see the pip install documentation and related pages.
%pip install matplotlib
%pip install /path/to/my_package.whl
You cannot uninstall a library that is included in Databricks Runtime or a library that has been installed as a cluster library. If you have installed a different version of a library than the one included in Databricks Runtime or the one installed on the cluster, you can use %pip uninstall to revert to the default version in Databricks Runtime or the version installed on the cluster, but you cannot use a %pip command to uninstall that default or cluster-installed version.
%pip uninstall -y matplotlib
The -y option is required.
%pip install git+https://github.com/databricks/databricks-cli
You can add parameters to the URL to specify things like the version or git subdirectory. See the pip documentation on VCS support for more information and for examples using other version control systems.
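For instance, pip's VCS URL syntax lets you pin a branch or tag with an @ suffix and point at a subdirectory with a URL fragment. The placeholders below are illustrative:

```
%pip install git+https://<gitprovider>.com/<path/to/repo>@<branch-or-tag>#subdirectory=<dir>
```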
Pip supports installing packages from private sources with basic authentication, including private version control systems and private package repositories, such as Nexus and Artifactory. Secret management is available via the Databricks Secrets API, which allows you to store authentication tokens and passwords. Use the DBUtils API to access secrets from your notebook. Note that you can use $variables in magic commands.
To install a package from a private repository, specify the repository URL with the --index-url option to %pip install, or add it to the pip config file.
token = dbutils.secrets.get(scope="scope", key="key")
%pip install --index-url https://<user>:$token@<your-package-repository>.com/<path/to/repo> <package>==<version> --extra-index-url https://pypi.org/simple/
Similarly, you can use secret management with magic commands to install private packages from version control systems.
token = dbutils.secrets.get(scope="scope", key="key")
%pip install git+https://<user>:$token@<gitprovider>.com/<path/to/repo>
You can use
%pip to install a private package that has been saved on DBFS.
When you upload a file to DBFS, it automatically renames the file, replacing spaces, periods, and hyphens with underscores. For wheel files, pip requires that the file name use periods in the version (for example, 0.1.0) and hyphens instead of spaces or underscores, so these file names are not changed.
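The renaming rule described above can be sketched in plain Python. This is an illustration of the behavior, not the actual DBFS implementation:

```python
import re

def dbfs_upload_name(name: str) -> str:
    # Illustrative sketch, not DBFS's actual code: wheel (.whl) file
    # names are left unchanged; otherwise spaces, periods, and hyphens
    # in the base name become underscores.
    if name.endswith(".whl"):
        return name
    base, dot, ext = name.rpartition(".")
    if dot:
        return re.sub(r"[ .\-]", "_", base) + "." + ext
    return re.sub(r"[ .\-]", "_", name)

print(dbfs_upload_name("my data-file v1.2.csv"))             # my_data_file_v1_2.csv
print(dbfs_upload_name("mypackage-0.0.1-py3-none-any.whl"))  # unchanged
```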
%pip install /dbfs/mypackage-0.0.1-py3-none-any.whl
%pip freeze > /dbfs/requirements.txt
Any subdirectories in the file path must already exist. If you run %pip freeze > /dbfs/<new-directory>/requirements.txt, the command fails if the directory /dbfs/<new-directory> does not already exist.
%conda commands have been deprecated and will no longer be supported after Databricks Runtime ML 8.4. Databricks recommends using %pip for managing notebook-scoped libraries. If you require Python libraries that can only be installed using conda, you can use conda-based Docker containers to pre-install the libraries you need.
Anaconda Inc. updated their terms of service for anaconda.org channels in September 2020. Based on the new terms of service you may require a commercial license if you rely on Anaconda’s packaging and distribution. See Anaconda Commercial Edition FAQ for more information. Your use of any Anaconda channels is governed by their terms of service.
As a result of this change, Databricks has removed the default channel configuration for the Conda package manager. This is a breaking change.
To install or update packages using the %conda command, you must specify a channel using -c. You must also update all usage of %conda install and %sh conda install to specify a channel using -c. If you do not specify a channel, conda commands fail with an error.
The %conda command is equivalent to the conda command and supports the same API, with some restrictions noted below. The following sections contain examples of how to use %conda commands to manage your environment. For more information on installing Python packages with conda, see the conda install documentation.
%conda magic commands are not available on Databricks Runtime. They are only available on Databricks Runtime ML up to Databricks Runtime ML 8.4, and on Databricks Runtime for Genomics. Databricks recommends using
pip to install libraries. For more information, see Understanding conda and pip.
If you must use both %pip and %conda commands in a notebook, see Interactions between pip and conda commands.
conda commands are not supported when used with
%conda install matplotlib -c conda-forge
%conda uninstall matplotlib
To show the Python environment associated with a notebook, use %conda list.
To avoid conflicts, follow these guidelines when using pip or conda to install Python packages and libraries.
Libraries installed using the API or using the cluster UI are installed using pip. If any libraries have been installed from the API or the cluster UI, you should use only %pip commands when installing notebook-scoped libraries.
If you use notebook-scoped libraries on a cluster, init scripts run on that cluster can use either conda or pip commands to install libraries. However, if the init script includes pip commands, use only %pip commands in notebooks (not %conda commands).
It’s best to use either pip commands exclusively or conda commands exclusively. If you must install some packages using conda and some using pip, run the conda commands first, and then run the pip commands. For more information, see Using Pip in a Conda Environment.
Libraries installed from the cluster UI or API are available to all notebooks on the cluster. These libraries are installed using
pip; therefore, if libraries are installed using the cluster UI, use only
%pip commands in notebooks.
Libraries installed using an init script are available to all notebooks on the cluster. If you use notebook-scoped libraries on a cluster running Databricks Runtime ML or Databricks Runtime for Genomics, init scripts run on the cluster can use either conda or pip commands to install libraries. However, if the init script includes pip commands, then use only %pip commands in notebooks.
For example, this notebook code snippet generates a script that installs fast.ai packages on all the cluster nodes.
dbutils.fs.put("dbfs:/home/myScripts/fast.ai", "conda install -c pytorch -c fastai fastai -y", True)
Both %sh and ! execute a shell command in a notebook; the former is a Databricks auxiliary magic command while the latter is a feature of IPython.
pip is a shorthand for
%pip when automagic is enabled, which is the default in Databricks Python notebooks.
On Databricks Runtime 11.0 and above, %pip, %sh pip, and !pip all install a library as a notebook-scoped Python library. On Databricks Runtime 10.4 LTS and below, Databricks recommends using only %pip to install notebook-scoped libraries. The behavior of %sh pip and !pip is not consistent in Databricks Runtime 10.4 LTS and below.
When you use %conda env update to update a notebook environment, the installation order of packages is not guaranteed. This can cause problems for the horovod package, which requires that torch be installed before horovod in order to use horovod.torch. If this happens, uninstall the horovod package and reinstall it after ensuring that the dependencies are installed.
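Installing the dependency in a separate, earlier cell ensures it is present before horovod builds against it. A hypothetical cell sequence:

```
# Cell 1
%pip install torch

# Cell 2 -- horovod now builds against the already-installed torch
%pip install horovod
```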