Manage libraries with %conda commands (legacy)

Important

This documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported. See Notebook-scoped Python libraries.

Important

%conda commands are deprecated and are supported only on Databricks Runtime 7.3 LTS ML. Databricks recommends using %pip for managing notebook-scoped libraries. If you require Python libraries that can only be installed using conda, you can use conda-based Docker containers to pre-install the libraries you need.

Anaconda Inc. updated its terms of service for anaconda.org channels in September 2020. Based on the new terms of service, you may require a commercial license if you rely on Anaconda’s packaging and distribution. See the Anaconda Commercial Edition FAQ for more information. Your use of any Anaconda channels is governed by their terms of service.

As a result of this change, Databricks has removed the default channel configuration for the Conda package manager. This is a breaking change.

To install or update packages using the %conda command, you must specify a channel using -c. Likewise, you must update all existing usage of %conda install and %sh conda install to specify a channel with -c. If you do not specify a channel, conda commands fail with PackagesNotFoundError.
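For example, the following sketch contrasts an install that fails because no channel is configured with the corrected form that specifies the conda-forge channel (the package name is illustrative):

```
%conda install numpy                  # fails with PackagesNotFoundError: no default channel configured
%conda install numpy -c conda-forge   # succeeds: channel explicitly specified with -c
```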

The %conda command is equivalent to the conda command and supports the same API with some restrictions noted below. The following sections contain examples of how to use %conda commands to manage your environment. For more information on installing Python packages with conda, see the conda install documentation.

Note that %conda magic commands are not available on Databricks Runtime; they are available only on Databricks Runtime 7.3 LTS ML. Databricks recommends using pip to install libraries. For more information, see Understanding conda and pip.

If you must use both %pip and %conda commands in a notebook, see Interactions between pip and conda commands.

Note

The following conda commands are not supported when used with %conda:

  • activate

  • create

  • init

  • run

  • env create

  • env remove

Install a library with %conda

%conda install matplotlib -c conda-forge

Uninstall a library with %conda

%conda uninstall matplotlib

Save and reuse or share an environment

When you detach a notebook from a cluster, the environment is not saved. To save an environment so you can reuse it later or share it with someone else, follow these steps.

Databricks recommends that environments be shared only between clusters running the same version of Databricks Runtime ML.

  1. Save the environment as a conda YAML specification.

    %conda env export -f /dbfs/myenv.yml
    
  2. Import the file to another notebook using conda env update.

    %conda env update -f /dbfs/myenv.yml
    
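The exported file is a standard conda environment specification. As an illustrative sketch, /dbfs/myenv.yml might look similar to the following (the channel, package names, and versions are assumptions, not output from a real export):

```yaml
name: base
channels:
  - conda-forge
dependencies:
  - python=3.7
  - matplotlib
  - pip
  - pip:
      - fastai
```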

List the Python environment of a notebook

To show the Python environment associated with a notebook, use %conda list:

%conda list
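If you need the same information programmatically, or on a runtime where conda is unavailable, a pure-Python alternative is to enumerate installed distributions with the standard library. This is a generic sketch, not a Databricks API:

```python
from importlib.metadata import distributions

# Collect (name, version) pairs for every installed distribution,
# similar in spirit to the package listing produced by %conda list.
packages = sorted(
    (dist.metadata["Name"], dist.version)
    for dist in distributions()
    if dist.metadata["Name"] is not None
)

for name, version in packages:
    print(f"{name}=={version}")
```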

Interactions between pip and conda commands

To avoid conflicts, follow these guidelines when using pip or conda to install Python packages and libraries.

  • Libraries installed with the Libraries API or the cluster UI are installed using pip. If any libraries have been installed from the API or the cluster UI, you should use only %pip commands when installing notebook-scoped libraries.

  • If you use notebook-scoped libraries on a cluster, init scripts run on that cluster can use either conda or pip commands to install libraries. However, if the init script includes pip commands, use only %pip commands in notebooks (not %conda).

  • It’s best to use either pip commands exclusively or conda commands exclusively. If you must install some packages using conda and some using pip, run the conda commands first, and then run the pip commands. For more information, see Using Pip in a Conda Environment.
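Following these guidelines, a notebook that needs both package managers might look like the following sketch, with the conda cell executed before the pip cell (package names are illustrative):

```
%conda install -c conda-forge lightgbm   # run conda installs first, in their own cell

%pip install fastai                      # then run pip installs in a separate cell
```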

Frequently asked questions (FAQ)

How do libraries installed from the cluster UI/API interact with notebook-scoped libraries?

Libraries installed from the cluster UI or API are available to all notebooks on the cluster. These libraries are installed using pip; therefore, if libraries are installed using the cluster UI, use only %pip commands in notebooks.

How do libraries installed using an init script interact with notebook-scoped libraries?

Libraries installed using an init script are available to all notebooks on the cluster.

If you use notebook-scoped libraries on a cluster running Databricks Runtime ML, init scripts run on the cluster can use either conda or pip commands to install libraries. However, if the init script includes pip commands, then use only %pip commands in notebooks.

For example, this notebook code snippet generates an init script that, when configured on the cluster, installs fast.ai packages on all the cluster nodes.

dbutils.fs.put("dbfs:/home/myScripts/fast.ai", "conda install -c pytorch -c fastai fastai -y", True)

Can I use %pip and %conda commands in job notebooks?

Yes.

Can I use %pip and %conda commands in R or Scala notebooks?

Yes, in a %python magic cell.

Can I use %sh pip, !pip, or pip? What is the difference?

%sh and ! execute a shell command in a notebook; the former is a Databricks auxiliary magic command while the latter is a feature of IPython. pip is a shorthand for %pip when automagic is enabled, which is the default in Databricks Python notebooks.

On Databricks Runtime 11.0 and above, %pip, %sh pip, and !pip all install a library as a notebook-scoped Python library. On Databricks Runtime 10.4 LTS and below, Databricks recommends using only %pip or pip to install notebook-scoped libraries. The behavior of %sh pip and !pip is not consistent in Databricks Runtime 10.4 LTS and below.
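As an illustration of the difference (the package name is arbitrary):

```
%pip install nltk      # notebook-scoped on all supported runtimes
%sh pip install nltk   # shell install; notebook-scoped only on Databricks Runtime 11.0 and above
!pip install nltk      # IPython shell escape; same caveat as %sh pip
```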

Can I update R packages using %conda commands?

No.

Known issues

  • When you use %conda env update to update a notebook environment, the installation order of packages is not guaranteed. This can cause problems for the horovod package, which requires that tensorflow and torch be installed before horovod so that horovod.tensorflow or horovod.torch can be used. If this happens, uninstall the horovod package and reinstall it after ensuring that its dependencies are installed.

  • On Databricks Runtime 10.3 and below, notebook-scoped libraries are incompatible with batch streaming jobs. Databricks recommends using cluster libraries or the IPython kernel instead.