Install notebook dependencies
Preview
This feature is in Public Preview.
You can install Python dependencies for serverless notebooks using the Environment side panel. This panel provides a single place to edit, view, and export a notebook’s library requirements. These dependencies can be added using a base environment or individually.
For non-notebook tasks, see Configure environments and dependencies for non-notebook tasks.
Important
Do not install PySpark or any library that installs PySpark as a dependency on your serverless notebooks. Doing so will stop your session and result in an error. If this occurs, reset your environment.
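As a rough pre-check before applying an environment, you can scan a list of requirement lines for direct PySpark references. The sketch below is illustrative only: `find_direct_pyspark` is a hypothetical helper, not part of any Databricks API, and it cannot detect libraries that pull in PySpark as a transitive dependency.

```python
import re

# Matches the leading project name of a requirement line such as "pyspark[sql]==3.5.0".
_NAME_RE = re.compile(r"([A-Za-z0-9][A-Za-z0-9._-]*)")


def find_direct_pyspark(requirement_lines):
    """Return the lines that name PySpark directly (hypothetical helper)."""
    flagged = []
    for line in requirement_lines:
        match = _NAME_RE.match(line.strip())
        if match is None:
            continue  # options (-r, --index-url), absolute paths, comments, blank lines
        # Normalize the name as PEP 503 does: lowercase, runs of "-", "_", "." become "-".
        name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
        if name == "pyspark":
            flagged.append(line)
    return flagged


candidates = ["cowsay==6.1", "PySpark>=3.4", "-r reqs.txt", "pyspark[sql]==3.5.0"]
print(find_direct_pyspark(candidates))
```

An empty result does not guarantee the environment is safe, because a dependency can still install PySpark indirectly.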
Configure a base environment
A base environment is a YAML file stored as a workspace file or on a Unity Catalog volume that specifies additional environment dependencies. Base environments can be shared among notebooks. To configure a base environment:
Create a YAML file that defines settings for a Python virtual environment. The following example YAML, which is based on the MLflow projects environment specification, defines a base environment with a few library dependencies:
client: "1"
dependencies:
  - --index-url https://pypi.org/simple
  - -r "/Workspace/Shared/requirements.txt"
  - cowsay==6.1
  - "/Workspace/Shared/Path/To/simplejson-3.19.3-py3-none-any.whl"
  - git+https://github.com/databricks/databricks-cli
Upload the YAML file as a workspace file or to a Unity Catalog volume. See Import a file or Upload files to a Unity Catalog volume.
To the right of the notebook, click the button to expand the Environment panel. This button only appears when a notebook is connected to serverless compute.
In the Base Environment field, enter the path of the uploaded YAML file or navigate to it and select it.
Click Apply. This installs the dependencies in the notebook virtual environment and restarts the Python process.
Users can override the dependencies specified in the base environment by installing dependencies individually.
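If you maintain several base environments, generating the YAML from a list of dependencies can help keep them consistent. The following is a minimal sketch using only the Python standard library. `render_base_environment` is a hypothetical helper, and it assumes the file needs only the `client` and `dependencies` keys shown in the example above.

```python
import json


def render_base_environment(client, dependencies):
    """Render a base environment spec as YAML text (hypothetical helper)."""
    lines = [f"client: {json.dumps(client)}", "dependencies:"]
    for dep in dependencies:
        # A JSON string is also a valid YAML double-quoted scalar, so quoting
        # every entry avoids surprises with leading dashes or colons.
        lines.append(f"  - {json.dumps(dep)}")
    return "\n".join(lines) + "\n"


spec = render_base_environment(
    "1",
    [
        "--index-url https://pypi.org/simple",
        "-r /Workspace/Shared/requirements.txt",
        "cowsay==6.1",
    ],
)
print(spec)
```

You would then save the printed text as a `.yaml` file and upload it as described in the steps above.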
Configure the notebook environment
You can also install dependencies on a notebook connected to serverless compute using the Dependencies tab of the Environment panel:
To the right of the notebook, click the button to expand the Environment panel. This button only appears when a notebook is connected to serverless compute.
Select the client image from the Client version drop-down. See Serverless client images. Databricks recommends picking the latest version to get the most up-to-date notebook features.
In the Dependencies section, click Add Dependency and enter the path of the library dependency in the field. You can specify a dependency in any format that is valid in a requirements.txt file.
Click Apply. This installs the dependencies in the notebook virtual environment and restarts the Python process.
Note
A job using serverless compute will install the environment specification of the notebook before executing the notebook code. This means that there is no need to add dependencies when scheduling notebooks as jobs. See Configure environments and dependencies.
View installed dependencies and pip logs
To view installed dependencies, click Installed in the Environment side panel for a notebook. To view pip installation logs for the notebook environment, click Pip logs at the bottom of the panel.
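You can get a similar list from inside a notebook cell with the Python standard library. This sketch assumes the cell runs in the same Python process as the notebook environment. It is a generic `importlib.metadata` listing, not the mechanism the panel itself uses, and `installed_packages` is a hypothetical helper name.

```python
from importlib import metadata


def installed_packages():
    """Return sorted 'name==version' strings for the current Python environment."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing or broken metadata
            found.add(f"{name}=={dist.version}")
    return sorted(found, key=str.lower)


for line in installed_packages():
    print(line)
```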
Reset the environment
If your notebook is connected to serverless compute, Databricks automatically caches the content of the notebook’s virtual environment. This means you generally do not need to reinstall the Python dependencies specified in the Environment panel when you open an existing notebook, even if it has been disconnected due to inactivity.
Python virtual environment caching also applies to jobs. This means that subsequent runs of jobs are faster as required dependencies are already available.
Note
If you change the implementation of a custom Python package used in a job on serverless, you must also update its version number so that jobs can pick up the latest implementation.
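One way to make the version bump routine is a small helper in your release script. The sketch below assumes simple dotted numeric versions such as `0.1.0`. `bump_patch` is a hypothetical helper, and where the version string is stored (for example, in a package's build metadata) depends on your project.

```python
def bump_patch(version):
    """Return `version` with its last numeric component incremented."""
    parts = version.split(".")
    # int() raises ValueError for non-numeric suffixes such as "1.0.0rc1".
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


print(bump_patch("0.1.0"))
```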
To clear the environment cache and perform a fresh install of the dependencies specified in the Environment panel of a notebook attached to serverless compute, click the arrow next to Apply and then click Reset environment.
Note
Reset the virtual environment if you install packages that break or change the core notebook or Apache Spark environment. Detaching the notebook from serverless compute and reattaching it does not necessarily clear the entire environment cache.
Configure environments and dependencies for non-notebook tasks
For other supported task types, such as Python script, Python wheel, or dbt tasks, the default environment includes a set of preinstalled Python libraries. For the list of installed libraries, see the Installed Python libraries section for the client version you are using. See Serverless client images. If a task requires a Python library that is not installed, you can install the library from workspace files, Unity Catalog volumes, or public package repositories. To add a library when you create or edit a task:
In the Environment and Libraries dropdown menu, click the edit icon next to the Default environment or click + Add new environment.
Select the client image from the Client version drop-down. See Serverless client images. Databricks recommends picking the latest version to get the most up-to-date features.
In the Configure environment dialog, click + Add library.
Select the type of dependency from the dropdown menu under Libraries.
In the File Path text box, enter the path to the library.
For a Python Wheel in a workspace file, the path should be absolute and start with /Workspace/.
For a Python Wheel in a Unity Catalog volume, the path should be /Volumes/<catalog>/<schema>/<volume>/<path>.whl.
For a requirements.txt file, select PyPi and enter -r /path/to/requirements.txt.
Click Confirm or + Add library to add another library.
If you’re adding a task, click Create task. If you’re editing a task, click Save task.
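The path formats listed for the File Path text box can be checked before you enter them. The sketch below is a hypothetical validator, not a Databricks API. `classify_library_entry` and its labels are invented for illustration, and it checks only the shape of the string, not whether the file exists.

```python
import re

# Shape of a wheel on a volume: /Volumes/<catalog>/<schema>/<volume>/<path>.whl
_VOLUME_WHEEL_RE = re.compile(r"^/Volumes/[^/]+/[^/]+/[^/]+/.+\.whl$")


def classify_library_entry(entry):
    """Classify a File Path entry by its shape (hypothetical helper)."""
    entry = entry.strip()
    if entry.startswith("-r "):
        return "requirements file"
    if entry.startswith("/Workspace/") and entry.endswith(".whl"):
        return "workspace wheel"
    if _VOLUME_WHEEL_RE.match(entry):
        return "volume wheel"
    return "unrecognized"


for example in (
    "/Workspace/Shared/Path/To/simplejson-3.19.3-py3-none-any.whl",
    "-r /path/to/requirements.txt",
    "Shared/relative.whl",
):
    print(example, "->", classify_library_entry(example))
```

A relative path such as `Shared/relative.whl` is reported as unrecognized because workspace paths must be absolute.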