Manage Python dependencies for Delta Live Tables pipelines

Delta Live Tables supports external dependencies in your pipelines. Databricks recommends using one of two patterns to install Python packages:

  1. Use the %pip install command to install packages for all source files in a pipeline.

  2. Import modules or libraries from source code stored in workspace files, as sketched after this list. See Import Python modules from Git folders or workspace files.

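For the second pattern, a pipeline notebook imports functions from a Python module stored as a workspace file. The following is a minimal sketch, assuming a hypothetical module my_transforms.py that exposes a clean_events function and sits in the same workspace folder as the notebook; if the module lives in another folder, add that folder to sys.path before importing.

import dlt

# Hypothetical workspace file: my_transforms.py with a clean_events function.
from my_transforms import clean_events

@dlt.table
def events_clean():
    # Placeholder source data; replace with your own input.
    return clean_events(spark.range(10))
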
Delta Live Tables also supports using global and cluster-scoped init scripts. However, these external dependencies, particularly init scripts, increase the risk of issues with runtime upgrades. To mitigate these risks, minimize the use of init scripts in your pipelines. If your processing requires init scripts, automate testing of your pipeline to detect problems early, and increase your testing frequency.

Important

Because JVM libraries are not supported in Delta Live Tables pipelines, do not use an init script to install them. However, you can install other library types, such as Python libraries, with an init script.

Python libraries

To specify external Python libraries, use the %pip install magic command. When an update starts, Delta Live Tables runs all cells containing a %pip install command before running any table definitions. Every Python notebook included in the pipeline shares a library environment and has access to all installed libraries.

Important

  • %pip install commands must be in a separate cell at the top of your Delta Live Tables pipeline notebook. Do not include any other code in cells containing %pip install commands.

  • Because every notebook in a pipeline shares a library environment, you cannot define different library versions in a single pipeline. If your processing requires different library versions, you must define them in different pipelines.

The following example installs the simplejson library and makes it globally available to any Python notebook in the pipeline:

%pip install simplejson
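
After the install cell runs, later cells in any notebook in the pipeline can import the library. The following sketch uses the installed simplejson package inside a table definition; the table name, UDF, and source data are placeholders:

import dlt
import simplejson
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Wrap the installed library in a UDF so it can be used in a table definition.
to_json = udf(lambda v: simplejson.dumps({"id": v}), StringType())

@dlt.table
def json_encoded_ids():
    # Placeholder source data; replace with your own input.
    df = spark.range(5)
    return df.withColumn("payload", to_json(df["id"]))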

To install a Python wheel package, add the Python wheel path to the %pip install command. Installed Python wheel packages are available to all tables in the pipeline. The following example installs a Python wheel file named dltfns-1.0-py3-none-any.whl from the DBFS directory /dbfs/dlt/:

%pip install /dbfs/dlt/dltfns-1.0-py3-none-any.whl

See Install a Python wheel package with %pip.
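
Once the wheel is installed, its modules can be imported like any other package. The sketch below assumes, purely for illustration, that the dltfns wheel exposes a transforms module with an add_audit_columns function; substitute the actual API of your package:

import dlt

# Hypothetical API of the installed wheel; replace with your package's real names.
from dltfns.transforms import add_audit_columns

@dlt.table
def audited_events():
    # Placeholder source data; replace with your own input.
    return add_audit_columns(spark.range(10))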

Can I use Scala or Java libraries in a Delta Live Tables pipeline?

No, Delta Live Tables supports only SQL and Python. You cannot use JVM libraries in a pipeline. Installing JVM libraries can cause unpredictable behavior and may break with future Delta Live Tables releases. If your pipeline uses an init script, you must also ensure that the script does not install JVM libraries.