Use a private artifact in a bundle
Files and artifacts stored in third-party tools such as JFrog Artifactory or in private repositories may need to be part of your Databricks Asset Bundles. This article describes how to handle these files. For information about Databricks Asset Bundles, see What are Databricks Asset Bundles?.
For an example bundle that uses a private wheel, see the bundle-examples GitHub repository.
Tip
If you are using notebooks, you can install Python wheels from a private repository in a notebook, then add a notebook_task to the job in your bundle. See Notebook-scoped Python libraries.
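For example, a job that runs such a notebook might be configured as follows. This is a minimal sketch; the job name, task key, and notebook path are illustrative, not from the original article:

```yaml
# Hypothetical example: a job task that runs a notebook which
# installs private wheels itself (for example, with %pip install).
resources:
  jobs:
    notebook-demo-job:
      name: notebook-demo-job
      tasks:
        - task_key: notebook-task
          notebook_task:
            notebook_path: ../src/install_and_run.ipynb
```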
Download the artifact locally
To manage a private artifact using Databricks Asset Bundles, you first need to download it locally. Then you can reference it in your bundle and deploy it to the workspace as part of the bundle, or you can upload it to Unity Catalog and reference it in your bundle.
For example, the following command downloads a Python wheel file to the dist directory:
pip download -d dist my-wheel==1.0
You could also download a private PyPI package, then copy it to the dist directory:
export PYPI_TOKEN=<YOUR TOKEN>
pip download -d dist my-package==1.0.0 --index-url https://$PYPI_TOKEN@<package-index-url> --no-deps
(Optional) Upload the artifact to Unity Catalog
After you download the artifact, you can optionally copy it to a Unity Catalog volume using the Databricks CLI. Your bundle can then reference the artifact at that volume location instead of uploading it to your workspace when the bundle is deployed. The following example copies a wheel to a Unity Catalog volume:
databricks fs cp my-wheel-1.0-*.whl dbfs:/Volumes/myorg_test/myorg_volumes/packages
Tip
Databricks Asset Bundles will automatically upload all artifacts referenced in the bundle to Unity Catalog if you set artifact_path in your bundle configuration to a Unity Catalog volumes path.
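For example, a target might set artifact_path like the following. This is a sketch; the target name and volume path are placeholders, not values from the original article:

```yaml
# Hypothetical target configuration: setting artifact_path to a
# Unity Catalog volumes path makes bundle deployment upload
# referenced artifacts to that volume.
targets:
  dev:
    workspace:
      artifact_path: /Volumes/myorg_test/myorg_volumes/packages
```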
Reference the artifact
To include the artifact in your bundle, reference it in your configuration.
The following example bundle references a wheel file in the dist directory in a job. This configuration uploads the wheel to the workspace when the bundle is deployed.
resources:
  jobs:
    demo-job:
      name: demo-job
      tasks:
        - task_key: python-task
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_D4s_v5
            num_workers: 1
          spark_python_task:
            python_file: ../src/main.py
          libraries:
            - whl: ../dist/my-wheel-1.0-*.whl
If you uploaded your artifact to a Unity Catalog volume, configure your job to reference it at that location:
resources:
  jobs:
    demo-job:
      name: demo-job
      tasks:
        - task_key: python-task
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: Standard_D4s_v5
            num_workers: 1
          spark_python_task:
            python_file: ../src/main.py
          libraries:
            - whl: /Volumes/myorg_test/myorg_volumes/packages/my-wheel-1.0-py3-none-any.whl
Alternatively, a Python wheel can be referenced in a python_wheel_task for a job:
resources:
  jobs:
    demo-job:
      name: demo-job
      tasks:
        - task_key: wheel_task
          python_wheel_task:
            package_name: my_package
            entry_point: entry
          job_cluster_key: Job_cluster
          libraries:
            - whl: ../dist/my-wheel-1.0-*.whl