Troubleshooting Databricks Connect for Python

Note

This article covers Databricks Connect for Databricks Runtime 13.0 and above.

This article provides troubleshooting information for Databricks Connect for Python. Databricks Connect enables you to connect popular IDEs, notebook servers, and custom applications to Databricks clusters. See What is Databricks Connect? For the Scala version of this article, see Troubleshooting Databricks Connect for Scala.

Error: StatusCode.UNAVAILABLE, StatusCode.UNKNOWN, DNS resolution failed, or Received http2 header with status 500

Issue: When you try to run code with Databricks Connect, you get an error message that contains strings such as StatusCode.UNAVAILABLE, StatusCode.UNKNOWN, DNS resolution failed, or Received http2 header with status: 500.

Possible cause: Databricks Connect cannot reach your cluster.

Recommended solutions:

  • Check that your workspace instance name is correct. If you use environment variables, make sure the related environment variable is set and correct on your local development machine.

  • Check that your cluster ID is correct. If you use environment variables, make sure the related environment variable is set and correct on your local development machine.

  • Check that your cluster runs a Databricks Runtime version that is compatible with your version of Databricks Connect.
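The first two checks can be scripted before you open a session. The following is a minimal sketch that assumes your configuration lives in environment variables named DATABRICKS_HOST and DATABRICKS_CLUSTER_ID (adjust REQUIRED_VARS if your setup uses other names). It reports missing or blank variables and tests whether the workspace hostname resolves through DNS, which separates a "DNS resolution failed" typo from other connectivity problems. The helper names missing_vars, hostname_of, and resolves are hypothetical.

```python
import os
import socket
from typing import List, Mapping, Optional
from urllib.parse import urlparse

# Assumed variable names; change them to match your own configuration.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_CLUSTER_ID")


def missing_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def hostname_of(workspace: str) -> str:
    """Accept 'https://<instance>' or a bare instance name; return the hostname."""
    url = workspace if "://" in workspace else f"https://{workspace}"
    return urlparse(url).hostname or ""


def resolves(hostname: str) -> bool:
    """True if the hostname resolves through DNS (port 443 is only a hint)."""
    if not hostname:
        return False
    try:
        socket.getaddrinfo(hostname, 443)
        return True
    except (socket.gaierror, UnicodeError):
        return False


if __name__ == "__main__":
    absent = missing_vars()
    if absent:
        print(f"Missing or empty: {', '.join(absent)}")
    host = hostname_of(os.environ.get("DATABRICKS_HOST", ""))
    print(f"Host {host!r} resolves: {resolves(host)}")
```

A resolving hostname does not prove the cluster ID or runtime version is right; it only rules out the name lookup as the cause.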

Python version mismatch

Check that the Python version you use locally has the same major and minor release as the Python version on the cluster; only the patch release can differ (for example, 3.10.11 versus 3.10.10 is OK, but 3.10 versus 3.9 is not).
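The rule above compares only the first two components of the version string. A minimal sketch of that comparison follows. The cluster's Python version is an input you supply (for example, from the release notes for your cluster's Databricks Runtime), and the function names are hypothetical.

```python
import sys
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Parse '3.10.11' or '3.10' into (3, 10); the patch component is ignored."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def python_versions_compatible(local: str, cluster: str) -> bool:
    """True when the local and cluster versions share major and minor releases."""
    return major_minor(local) == major_minor(cluster)


def local_python_version() -> str:
    """Version of the interpreter running this script, e.g. '3.10.11'."""
    return ".".join(str(part) for part in sys.version_info[:3])


if __name__ == "__main__":
    cluster_version = "3.10.12"  # placeholder: replace with your cluster's Python version
    local = local_python_version()
    print(f"local {local} vs cluster {cluster_version}: "
          f"{'OK' if python_versions_compatible(local, cluster_version) else 'mismatch'}")
```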

If you have multiple Python versions installed locally, ensure that Databricks Connect is using the right one by setting the PYSPARK_PYTHON environment variable (for example, PYSPARK_PYTHON=python3).
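You would usually set PYSPARK_PYTHON in your shell or your IDE's run configuration. As an alternative, the following sketch sets it from inside a script before any session is created, pointing it at the interpreter that is running the script (sys.executable). Setting the variable in-process like this assumes that nothing has read it earlier in the process. The helpers pin_pyspark_python and resolved_pyspark_python are hypothetical names.

```python
import os
import shutil
import sys
from typing import Optional


def pin_pyspark_python(interpreter: Optional[str] = None) -> str:
    """Set PYSPARK_PYTHON to `interpreter` (default: this interpreter) unless already set.

    Returns the value that is in effect afterward.
    """
    interpreter = interpreter or sys.executable
    return os.environ.setdefault("PYSPARK_PYTHON", interpreter)


def resolved_pyspark_python() -> Optional[str]:
    """Full path that PYSPARK_PYTHON resolves to (e.g. 'python3' -> '/usr/bin/python3')."""
    value = os.environ.get("PYSPARK_PYTHON")
    return shutil.which(value) if value else None


if __name__ == "__main__":
    pin_pyspark_python()
    # Create your Databricks Connect session only after this point.
    print(f"PYSPARK_PYTHON -> {resolved_pyspark_python()}")
```

Because it uses setdefault, the sketch leaves an existing PYSPARK_PYTHON value untouched, so an explicit setting in your shell still wins.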

Conflicting PySpark installations

The databricks-connect package conflicts with PySpark. Having both installed will cause errors when initializing the Spark context in Python. This can manifest in several ways, including “stream corrupted” or “class not found” errors. If you have PySpark installed in your Python environment, ensure it is uninstalled before installing databricks-connect. After uninstalling PySpark, make sure to fully re-install the Databricks Connect package:

pip3 uninstall pyspark
pip3 uninstall databricks-connect
pip3 install --upgrade "databricks-connect==14.0.*"  # or X.Y.* to match your specific cluster version.
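To confirm whether an environment is affected before or after running the commands above, you can inspect installed distribution metadata rather than importing pyspark. This avoids ambiguity because both packages can provide a module named pyspark. The following minimal sketch uses the standard library's importlib.metadata. The helper names are hypothetical, and the lookup parameter exists only so the logic can be tested without touching the real environment.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def pyspark_conflict(lookup: Callable[[str], Optional[str]] = installed_version) -> bool:
    """True when a standalone pyspark distribution is installed next to databricks-connect."""
    return lookup("pyspark") is not None and lookup("databricks-connect") is not None


if __name__ == "__main__":
    if pyspark_conflict():
        print("pyspark and databricks-connect are both installed: "
              "uninstall both, then reinstall databricks-connect.")
    else:
        print(f"databricks-connect: {installed_version('databricks-connect')}, "
              f"pyspark: {installed_version('pyspark')}")
```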

Conflicting or missing PATH entries for binaries

Your PATH might be configured so that commands such as spark-shell run a previously installed binary instead of the one provided with Databricks Connect. Either make sure the Databricks Connect binaries take precedence in PATH, or remove the previously installed ones.

If you can't run commands such as spark-shell, pip3 install might not have added the installation's bin directory to your PATH. In that case, add that directory to your PATH manually. You can still use Databricks Connect from IDEs even if PATH isn't set up.
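Both PATH problems can be checked from Python. The following minimal sketch assumes that the binaries land in the current interpreter's default scripts directory, as reported by sysconfig.get_path("scripts"). A pip3 install --user uses a different directory, so adjust accordingly. The script reports whether that directory is on PATH and where spark-shell actually resolves. The helper names are hypothetical.

```python
import os
import shutil
import sysconfig
from pathlib import Path
from typing import Optional


def scripts_dir() -> Path:
    """Directory where pip installs console scripts for this interpreter's default scheme."""
    return Path(sysconfig.get_path("scripts"))


def on_path(directory: Path, path_env: Optional[str] = None) -> bool:
    """True if `directory` is one of the entries in PATH (case-normalized on Windows)."""
    path_env = os.environ.get("PATH", "") if path_env is None else path_env
    target = os.path.normcase(os.path.abspath(directory))
    return any(
        os.path.normcase(os.path.abspath(entry)) == target
        for entry in path_env.split(os.pathsep)
        if entry
    )


def resolve(command: str) -> Optional[Path]:
    """First match for `command` on PATH, or None if it is not found."""
    found = shutil.which(command)
    return Path(found) if found else None


if __name__ == "__main__":
    bin_dir = scripts_dir()
    print(f"Scripts directory: {bin_dir} (on PATH: {on_path(bin_dir)})")
    hit = resolve("spark-shell")
    if hit is None:
        print("spark-shell is not on PATH; add the scripts directory manually.")
    elif hit.parent != bin_dir:
        print(f"spark-shell resolves to {hit}, outside {bin_dir}; check PATH order.")
    else:
        print(f"spark-shell resolves to {hit}")
```

shutil.which returns the first match in PATH order, which is the binary your shell would run.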

The filename, directory name, or volume label syntax is incorrect on Windows

If you are using Databricks Connect on Windows and see:

The filename, directory name, or volume label syntax is incorrect.

This error means that Databricks Connect was installed into a directory whose path contains a space. To work around this, either install Databricks Connect into a directory path without spaces, or configure your path using the short name form.
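The following minimal sketch checks the active interpreter's site-packages directory (sysconfig.get_path("purelib")) for spaces. It assumes Databricks Connect was installed there, which is not the case for --user installs. On Windows, it also prints the directory's short-name (8.3) form through the Win32 GetShortPathNameW function. If short-name generation is disabled on the volume, Windows returns the long path unchanged. The helper names are hypothetical.

```python
import ctypes
import sys
import sysconfig


def path_has_space(path: str) -> bool:
    """True if the path contains a space, which triggers this error."""
    return " " in path


def windows_short_path(path: str) -> str:
    """Return the 8.3 short-name form of an existing path (Windows only)."""
    if sys.platform != "win32":
        raise OSError("8.3 short path names are only available on Windows")
    from ctypes import wintypes  # Windows-specific, so imported only here

    get_short = ctypes.windll.kernel32.GetShortPathNameW
    get_short.argtypes = [wintypes.LPCWSTR, wintypes.LPWSTR, wintypes.DWORD]
    get_short.restype = wintypes.DWORD

    size = get_short(path, None, 0)  # first call returns the required buffer size
    if size == 0:
        raise ctypes.WinError()
    buffer = ctypes.create_unicode_buffer(size)
    if get_short(path, buffer, size) == 0:
        raise ctypes.WinError()
    return buffer.value


if __name__ == "__main__":
    site_packages = sysconfig.get_path("purelib")
    if path_has_space(site_packages):
        print(f"Install path contains a space: {site_packages}")
        if sys.platform == "win32":
            print(f"Short-name form: {windows_short_path(site_packages)}")
    else:
        print(f"No spaces in install path: {site_packages}")
```

From a Windows command prompt, dir /x lists the same short names next to the long ones.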