Databricks Connect for Python support in Databricks notebooks

Note

This article covers Databricks Connect for Databricks Runtime 13.3 LTS and above.

Databricks Connect allows you to connect to Databricks compute from a local development environment. You can then develop, debug, and test your code directly from your IDE before executing it as part of a Databricks notebook or job. See What is Databricks Connect?.

For information about using Databricks Connect with Jupyter Notebook, see Use classic Jupyter Notebook with Databricks Connect for Python.

Limitations

To make the transition from local development to deployment on Databricks seamless, all of the Databricks Connect APIs are available in Databricks notebooks. This allows you to run the same code in a Databricks notebook without modification. However, there are some differences between using Databricks Connect for Python in a local development environment and in Databricks notebooks and jobs:

  • When developing locally within an IDE, spark = DatabricksSession.builder.getOrCreate() gets an existing Spark session for the provided configuration if one exists, or creates a new session if it does not. Connection parameters such as host, token, and cluster_id are populated from the source code, environment variables, or the .databrickscfg configuration profiles file.

  • When developing within Databricks notebooks, spark = DatabricksSession.builder.getOrCreate() returns the default Spark session (also accessible through the spark variable) when used without any additional configuration. A new session is created if additional connection parameters are set, for example, by using DatabricksSession.builder.clusterId(...).getOrCreate() or DatabricksSession.builder.serverless().getOrCreate().
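The two behaviors above can be sketched as follows. This is a minimal illustration, assuming the databricks-connect package is installed and a workspace connection is configured; the cluster ID shown is a placeholder, not a real value.

```python
from databricks.connect import DatabricksSession

# Locally: reuses an existing session for the ambient configuration
# (source code, environment variables, or ~/.databrickscfg),
# or creates a new one if none exists.
# In a notebook: returns the default session, the same object as
# the built-in `spark` variable.
spark = DatabricksSession.builder.getOrCreate()

# Setting explicit connection parameters creates a new session,
# even inside a notebook. "1234-567890-abcdefgh" is a placeholder
# cluster ID for illustration only.
spark_cluster = DatabricksSession.builder.clusterId("1234-567890-abcdefgh").getOrCreate()

# Likewise, targeting serverless compute creates a new session.
spark_serverless = DatabricksSession.builder.serverless().getOrCreate()
```

Because getOrCreate() without arguments resolves to the default notebook session, code written this way runs unchanged whether it executes in your IDE or in a Databricks notebook.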