Databricks Connect for Python support in Databricks notebooks
Note
This article covers Databricks Connect for Databricks Runtime 13.3 LTS and above.
Databricks Connect allows you to connect to Databricks compute from a local development environment. You can then develop, debug, and test your code directly from your IDE before executing it as part of a Databricks notebook or job. See What is Databricks Connect?.
For information about using Databricks Connect with Jupyter Notebook, see Use classic Jupyter Notebook with Databricks Connect for Python.
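As a quick orientation, the following is a minimal sketch of the local-development pattern this article contrasts with notebooks: a session created from an IDE with getOrCreate() and then used like a regular PySpark session. The table name is only an illustrative placeholder.

```python
from databricks.connect import DatabricksSession

# In a local IDE, getOrCreate() resolves connection details (host, token,
# cluster_id) from source code, environment variables, or a .databrickscfg
# profile, and reuses an existing session for that configuration if one exists.
spark = DatabricksSession.builder.getOrCreate()

# The returned session behaves like a regular PySpark SparkSession.
# "samples.nyctaxi.trips" is a placeholder table name for illustration.
df = spark.read.table("samples.nyctaxi.trips")
df.select("trip_distance", "fare_amount").show(5)
```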
Limitations
To make the transition from local development to deployment to Databricks seamless, all of the Databricks Connect APIs are available in Databricks notebooks. This allows you to run your code in a Databricks notebook without any changes. However, there are some differences between using Databricks Connect for Python in a local development environment and in Databricks notebooks and jobs, as illustrated in the sketch after the following list:
- When developing locally within an IDE, `spark = DatabricksSession.builder.getOrCreate()` gets an existing Spark session for the provided configuration if it exists, or creates a new session if it does not exist. Connection parameters such as `host`, `token`, and `cluster_id` are populated either from the source code, environment variables, or the `.databrickscfg` configuration profiles file.
- When developing within Databricks notebooks, `spark = DatabricksSession.builder.getOrCreate()` returns the default Spark session (also accessible through the `spark` variable) when used without any additional configuration. A new session is created if additional connection parameters are set, for example, by using `DatabricksSession.builder.clusterId(...).getOrCreate()` or `DatabricksSession.builder.serverless().getOrCreate()`.
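The sketch below illustrates these differences side by side. It only uses builder methods mentioned above; the cluster ID is a placeholder, and which sessions you actually create depends on the compute available in your workspace.

```python
from databricks.connect import DatabricksSession

# Without extra configuration, getOrCreate() resolves the connection
# differently depending on where the code runs:
#   - locally, it builds a session from your configuration (source code,
#     environment variables, or .databrickscfg);
#   - in a Databricks notebook, it returns the notebook's default session,
#     the same object as the built-in `spark` variable.
spark = DatabricksSession.builder.getOrCreate()

# Supplying explicit connection parameters creates a new session instead of
# returning the default one. The cluster ID below is a placeholder.
spark_on_cluster = (
    DatabricksSession.builder
    .clusterId("0123-456789-abcdefgh")
    .getOrCreate()
)

# Likewise, requesting serverless compute creates a separate session.
spark_serverless = DatabricksSession.builder.serverless().getOrCreate()
```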