Databricks Connect enables you to connect popular IDEs such as PyCharm, notebook servers, and other custom applications to Databricks clusters.
This article covers Databricks Connect for Databricks Runtime 13.0 and above.
For information beyond this tutorial about Databricks Connect for Databricks Runtime 13.0 and above, see the Databricks Connect reference.
For information about Databricks Connect for prior Databricks Runtime versions, see Databricks Connect for Databricks Runtime 12.2 LTS and below.
You have PyCharm installed.
You have a Databricks cluster in the workspace. The cluster has Databricks Runtime 13.0 or above installed, and the cluster's access mode is Assigned or Shared. See Access modes.
You have Python 3 installed on your development machine, and the minor version of your client Python installation is the same as the minor Python version of your Databricks cluster. The following table shows the Python version installed with each Databricks Runtime.
| Databricks Runtime version | Python version |
| --- | --- |
| 13.0 ML - 14.0 ML, 13.0 - 14.0 | 3.10 |
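As a quick local check, a short sketch like the following compares your interpreter's minor version against the version your cluster's runtime expects (the `3.10` target here is illustrative; substitute the value for your cluster's Databricks Runtime):

```python
import sys

# Minor version of the local Python interpreter, e.g. "3.10".
local_minor = f"{sys.version_info.major}.{sys.version_info.minor}"

# Illustrative target; use the Python version for your cluster's runtime.
cluster_minor = "3.10"

if local_minor != cluster_minor:
    print(f"Warning: local Python {local_minor} does not match cluster Python {cluster_minor}")
else:
    print(f"OK: local Python {local_minor} matches the cluster")
```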
To complete this tutorial, follow these steps:
This tutorial uses Databricks personal access token authentication and a Databricks configuration profile for authenticating with your Databricks workspace. If you already have a Databricks personal access token and a matching Databricks configuration profile, skip ahead to Step 3.
To create a personal access token:
In your Databricks workspace, click your Databricks username in the top bar, and then select User Settings from the drop-down menu.
Next to Access tokens, click Manage.
Click Generate new token.
(Optional) Enter a comment that helps you to identify this token in the future, and change the token’s default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the Lifetime (days) box empty (blank).
Copy the displayed token to a secure location, and then click Done.
Be sure to save the copied token in a secure location, and do not share it with others. If you lose the copied token, you cannot regenerate that exact same token; you must repeat this procedure to create a new one. If you lose the copied token, or you believe that the token has been compromised, Databricks strongly recommends that you immediately delete that token from your workspace by clicking the X next to the token on the Access tokens page.
If you are not able to create or use tokens in your workspace, this might be because your workspace administrator has disabled tokens or has not given you permission to create or use tokens. See your workspace administrator or the following:
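As an alternative to a configuration profile, Databricks client unified authentication can also pick up credentials from the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables. A minimal sketch, using placeholder values rather than real credentials:

```python
import os

# Placeholder values for illustration only; never hard-code real credentials.
os.environ["DATABRICKS_HOST"] = "https://my-workspace-url.com"
os.environ["DATABRICKS_TOKEN"] = "dapi-example"

# Databricks developer tools and SDKs can read these variables at runtime.
host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]
print(f"Authenticating to {host}")
```

In practice you would export these variables in your shell or CI environment rather than setting them in code.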
Create a Databricks authentication configuration profile to store necessary information about your personal access token on your local machine. Databricks developer tools and SDKs can use this configuration profile to quickly authenticate with your Databricks workspace.
To create a profile:
Create a file named `.databrickscfg` in the root of your user's home directory on your machine, if this file does not already exist. For Linux and macOS, the path is `~/.databrickscfg`. For Windows, the path is
Use a text editor to add the following configuration profile to this file and then save the file:
```
[<some-unique-profile-name>]
host = <my-workspace-url>
token = <my-personal-access-token-value>
cluster_id = <my-cluster-id>
```
Replace the following placeholders:
- `<some-unique-profile-name>` with some unique name for this profile. This name must be unique within the `.databrickscfg` file.
- `<my-workspace-url>` with your Databricks workspace URL, starting with `https://`. See Workspace instance names, URLs, and IDs.
- `<my-personal-access-token-value>` with your Databricks personal access token value. See Databricks personal access token authentication.
- `<my-cluster-id>` with the ID of your Databricks cluster. See Cluster URL and ID.
For example:

```
[DEFAULT]
host = https://my-workspace-url.com
token = dapi...
cluster_id = abc123...
```
The preceding `host` and `token` fields are for Databricks personal access token authentication, which is the most common type of Databricks authentication. Some Databricks developer tools and SDKs also use the `cluster_id` field in some scenarios. For other supported Databricks authentication types and scenarios, see your tool's or SDK's documentation or Databricks client unified authentication.
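Because `.databrickscfg` uses standard INI syntax, you can sanity-check a profile with only the Python standard library. A minimal sketch, assuming the placeholder values shown earlier:

```python
import configparser

# Example profile content with placeholder values (not real credentials).
profile_text = """
[DEFAULT]
host = https://my-workspace-url.com
token = dapi-example
cluster_id = abc123
"""

config = configparser.ConfigParser()
config.read_string(profile_text)

# configparser exposes the [DEFAULT] section via config["DEFAULT"].
profile = config["DEFAULT"]
print(profile["host"])        # https://my-workspace-url.com
print(profile["cluster_id"])  # abc123
```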
Click File > New Project.
For Location, click the folder icon, and complete the on-screen directions to specify the path to your new Python project.
Expand Python interpreter: New environment.
Click the New environment using option.
In the drop-down list, select Virtualenv.
Leave Location set to the suggested path to the new virtual environment folder.
For Base interpreter, use the drop-down list or click the ellipsis button to specify the path to the Python interpreter from the preceding requirements.
On PyCharm’s main menu, click View > Tool Windows > Python Packages.
In the search box, enter databricks-connect.
In the PyPI repository list, click databricks-connect.
In the result pane’s latest drop-down list, select the version that matches your cluster’s Databricks Runtime version. For example, if your cluster has Databricks Runtime 13.2 installed, select 13.2.0.
After the package installs, you can close the Python Packages window.
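The version rule above is a match on the major.minor components: databricks-connect 13.2.x pairs with Databricks Runtime 13.2. A hypothetical helper expressing that rule:

```python
def versions_match(package_version: str, runtime_version: str) -> bool:
    """Return True if the package's major.minor matches the runtime version.

    Hypothetical helper for illustration; e.g. databricks-connect 13.2.0
    pairs with Databricks Runtime 13.2.
    """
    major_minor = ".".join(package_version.split(".")[:2])
    return major_minor == runtime_version

print(versions_match("13.2.0", "13.2"))  # True
print(versions_match("13.3.2", "13.2"))  # False
```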
In the Project tool window, right-click the project’s root folder, and click New > Python File.
Enter main.py, and then click Python file.
Enter the following code into the file and then save the file:
```python
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()
df = spark.read.table("samples.nyctaxi.trips")
df.show(5)
```
Start the target cluster in your remote Databricks workspace.
After the cluster has started, on the main menu, click Run > Run. If prompted, select main > Run.
In the Run tool window (View > Tool Windows > Run), in the Run tab's main pane, the first 5 rows of the samples.nyctaxi.trips table appear.
With the cluster still running, in the preceding code, click the gutter next to df.show(5) to set a breakpoint.
On the main menu, click Run > Debug. If prompted, select main > Debug.
In the Debug tool window (View > Tool Windows > Debug), in the Debugger tab's Variables pane, expand the df and spark variable nodes to browse information about the code's df and spark variables.
In the Debug tool window’s sidebar, click the green arrow (Resume Program) icon.
In the Debugger tab's Console pane, the first 5 rows of the samples.nyctaxi.trips table appear.
To learn more about Databricks Connect and experiment with a more complex code example, see the Databricks Connect reference. This reference article includes guidance for the following topics:
- Supported Databricks authentication types in addition to Databricks personal access token authentication.
- How to use the Spark shell, and how to use IDEs in addition to PyCharm, such as JupyterLab, classic Jupyter Notebook, Visual Studio Code, and Eclipse with PyDev.
- How to migrate from Databricks Connect for Databricks Runtime 12.2 LTS and below to Databricks Connect for Databricks Runtime 13.0 and above.
- How to use Databricks Connect to access Databricks Utilities.
- The limitations of Databricks Connect.