Use PyCharm with Databricks Connect for Python

Note

This article covers Databricks Connect for Databricks Runtime 13.0 and above.

This article covers how to use Databricks Connect for Python with PyCharm. Databricks Connect enables you to connect popular IDEs, notebook servers, and other custom applications to Databricks clusters. See What is Databricks Connect?.

Note

Before you begin to use Databricks Connect, you must set up the Databricks Connect client.

IntelliJ IDEA Ultimate also provides plugin support for Python. For details, see Python plug-in for IntelliJ IDEA Ultimate.

To use Databricks Connect with PyCharm and Python, follow these instructions for venv or Poetry. This article was tested with PyCharm Community Edition 2023.3.5. If you use a different version or edition of PyCharm, the following instructions might vary.

Use PyCharm with venv and Databricks Connect for Python

  1. Start PyCharm.

  2. Create a project: click File > New Project.

  3. In the New Project dialog, click Pure Python.

  4. For Location, click the folder icon, and then select the path to the existing venv virtual environment that you created in Install Databricks Connect for Python.

  5. For Interpreter type, click Custom environment.

  6. For Environment, select Select existing.

  7. For Type, select Python.

  8. For Path, use the folder icon or drop-down list to select the path to the Python interpreter in the existing venv virtual environment.

    Tip

    The Python interpreter for a venv virtual environment is typically installed in <path-to-venv>/bin. For more information, see venv.

  9. Click OK.

  10. Click Create.

  11. Add a Python code (.py) file to the project that contains either the example code or your own code. If you use your own code, you must at minimum initialize DatabricksSession as shown in the example code.

  12. With the Python code file open, set any breakpoints where you want your code to pause while running.

  13. To run the code, click Run > Run. All Python code runs locally, while all PySpark code involving DataFrame operations runs on the cluster in the remote Databricks workspace, and run responses are sent back to the local caller.

  14. To debug the code, click Run > Debug. All Python code is debugged locally, while all PySpark code continues to run on the cluster in the remote Databricks workspace. The core Spark engine code cannot be debugged directly from the client.

  15. Follow the on-screen instructions to start running or debugging the code.

For more specific run and debug instructions, see Run without any previous configuring and Debug.
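The example code that step 11 refers to might look like the following minimal sketch. It assumes databricks-connect is installed in the venv and that authentication is already configured; the table name samples.nyctaxi.trips refers to one of the Databricks sample datasets. The comments mark which parts run locally and which run on the cluster, matching the behavior described in steps 13 and 14.

```python
# Minimal sketch of a Databricks Connect script (assumes databricks-connect
# is installed and a default authentication profile is configured).
try:
    from databricks.connect import DatabricksSession
except ImportError:
    DatabricksSession = None  # client not installed in this environment


def show_sample_trips(limit: int = 5):
    """Initialize a DatabricksSession and print a few rows from a sample table."""
    if DatabricksSession is None:
        raise RuntimeError("Install databricks-connect first")
    # Plain Python up to this point runs locally in PyCharm.
    spark = DatabricksSession.builder.getOrCreate()  # connects to the workspace
    df = spark.read.table("samples.nyctaxi.trips")   # DataFrame plan, built locally
    df.show(limit)  # action: executes on the cluster; rows are sent back and printed
```

A breakpoint set inside show_sample_trips pauses local execution as usual; the DataFrame operations themselves still execute remotely.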

Use PyCharm with Poetry and Databricks Connect for Python

  1. Start PyCharm.

  2. Create a project: click File > New Project. In the New Project dialog, click Pure Python.

  3. For Location, click the folder icon, and then select the path to the existing Poetry virtual environment that you created in Install Databricks Connect for Python.

  4. For Interpreter type, click Custom environment.

  5. For Environment, select Select existing.

  6. For Type, select Python.

  7. For Path, use the folder icon or drop-down list to select the path to the Python interpreter in the existing Poetry virtual environment.

    Tip

    Be sure to select the path to the Python interpreter. Do not select the path to the Poetry executable.

    For information about where the system version of the Python interpreter is installed, see How to Add Python to PATH.

  8. Click OK.

  9. Click Create.

  10. Add a Python code (.py) file to the project that contains either the example code or your own code. If you use your own code, you must at minimum initialize DatabricksSession as shown in the example code.

  11. With the Python code file open, set any breakpoints where you want your code to pause while running.

  12. To run the code, click Run > Run. All Python code runs locally, while all PySpark code involving DataFrame operations runs on the cluster in the remote Databricks workspace, and run responses are sent back to the local caller.

  13. To debug the code, click Run > Debug. All Python code is debugged locally, while all PySpark code continues to run on the cluster in the remote Databricks workspace. The core Spark engine code cannot be debugged directly from the client.

  14. Follow the on-screen instructions to start running or debugging the code.

For more specific run and debug instructions, see Run without any previous configuring and Debug.
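When initializing DatabricksSession in your own code, the session can also be pointed at a workspace explicitly instead of relying on a default configuration profile. The following sketch reads placeholder environment variable names (DATABRICKS_HOST, DATABRICKS_TOKEN, DATABRICKS_CLUSTER_ID are illustrative, not required names); DatabricksSession.builder.remote() is part of databricks-connect, but check your client version's documentation for the exact parameters it accepts.

```python
# Hedged sketch: building a DatabricksSession from explicit connection details
# rather than a default config profile. Values come from environment variables.
import os

try:
    from databricks.connect import DatabricksSession
except ImportError:
    DatabricksSession = None  # client not installed in this environment


def build_session():
    """Create a DatabricksSession from explicit host, token, and cluster ID."""
    if DatabricksSession is None:
        raise RuntimeError("Install databricks-connect first")
    return (
        DatabricksSession.builder.remote(
            host=os.environ["DATABRICKS_HOST"],          # e.g. workspace URL
            token=os.environ["DATABRICKS_TOKEN"],        # personal access token
            cluster_id=os.environ["DATABRICKS_CLUSTER_ID"],
        ).getOrCreate()
    )
```

Keeping connection details in environment variables, rather than hard-coding them in the .py file, avoids committing credentials to version control.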