Run and debug notebook cells with Databricks Connect using the Databricks extension for Visual Studio Code

Using the Databricks Connect integration in the Databricks extension for Visual Studio Code, you can run and debug notebooks, one cell at a time or all cells at once, and see their results in the Visual Studio Code UI. Code runs locally, except that code involving DataFrame operations runs on the cluster in the remote Databricks workspace, with run responses sent back to the local caller. Likewise, code is debugged locally, while Spark code continues to run on the cluster in the remote Databricks workspace. The core Spark engine code cannot be debugged directly from the client.

Note

This feature works with Databricks Runtime 13.3 and above.

To enable the Databricks Connect integration for notebooks in the Databricks extension for Visual Studio Code, you must install Databricks Connect in the extension. See Debug code using Databricks Connect for the Databricks extension for Visual Studio Code.

Run Python notebook cells

For notebooks with filenames that have a .py extension, when you open the notebook in the Visual Studio Code IDE, each cell displays Run Cell, Run Above, and Debug Cell buttons. As you run a cell, its results are shown in a separate tab in the IDE. As you debug, the cell being debugged displays Continue, Stop, and Step Over buttons. As you debug a cell, you can use Visual Studio Code debugging features such as watching variables’ states and viewing the call stack and debug console.

For notebooks with filenames that have a .ipynb extension, when you open the notebook in the Visual Studio Code IDE, the notebook and its cells contain additional features. See Running cells and Work with code cells in the Notebook Editor.

For more information about notebook formats for filenames with the .py and .ipynb extensions, see Export and import Databricks notebooks.
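
As an illustration of how a .py notebook is divided into cells, the following minimal sketch splits notebook source text into cells. It assumes the Databricks source format, in which the file starts with the line # Databricks notebook source and cells are separated by the line # COMMAND ----------. The helper name split_cells is hypothetical and is not part of the extension.

```python
# Hypothetical helper, not part of the Databricks extension: split the text
# of a .py notebook into its cells.
# Assumption: the file uses the Databricks source format.
NOTEBOOK_HEADER = "# Databricks notebook source"
CELL_DELIMITER = "# COMMAND ----------"


def split_cells(source: str) -> list[str]:
    """Return the non-empty cells of a source-format notebook."""
    lines = source.splitlines()
    # Drop the header line if it is present.
    if lines and lines[0].strip() == NOTEBOOK_HEADER:
        lines = lines[1:]

    cells, current = [], []
    for line in lines:
        if line.strip() == CELL_DELIMITER:
            # A delimiter closes the cell collected so far.
            cells.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    cells.append("\n".join(current).strip())

    # Ignore cells that contain only whitespace.
    return [cell for cell in cells if cell]


example = """# Databricks notebook source
print("first cell")

# COMMAND ----------

x = 1
print(x + 1)
"""

cells = split_cells(example)
```

Each element of cells corresponds to one cell that would get its own Run Cell, Run Above, and Debug Cell buttons in the IDE.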

Run Python Jupyter notebook cells

To run or debug a Python Jupyter notebook (.ipynb):

  1. In your project, open the Python Jupyter notebook that you want to run or debug. Make sure the Python file is in Jupyter notebook format and has the extension .ipynb.

    Tip

    You can create a new Python Jupyter notebook by running the >Create: New Jupyter Notebook command from within the Command Palette.

  2. Click Run All Cells to run all cells without debugging, Execute Cell to run an individual cell without debugging, or Run by Line to run an individual cell line by line with limited debugging, with variable values displayed in the Jupyter panel (View > Open View > Jupyter).

    For full debugging within an individual cell, set breakpoints, and then click Debug Cell in the menu next to the cell’s Run button.

    After you click any of these options, you might be prompted to install missing Python Jupyter notebook package dependencies. Click to install.

    For more information, see Jupyter Notebooks in VS Code.

Notebook globals

The following notebook globals are also enabled:

  • spark, representing an instance of databricks.connect.DatabricksSession, is preconfigured to instantiate DatabricksSession by getting Databricks authentication credentials from the extension. If DatabricksSession is already instantiated in a notebook cell’s code, that DatabricksSession’s settings are used instead. See Code examples for Databricks Connect for Python.

  • udf, preconfigured as an alias for pyspark.sql.functions.udf, which creates Python UDFs. See pyspark.sql.functions.udf.

  • sql, preconfigured as an alias for spark.sql. spark, as described earlier, represents a preconfigured instance of databricks.connect.DatabricksSession. See Spark SQL.

  • dbutils, preconfigured as an instance of Databricks Utilities, which is imported from databricks-sdk and is instantiated by getting Databricks authentication credentials from the extension. See Use Databricks Utilities.

    Note

    Only a subset of Databricks Utilities is supported for notebooks with Databricks Connect.

    To enable dbutils.widgets, you must first install the Databricks SDK for Python by running the following command in your local development machine’s terminal:

    pip install 'databricks-sdk[notebook]'
    
  • display, preconfigured as an alias for the Jupyter builtin IPython.display.display. See IPython.display.display.

  • displayHTML, preconfigured as an alias for dbruntime.display.displayHTML, which is an alias for display.HTML from IPython. See IPython.display.html.
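
To make the globals above more concrete, the following minimal sketch builds rough equivalents by hand, for example in a plain Python script outside the extension. It is an approximation, not the extension's actual setup code: the function name build_notebook_globals is hypothetical, and the sketch assumes that the databricks-connect and databricks-sdk packages are installed and that authentication credentials come from your environment or default configuration profile rather than from the extension.

```python
def build_notebook_globals():
    """Approximate the preconfigured notebook globals by hand.

    Assumptions: databricks-connect and databricks-sdk are installed, and
    credentials are resolved from the environment or the default
    configuration profile (the extension supplies them itself).
    """
    # Imported lazily so that defining this function does not require
    # the packages to be installed.
    from databricks.connect import DatabricksSession
    from databricks.sdk import WorkspaceClient
    from IPython.display import display
    from pyspark.sql.functions import udf

    # spark: a Databricks Connect session bound to the remote cluster.
    spark = DatabricksSession.builder.getOrCreate()

    # dbutils: Databricks Utilities as exposed by the Databricks SDK.
    dbutils = WorkspaceClient().dbutils

    return {
        "spark": spark,
        "sql": spark.sql,  # sql is an alias for spark.sql
        "udf": udf,
        "dbutils": dbutils,
        "display": display,
    }
```

Inside a notebook run through the extension, these names already exist, so a cell can use spark, sql, udf, dbutils, and display directly without this setup.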

Notebook magics

The following notebook magics are also enabled:

  • %fs, which is the same as making dbutils.fs calls. See Mix languages.

  • %sh, which runs a command by using the cell magic %%script on the local machine. This does not run the command in the remote Databricks workspace. See Mix languages.

  • %md and %md-sandbox, which run the cell magic %%markdown. See Mix languages.

  • %sql, which runs spark.sql. See Mix languages.

  • %pip, which runs pip install on the local machine. This does not run pip install in the remote Databricks workspace. See Manage libraries with %pip commands.

  • %run, which runs another notebook. See Orchestrate notebooks and modularize code in notebooks.

    Note

    To enable %run, you must first install the nbformat library by running the following command in your local development machine’s terminal:

    pip install nbformat
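
The mapping described in the list above can be summarized as a small lookup table. The following sketch is only an illustration of that mapping, assuming the magic name is the first token of the cell. The names MAGIC_TARGETS, UNSUPPORTED_MAGICS, and resolve_magic are hypothetical, and this is not how the extension itself is implemented. The %r and %scala magics are the unsupported ones noted under Limitations.

```python
# Illustrative only: what each supported magic delegates to, as described
# in the list above. These names are hypothetical.
MAGIC_TARGETS = {
    "%fs": "dbutils.fs",
    "%sh": "%%script (local machine)",
    "%md": "%%markdown",
    "%md-sandbox": "%%markdown",
    "%sql": "spark.sql",
    "%pip": "pip install (local machine)",
    "%run": "run another notebook",
}

# Magics that display an error if called.
UNSUPPORTED_MAGICS = {"%r", "%scala"}


def resolve_magic(cell: str) -> tuple[str, str]:
    """Return (target, body) for a cell, where target names what runs it."""
    stripped = cell.strip()
    # The magic name is the first whitespace-delimited token of the cell.
    parts = stripped.split(None, 1)
    name = parts[0] if parts else ""
    body = parts[1] if len(parts) > 1 else ""

    if name in UNSUPPORTED_MAGICS:
        raise ValueError(f"The magic {name} is not supported")
    if name not in MAGIC_TARGETS:
        # No recognized magic: treat the whole cell as ordinary Python.
        return ("python", stripped)
    return (MAGIC_TARGETS[name], body)


sql_target = resolve_magic("%sql\nSELECT 1")
plain_target = resolve_magic("print(1)")
```

Here sql_target pairs the query text with spark.sql, while plain_target marks the cell as ordinary Python code.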
    

Additional features that are enabled include:

  • Spark DataFrames are converted to pandas DataFrames, which are displayed in Jupyter table format.

Limitations

Limitations of running cells in notebooks in Visual Studio Code include:

  • The notebook magics %r and %scala are not supported and display an error if called. See Mix languages.

  • The notebook magic %sql does not support some DML commands, such as SHOW TABLES.