Use notebooks with Databricks Connect

Note

This feature works with Databricks Runtime 13.3 and above.

You can run Databricks notebooks, one cell at a time or all cells at once, and see their results in the Visual Studio Code IDE by using the Databricks Connect integration in the Databricks extension for Visual Studio Code. Code runs locally, while code involving DataFrame operations runs on the cluster in the remote Databricks workspace, and run responses are sent back to the local caller. You can also debug cells: all code is debugged locally, while Spark code continues to run on the cluster in the remote Databricks workspace. Core Spark engine code cannot be debugged directly from the client.
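As a sketch of this split, a single cell can mix local Python with remote DataFrame operations. The Spark lines below are commented out because they require an attached cluster; `spark` is the preconfigured DatabricksSession global described later in this article, and the specific values are hypothetical:

```python
import math

# Plain Python: evaluated locally in the IDE process.
radius = 2.0
area = math.pi * radius ** 2

# DataFrame operations: sent to the remote cluster by Databricks Connect.
# (Commented out here because they require a running cluster; `spark` is
# the preconfigured DatabricksSession global.)
# df = spark.range(10).filter("id % 2 = 0")
# even_count = df.count()  # computed on the cluster; result returns locally
```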

By default, without the Databricks Connect integration that is described in this article, notebook usage is limited:

  • You cannot run notebooks one cell at a time by using just the Databricks extension for Visual Studio Code.

  • You cannot debug cells.

  • You can run notebooks only as Databricks jobs and see only the notebooks’ run results in the Visual Studio Code IDE.

  • All notebook code runs only on the clusters that are associated with these jobs.

To use this integration for notebooks, you must first enable Databricks Connect in the Databricks extension for Visual Studio Code. See Debug code by using Databricks Connect for the Databricks extension for Visual Studio Code.

After enablement, for notebooks with filenames that have a .py extension, when you open the notebook in the Visual Studio Code IDE, each cell displays Run Cell, Run Above, and Debug Cell buttons. As you run a cell, its results are shown in a separate tab in the IDE. As you debug, the cell being debugged displays Continue, Stop, and Step Over buttons. As you debug a cell, you can use Visual Studio Code debugging features such as watching variables’ states and viewing the call stack and debug console.

After enablement, for notebooks with filenames that have a .ipynb extension, when you open the notebook in the Visual Studio Code IDE, the notebook and its cells contain additional features. See Running cells and Work with code cells in the Notebook Editor.

For more information about notebook formats for filenames with the .py and .ipynb extensions, see Export and import Databricks notebooks.

The following notebook globals are also enabled:

  • spark, representing an instance of databricks.connect.DatabricksSession, is preconfigured to instantiate DatabricksSession by getting Databricks authentication credentials from the extension. If DatabricksSession is already instantiated in a notebook cell’s code, that DatabricksSession’s settings are used instead. See Code examples for Databricks Connect for Python.

  • udf, preconfigured as an alias for pyspark.sql.functions.udf, which is the default for Python UDFs. See pyspark.sql.functions.udf.

  • sql, preconfigured as an alias for spark.sql. spark, as described earlier, represents a preconfigured instance of databricks.connect.DatabricksSession. See Spark SQL.

  • dbutils, preconfigured as an instance of Databricks Utilities, which is imported from databricks-sdk and is instantiated by getting Databricks authentication credentials from the extension. See Use Databricks Utilities.

    Note

    Only a subset of Databricks Utilities is supported for notebooks with Databricks Connect.

    To enable dbutils.widgets, you must first install the Databricks SDK for Python by running the following command in your local development machine’s terminal:

    pip install 'databricks-sdk[notebook]'
    
  • display, preconfigured as an alias for the IPython built-in IPython.display.display. See IPython.display.display.

  • displayHTML, preconfigured as an alias for dbruntime.display.displayHTML, which is an alias for IPython.display.HTML. See IPython.display.HTML.
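To illustrate how these globals fit together, the following is a sketch of a notebook cell body. The function wrapper, the `fahrenheit` logic, and the query are hypothetical; in a real cell, `sql`, `udf`, and `display` are injected by the extension and would be used directly, with no imports or parameters needed:

```python
def fahrenheit(celsius: float) -> float:
    # Plain Python logic that we will wrap as a UDF.
    return celsius * 9.0 / 5.0 + 32.0

def demo_cell(sql, udf, display):
    # In a notebook cell, these three come in as preconfigured globals;
    # they are parameters here only to keep the sketch self-contained.
    from pyspark.sql.types import DoubleType

    to_f = udf(fahrenheit, DoubleType())               # `udf` global
    df = sql("SELECT CAST(id AS DOUBLE) AS c FROM range(0, 101, 25)")  # `sql` global
    display(df.withColumn("f", to_f(df["c"])).toPandas())              # `display` global
```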

The following notebook magics are also enabled:

  • %fs, which is the same as making dbutils.fs calls. See Mix languages.

  • %sh, which runs a command by using the cell magic %%script on the local machine. This does not run the command in the remote Databricks workspace. See Mix languages.

  • %md and %md-sandbox, which run the cell magic %%markdown. See Mix languages.

  • %sql, which runs spark.sql. See Mix languages.

  • %pip, which runs pip install on the local machine. This does not run pip install in the remote Databricks workspace. See Manage libraries with %pip commands.

  • %run, which runs another notebook. This notebook magic is available in Databricks extension for Visual Studio Code version 1.1.2 and above. See Run a Databricks notebook from another notebook.

    Note

    To enable %run, you must first install the nbformat library by running the following command in your local development machine’s terminal:

    pip install nbformat
    
  • # MAGIC. This notebook magic is available in Databricks extension for Visual Studio Code version 1.1.2 and above.
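As a quick summary of where each magic executes (a sketch restating the list above; the mapping strings are informal descriptions, not API names):

```python
# Where each notebook magic actually runs, per the list above.
MAGIC_EXECUTION = {
    "%sql": "remote cluster (runs spark.sql)",
    "%fs":  "remote cluster (makes dbutils.fs calls)",
    "%sh":  "local machine (%%script cell magic)",
    "%pip": "local machine (pip install)",
    "%md":  "rendered locally (%%markdown cell magic)",
    "%run": "runs another notebook (requires nbformat)",
}

for magic, where in MAGIC_EXECUTION.items():
    print(f"{magic:5} -> {where}")
```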

Additional features that are enabled include:

  • Spark DataFrames are converted to pandas DataFrames, which are displayed in Jupyter table format.
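This rendering step amounts to something like the following sketch (the helper name is hypothetical; `display` is the preconfigured global described earlier):

```python
def render_spark_df(sdf, display):
    """Convert a Spark DataFrame to pandas and hand it to Jupyter display.

    `sdf` is any object with a toPandas() method (for example, a
    pyspark.sql.DataFrame); `display` is the preconfigured global.
    """
    pdf = sdf.toPandas()  # collect rows from the cluster into a pandas DataFrame
    display(pdf)          # rendered as a Jupyter table in the results tab
    return pdf
```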

Limitations include:

  • The notebook magics %r and %scala are not supported and display an error if called. See Mix languages.

  • The notebook magic %sql does not support some commands, such as SHOW TABLES.