Databricks extension for Visual Studio Code

Preview

This feature is in Public Preview.

The Databricks extension for Visual Studio Code enables you to connect to your remote Databricks workspaces from the Visual Studio Code integrated development environment (IDE) running on your local development machine. Through these connections, you can:

  • Synchronize local code that you develop in Visual Studio Code with code in your remote workspaces.

  • Run local Python code files from Visual Studio Code on Databricks clusters in your remote workspaces.

  • Run local Python code files (.py) and Python, R, Scala, and SQL notebooks (.py, .ipynb, .r, .scala, and .sql) from Visual Studio Code as automated Databricks jobs in your remote workspaces.

Note

The Databricks extension for Visual Studio Code supports running R, Scala, and SQL notebooks as automated jobs but does not provide any deeper support for these languages within Visual Studio Code.

Before you begin

Before you can use the Databricks extension for Visual Studio Code, your Databricks workspace and your local development machine must meet the following requirements.

Workspace requirements

You must have at least one Databricks workspace available, and the workspace must meet the following requirements:

  • The workspace must contain at least one Databricks cluster. If you do not have a cluster available, you can create a cluster now or after you install the Databricks extension for Visual Studio Code.

    Note

    Databricks SQL warehouses are not supported by this extension.

  • The workspace must be enabled for Files in Repos, regardless of whether you use workspace files locations or files in Databricks Repos, as described in the next bullet.

  • The Databricks extension for Visual Studio Code relies primarily on workspace files locations. See Set the workspace files location.

    Note

    The Databricks extension for Visual Studio Code also supports files in Databricks Repos within the Databricks workspace. However, Databricks only recommends using this feature if workspace files locations are not available to you. See Set the repository.

Local development machine requirements

You must have the following on your local development machine:

  • Visual Studio Code version 1.69.1 or higher. To view your installed version, click Code > About Visual Studio Code from the main menu on macOS, or Help > About on Linux and Windows. To download, install, and configure Visual Studio Code, see Setting up Visual Studio Code.

  • Visual Studio Code must be configured for Python coding, including availability of a Python interpreter. For details, see Getting Started with Python in VS Code.

  • The Databricks extension for Visual Studio Code. See Install and open the extension.

Authentication requirements

The Databricks extension for Visual Studio Code implements portions of the Databricks client unified authentication standard, a consolidated and consistent architectural and programmatic approach to authentication. This approach helps make setting up and automating authentication with Databricks more centralized and predictable. It enables you to configure Databricks authentication once and then use that configuration across multiple Databricks tools and SDKs without further authentication configuration changes.

Before you can use the Databricks extension for Visual Studio Code, you must set up authentication between the extension and your Databricks workspace. Depending on the type of authentication that you want to use, finish your setup by completing the corresponding instructions. At a minimum, the extension needs a Databricks configuration profile; see Set up authentication with a configuration profile.
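For example, personal access token authentication uses a Databricks configuration profile, typically stored in a .databrickscfg file in your home directory. A minimal sketch (the profile name, URL, and token are placeholders):

[DEFAULT]
host  = https://1234567890123456.7.gcp.databricks.com
token = <your-personal-access-token>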

Getting started

Before you can use the Databricks extension for Visual Studio Code, you must download, install, open, and configure the extension, as follows.

Install and open the extension

  1. In Visual Studio Code, open the Extensions view (View > Extensions from the main menu).

  2. In Search Extensions in Marketplace, enter Databricks.

  3. Click the Databricks entry.

    Note

    There are several entries with Databricks in their titles. Be sure to click the entry titled only Databricks, with a blue check mark icon next to it.

  4. Click Install.

  5. Restart Visual Studio Code.

  6. Open the extension: on the sidebar, click the Databricks icon.

Configure the project

With the extension opened, open your code project’s folder in Visual Studio Code (File > Open Folder). If you do not have a code project, use your terminal for Linux or macOS, or PowerShell or Command Prompt for Windows, to create a folder, switch to the new folder, and then open Visual Studio Code from that folder. For example:

# Linux and macOS
mkdir databricks-demo
cd databricks-demo
code .

# Windows
md databricks-demo
cd databricks-demo
code .

Tip

If you get the error command not found: code, see Launching from the command line in the Visual Studio Code documentation.

Configure the extension

To use the extension, you must set the Databricks configuration profile for Databricks authentication. You must also set the cluster and the sync destination (a workspace files location or a repository in Databricks Repos).

Set up authentication with a configuration profile

With your project and the extension opened, do the following:

  1. In the Configuration pane, click Configure Databricks.

    Note

    If Configure Databricks is not visible, click the gear (Configure workspace) icon next to Configuration instead.

    Gear icon to configure workspace settings 1
  2. In the Command Palette, for Databricks Host, enter your workspace instance URL, for example https://1234567890123456.7.gcp.databricks.com. Then press Enter.

  3. Do one of the following:

    • If the Databricks extension for Visual Studio Code detects an existing matching Databricks configuration profile for the URL, you can select it in the list.

    • Click Edit Databricks profiles to open your Databricks configuration profiles file and create a configuration profile manually.

The extension creates a hidden folder named .databricks in your project if it does not already exist, and creates a file named project.json in that folder if it does not already exist. This file contains the URL that you entered, along with some Databricks authentication details that the Databricks extension for Visual Studio Code needs to operate.

The extension also adds a hidden .gitignore file to the project if one does not exist and cannot be found in any parent folder, and then adds a .databricks/ entry to that file (whether the file is new or already existed).
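For illustration only, after you also set a cluster and sync destination later in this article, the .databricks/project.json file might look similar to the following sketch (the exact keys vary by configuration, the key holding the workspace URL is an assumption here, and the IDs and paths are placeholders):

{
  "host": "https://1234567890123456.7.gcp.databricks.com",
  "clusterId": "1234-567890-abcd12e3",
  "workspacePath": "/Users/<your-username>/.ide/<your-folder-name>"
}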

Set the cluster

With the extension and your code project opened, and a Databricks configuration profile already set, select an existing Databricks cluster that you want to use, or create a new Databricks cluster and use it.

Use an existing cluster

If you have an existing Databricks cluster that you want to use, do one of the following:

  • In the Clusters pane, do the following:

    1. Next to the cluster that you want to use, click the plug (Attach cluster) icon.

      Attach cluster icon 1

      Tip

      If the cluster is not visible in the Clusters pane, click the filter (Filter clusters) icon to see All clusters, clusters that are Created by me, or Running clusters. Or, click the arrowed circle (Refresh) icon next to the filter icon.

      Filter clusters icon 1

    The extension adds the cluster’s ID to your code project’s .databricks/project.json file, for example "clusterId": "1234-567890-abcd12e3".

    This procedure is complete.

  • In the Configuration pane, do the following:

    1. Next to Cluster, click the gear (Configure cluster) icon.

      Configure cluster icon 1
    2. In the Command Palette, click the cluster that you want to use.

    The extension adds the cluster’s ID to your code project’s .databricks/project.json file, for example "clusterId": "1234-567890-abcd12e3".

    This procedure is complete.

Create a new cluster

If you do not have an existing Databricks cluster, or you want to create a new one and use it, do the following:

  1. In the Configuration pane, next to Cluster, click the gear (Configure cluster) icon.

    Configure cluster icon 2
  2. In the Command Palette, click Create New Cluster.

  3. When prompted to open the external website (your Databricks workspace), click Open.

  4. If prompted, sign in to your Databricks workspace.

  5. Follow the instructions to create a cluster.

  6. After the cluster is created and is running, go back to Visual Studio Code.

  7. Do one of the following:

    • In the Clusters pane, next to the cluster that you want to use, click the plug (Attach cluster) icon.

      Attach cluster icon 2

      Tip

      If the cluster is not visible, click the filter (Filter clusters) icon to see All clusters, clusters that are Created by me, or Running clusters. Or, click the arrowed circle (Refresh) icon.

      Filter clusters icon 2

      The extension adds the cluster’s ID to the code project’s .databricks/project.json file, for example "clusterId": "1234-567890-abcd12e3".

      This procedure is complete.

    • In the Configuration pane, next to Cluster, click the gear (Configure cluster) icon.

      Configure cluster icon 3

      In the Command Palette, click the cluster that you want to use.

      The extension adds the cluster’s ID to the code project’s .databricks/project.json file, for example "clusterId": "1234-567890-abcd12e3".

Set the workspace files location

With the extension and your code project opened, and a Databricks configuration profile already set, use the Databricks extension for Visual Studio Code to create a new workspace files location and use it, or select an existing workspace files location instead.

Note

The Databricks extension for Visual Studio Code works only with workspace files locations that it creates; you cannot use an existing workspace files location in your workspace unless the extension created it.

To use workspace files locations with the Databricks extension for Visual Studio Code, you must use version 0.3.5 or higher of the extension, and your Databricks cluster must have Databricks Runtime 11.2 or higher installed.

To enable the Databricks extension for Visual Studio Code to use workspace files locations within a Databricks workspace, you must first set the extension’s Sync: Destination Type setting to workspace as follows:

  1. With the extension and your code project opened, and a Databricks configuration profile already set, in the Command Palette (View > Command Palette), type Preferences: Open User Settings, and then click Preferences: Open User Settings.

  2. On the User tab, expand Extensions, and click Databricks.

  3. For Sync: Destination Type, select workspace.

  4. Quit and restart Visual Studio Code.
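As an alternative to the Settings editor, you can set the equivalent key in settings.json (see Settings); a minimal sketch:

{
  "databricks.sync.destinationType": "workspace"
}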

Create a new workspace files location

To create a new workspace files location, do the following:

  1. In the Configuration pane, next to Sync Destination, click the gear (Configure sync destination) icon.

    Configure sync destination icon 1
  2. In the Command Palette, click Create New Sync Destination.

  3. Type a folder name for the new workspace files location, and then press Enter.

    The extension creates a folder with the specified folder name within /Users/<your-username>/.ide in the workspace and then adds the workspace files location’s path to the code project’s .databricks/project.json file, for example "workspacePath": "/Users/<your-username>/.ide/<your-folder-name>".

    Note

    If the remote workspace files location’s name does not match your local code project’s name, a warning icon appears with this message: The remote sync destination name does not match the current Visual Studio Code workspace name. You can ignore this warning if you do not require the names to match.

  4. After you set the workspace files location, begin synchronizing with the workspace files location by clicking the arrowed circle (Start synchronization) icon next to Sync Destination.

    Start synchronization icon 1

Important

The Databricks extension for Visual Studio Code only performs one-way, automatic synchronization of file changes from your local Visual Studio Code project to the related workspace files location in your remote Databricks workspace. These remote workspace files are intended to be transient. Do not initiate changes to these files from within your remote workspace, as these changes will not be synchronized back to your local project.

Reuse an existing workspace files location

If you have an existing workspace files location that you created earlier with the Databricks extension for Visual Studio Code and want to reuse in your current Visual Studio Code project, then do the following:

  1. In the Configuration pane, next to Sync Destination, click the gear (Configure sync destination) icon.

    Configure sync destination icon 2
  2. In the Command Palette, select the workspace files location’s name from the list.

The extension adds the workspace files location’s path to the code project’s .databricks/project.json file, for example "workspacePath": "/Users/<your-username>/.ide/<your-folder-name>".

Note

If the remote workspace files location’s name does not match your local code project’s name, a warning icon appears with this message: The remote sync destination name does not match the current Visual Studio Code workspace name. You can ignore this warning if you do not require the names to match.

After you set the workspace files location, begin synchronizing with the workspace files location by clicking the arrowed circle (Start synchronization) icon next to Sync Destination.

Start synchronization icon 2

Important

The Databricks extension for Visual Studio Code only performs one-way, automatic synchronization of file changes from your local Visual Studio Code project to the related workspace files location in your remote Databricks workspace. These remote workspace files are intended to be transient. Do not initiate changes to these files from within your remote workspace, as these changes will not be synchronized back to your local project.

Set the repository

Note

Databricks does not recommend that you use Databricks Repos with the Databricks extension for Visual Studio Code unless workspace files locations are unavailable to you. See Set the workspace files location.

If you choose to use a repository in Databricks Repos instead of a workspace files location, then with the extension and your code project opened, and a Databricks configuration profile already set, use the Databricks extension for Visual Studio Code to create a new repository in Databricks Repos and use it, or select an existing repository that you created earlier with the extension and want to reuse.

Note

The Databricks extension for Visual Studio Code works only with repositories that it creates. You cannot use an existing repository in your workspace.

To enable the Databricks extension for Visual Studio Code to use repositories in Databricks Repos within a Databricks workspace, you must first set the extension’s Sync: Destination Type setting to repo as follows:

  1. With the extension and your code project opened, and a Databricks configuration profile already set, in the Command Palette (View > Command Palette), type Preferences: Open User Settings, and then click Preferences: Open User Settings.

  2. On the User tab, expand Extensions, and click Databricks.

  3. For Sync: Destination Type, select repo.

  4. Quit and restart Visual Studio Code.

Create a new repo

Note

Databricks does not recommend that you use Databricks Repos with the Databricks extension for Visual Studio Code unless workspace files locations are unavailable to you. See Set the workspace files location.

To create a new repository, do the following:

  1. In the Configuration pane, next to Sync Destination, click the gear (Configure sync destination) icon.

    Configure sync destination icon 3
  2. In the Command Palette, click Create New Sync Destination.

  3. Type a name for the new repository in Databricks Repos, and then press Enter.

    The extension appends the characters .ide to the end of the repo’s name and then adds the repo’s workspace path to the code project’s .databricks/project.json file, for example "workspacePath": "/Workspace/Repos/someone@example.com/my-repo.ide".

    Note

    If the remote repo’s name does not match your local code project’s name, a warning icon appears with this message: The remote sync destination name does not match the current Visual Studio Code workspace name. You can ignore this warning if you do not require the names to match.

  4. After you set the repository, begin synchronizing with the repository by clicking the arrowed circle (Start synchronization) icon next to Sync Destination.

    Start synchronization icon 3

Important

The Databricks extension for Visual Studio Code only performs one-way, automatic synchronization of file changes from your local Visual Studio Code project to the related repository in your remote Databricks workspace. These remote repository files are intended to be transient. Do not initiate changes to these files from within your remote repository, as these changes will not be synchronized back to your local project.

Reuse an existing repo

Note

Databricks does not recommend that you use Databricks Repos with the Databricks extension for Visual Studio Code unless workspace files locations are unavailable to you. See Set the workspace files location.

If you have an existing repository in Databricks Repos that you created earlier with the Databricks extension for Visual Studio Code and want to reuse in your current Visual Studio Code project, then do the following:

  1. In the Configuration pane, next to Sync Destination, click the gear (Configure sync destination) icon.

    Configure sync destination icon 4
  2. In the Command Palette, select the repository’s name from the list.

    The extension adds the repo’s workspace path to the code project’s .databricks/project.json file, for example "workspacePath": "/Workspace/Repos/someone@example.com/my-repo.ide".

    Note

    If the remote repo’s name does not match your local code project’s name, a warning icon appears with this message: The remote sync destination name does not match the current Visual Studio Code workspace name. You can ignore this warning if you do not require the names to match.

  3. After you set the repository, begin synchronizing with the repository by clicking the arrowed circle (Start synchronization) icon next to Sync Destination.

    Start synchronization icon 4

Important

The Databricks extension for Visual Studio Code only performs one-way, automatic synchronization of file changes from your local Visual Studio Code project to the related repository in your remote Databricks workspace. These remote repository files are intended to be transient. Do not initiate changes to these files from within your remote repository, as these changes will not be synchronized back to your local project.

Development tasks

After you configure the Databricks extension for Visual Studio Code, you can use the extension to run a local Python file on a cluster in a remote Databricks workspace, or run a local Python file or local Python, R, Scala, or SQL notebook as a job in a remote workspace, as follows.

If you do not have a local file or notebook available to test the Databricks extension for Visual Studio Code with, here is some basic code that you can add to your project. The examples that follow show a Python code file and then Python, R, Scala, and SQL notebooks.

Python file:

from pyspark.sql import SparkSession
from pyspark.sql.types import *

spark = SparkSession.builder.getOrCreate()

schema = StructType([
  StructField('CustomerID', IntegerType(), False),
  StructField('FirstName',  StringType(),  False),
  StructField('LastName',   StringType(),  False)
])

data = [
  [ 1000, 'Mathijs', 'Oosterhout-Rijntjes' ],
  [ 1001, 'Joost',   'van Brunswijk' ],
  [ 1002, 'Stan',    'Bokenkamp' ]
]

customers = spark.createDataFrame(data, schema)
customers.show()

# Output:
#
# +----------+---------+-------------------+
# |CustomerID|FirstName|           LastName|
# +----------+---------+-------------------+
# |      1000|  Mathijs|Oosterhout-Rijntjes|
# |      1001|    Joost|      van Brunswijk|
# |      1002|     Stan|          Bokenkamp|
# +----------+---------+-------------------+

Python notebook:

# Databricks notebook source
from pyspark.sql.types import *

schema = StructType([
  StructField('CustomerID', IntegerType(), False),
  StructField('FirstName',  StringType(),  False),
  StructField('LastName',   StringType(),  False)
])

data = [
  [ 1000, 'Mathijs', 'Oosterhout-Rijntjes' ],
  [ 1001, 'Joost',   'van Brunswijk' ],
  [ 1002, 'Stan',    'Bokenkamp' ]
]

customers = spark.createDataFrame(data, schema)
customers.show()

# Output:
#
# +----------+---------+-------------------+
# |CustomerID|FirstName|           LastName|
# +----------+---------+-------------------+
# |      1000|  Mathijs|Oosterhout-Rijntjes|
# |      1001|    Joost|      van Brunswijk|
# |      1002|     Stan|          Bokenkamp|
# +----------+---------+-------------------+

R notebook:

# Databricks notebook source
library(SparkR)

sparkR.session()

data <- list(
          list(1000L, "Mathijs", "Oosterhout-Rijntjes"),
          list(1001L, "Joost",   "van Brunswijk"),
          list(1002L, "Stan",    "Bokenkamp")
        )

schema <- structType(
            structField("CustomerID", "integer"),
            structField("FirstName",  "string"),
            structField("LastName",   "string")
          )

df <- createDataFrame(
        data   = data,
        schema = schema
      )

showDF(df)

# Output:
#
# +----------+---------+-------------------+
# |CustomerID|FirstName|           LastName|
# +----------+---------+-------------------+
# |      1000|  Mathijs|Oosterhout-Rijntjes|
# |      1001|    Joost|      van Brunswijk|
# |      1002|     Stan|          Bokenkamp|
# +----------+---------+-------------------+

Scala notebook:

// Databricks notebook source
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

val schema = StructType(Array(
  StructField("CustomerID", IntegerType, false),
  StructField("FirstName",  StringType, false),
  StructField("LastName",   StringType, false)
))

val data = List(
  Row(1000, "Mathijs", "Oosterhout-Rijntjes"),
  Row(1001, "Joost",   "van Brunswijk"),
  Row(1002, "Stan",    "Bokenkamp"),
)

val rdd = spark.sparkContext.makeRDD(data)
val customers = spark.createDataFrame(rdd, schema)

display(customers)

// Output:
//
// +----------+---------+-------------------+
// |CustomerID|FirstName|           LastName|
// +----------+---------+-------------------+
// |      1000|  Mathijs|Oosterhout-Rijntjes|
// |      1001|    Joost|      van Brunswijk|
// |      1002|     Stan|          Bokenkamp|
// +----------+---------+-------------------+

SQL notebook:

-- Databricks notebook source
CREATE TABLE IF NOT EXISTS zzz_customers(
  CustomerID INT,
  FirstName  STRING,
  LastName   STRING
);

-- COMMAND ----------
INSERT INTO zzz_customers VALUES
  (1000, "Mathijs", "Oosterhout-Rijntjes"),
  (1001, "Joost",   "van Brunswijk"),
  (1002, "Stan",    "Bokenkamp");

-- COMMAND ----------
SELECT * FROM zzz_customers;

-- Output:
--
-- +----------+---------+-------------------+
-- |CustomerID|FirstName|           LastName|
-- +----------+---------+-------------------+
-- |      1000|  Mathijs|Oosterhout-Rijntjes|
-- |      1001|    Joost|      van Brunswijk|
-- |      1002|     Stan|          Bokenkamp|
-- +----------+---------+-------------------+

-- COMMAND ----------
DROP TABLE zzz_customers;

Enable PySpark and Databricks Utilities code completion

To enable IntelliSense (also known as code completion) in the Visual Studio Code code editor for PySpark, Databricks Utilities, and related globals such as spark and dbutils, do the following with your code project opened:

  1. On the Command Palette (View > Command Palette), type Databricks: Configure autocomplete for Databricks globals and press Enter.

  2. Follow the on-screen prompts to allow the Databricks extension for Visual Studio Code to install PySpark for your project, and to add or modify the __builtins__.pyi file for your project to enable Databricks Utilities.

You can now use globals such as spark and dbutils in your code without declaring any related import statements beforehand.
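For example, a file such as the following sketch (intended to be run on a cluster, for example with Upload and Run File on Databricks) uses the spark and dbutils globals directly:

# spark and dbutils are provided at run time on the cluster; with
# autocomplete configured, Visual Studio Code also recognizes them locally.
df = spark.range(3)
df.show()

print(dbutils.fs.ls("/"))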

Run or debug Python code with Databricks Connect

Note

This feature is Experimental.

Databricks Connect integration within the Databricks extension for Visual Studio Code supports only a portion of the Databricks client unified authentication standard. For more information, see Authentication requirements.

The Databricks extension for Visual Studio Code includes Databricks Connect. You can use Databricks Connect from within the Databricks extension for Visual Studio Code to run and do step-through debugging of individual Python (.py) files and Python Jupyter notebooks (.ipynb). The Databricks extension for Visual Studio Code includes Databricks Connect for Databricks Runtime 13.0 and higher. Earlier versions of Databricks Connect are not supported.

Databricks Connect requirements

Before you can use Databricks Connect from within the Databricks extension for Visual Studio Code, you must first meet the Databricks Connect requirements. These requirements include things such as a workspace enabled with Unity Catalog, a cluster running Databricks Runtime 13.0 or higher and with a cluster access mode of Single User or Shared, and a local version of Python installed with its major and minor versions matching those of Python installed on the cluster.

Step 1: Turn on the Databricks Connect feature

To enable the Databricks extension for Visual Studio Code to use Databricks Connect, you must turn on this feature in Visual Studio Code. To do this, open the Settings editor to the User tab, and then do the following:

  1. Expand Extensions, and then click Databricks.

  2. Next to Experiments: Opt Into, click Add Item.

  3. In the drop-down list, select debugging.dbconnect.

  4. Click OK.

  5. Reload Visual Studio Code, for example by running the >Developer: Reload Window command within the Command Palette (View > Command Palette).

Step 2: Create a Python virtual environment

Create and activate a Python virtual environment for your Python code project. Python virtual environments help to make sure that your code project is using compatible versions of Python and Python packages (in this case, the Databricks Connect package). The instructions and examples in this article use venv for Python virtual environments. To create a Python virtual environment using venv:

  1. From your Visual Studio Code terminal (View > Terminal), with the working directory set to the root of your Python code project, run the following command to have venv create the virtual environment’s supporting files in a hidden directory named .venv within that root directory:

    # Linux and macOS
    python3.10 -m venv ./.venv
    # Windows
    python3.10 -m venv .\.venv
    

    The preceding command uses Python 3.10, which matches the major and minor version of Python that Databricks Runtime 13.0 uses. Be sure to use the major and minor version of Python that matches your cluster’s installed version of Python.

  2. If Visual Studio Code displays the message “We noticed a new environment has been created. Do you want to select it for the workspace folder?”, click Yes.

  3. Use venv to activate the virtual environment. See the venv documentation for the correct command to use, based on your operating system and terminal type. For example, on macOS running zsh:

    source ./.venv/bin/activate
    

    You will know that your virtual environment is activated when the virtual environment’s name (for example, .venv) displays in parentheses just before your terminal prompt.

    To deactivate the virtual environment at any time, run the command deactivate.

    You will know that your virtual environment is deactivated when the virtual environment’s name no longer displays in parentheses just before your terminal prompt.

Step 3: Update your Python code to establish a debugging context

To establish a debugging context between Databricks Connect and your cluster, your Python code must initialize the DatabricksSession class by calling DatabricksSession.builder.getOrCreate().

Note that you do not need to specify settings such as your workspace’s instance name, an access token, or your cluster’s ID and port number when you initialize the DatabricksSession class. Databricks Connect gets this information from the configuration details that you already provided through the Databricks extension for Visual Studio Code earlier in this article.

For additional information about initializing the DatabricksSession class, see the Databricks Connect code examples.
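For example, a minimal sketch (the table name is a placeholder):

from databricks.connect import DatabricksSession

# Databricks Connect reads the workspace and cluster details that you
# already configured through the extension, so no host, token, or
# cluster ID is specified here.
spark = DatabricksSession.builder.getOrCreate()

df = spark.read.table("samples.nyctaxi.trips")
df.show(5)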

Step 4: Enable Databricks Connect

With the extension opened and the Workspace section configured for your code project, do the following:

  1. In the Visual Studio Code status bar, click the red Databricks Connect disabled button.

  2. If the Cluster section is not already configured in the extension, the following message appears: “Please attach a cluster to use Databricks Connect.” Click Attach Cluster and select a cluster that meets the Databricks Connect requirements.

  3. If the Cluster section is configured but the cluster is not compatible with Databricks Connect, click the red Databricks Connect disabled button, click Attach Cluster, and select a compatible cluster.

  4. If the Databricks Connect package is not already installed, the following message appears: “For interactive debugging and autocompletion you need Databricks Connect. Would you like to install it in the environment <environment-name>.” Click Install.

  5. In the Visual Studio Code status bar, the blue Databricks Connect enabled button appears.

    If the red Databricks Connect disabled button still appears, click it, and complete the on-screen instructions to get the blue Databricks Connect enabled button to appear.

  6. After the blue Databricks Connect enabled button appears, you are now ready to use Databricks Connect.

Note

You do not need to configure the extension’s Sync Destination section in order for your code project to use Databricks Connect.

Step 5: Run or debug your Python code

After you enable Databricks Connect for your code project, run or debug your Python file or notebook as follows.

To run or debug a Python (.py) file:

  1. In your code project, open the Python file that you want to run or debug.

  2. Set any debugging breakpoints within the Python file.

  3. In the file editor’s title bar, click the drop-down arrow next to the play (Run or Debug) icon. Then in the drop-down list, select Debug Python File. This choice supports step-through debugging, breakpoints, watch expressions, call stacks, and similar features. Other choices, which do not support debugging, include:

    • Run Python File to use Databricks Connect to run the file or notebook, but without debugging support.

    • Upload and Run File on Databricks to run the file on the cluster and display results within the IDE’s terminal. This choice does not use Databricks Connect to run the file.

    • Run File as Workflow on Databricks to run the file as an automated Databricks job within the workspace and display results within an editor in the IDE. This choice does not use Databricks Connect.

Run File on Databricks editor command 0

Note

The Run Current File in Interactive Window option, if available, attempts to run the file locally in a special Visual Studio Code interactive editor. Databricks does not recommend this option.

To run or debug a Python Jupyter notebook (.ipynb):

  1. In your code project, open the Python Jupyter notebook that you want to run or debug. Make sure the Python file is in Jupyter notebook format and has the extension .ipynb.

    Tip

    You can create a new Python Jupyter notebook by running the >Create: New Jupyter Notebook command from within the Command Palette.

  2. Click Run All Cells to run all cells without debugging, Execute Cell to run an individual corresponding cell without debugging, or Run by Line to run an individual cell line-by-line with limited debugging, with variable values displayed in the Jupyter panel (View > Open View > Jupyter).

    For full debugging within an individual cell, set breakpoints, and then click Debug Cell in the menu next to the cell’s Run button.

    After you click any of these options, you might be prompted to install missing Python Jupyter notebook package dependencies. Click to install.

    For more information, see Jupyter Notebooks in VS Code.

Run a Python file on a cluster

With the extension and your code project opened, and a Databricks configuration profile, cluster, and sync destination already set, do the following:

  1. In your code project, open the Python file that you want to run on the cluster.

  2. Do one of the following:

    • In Explorer view (View > Explorer), right-click the file, and then select Upload and Run File on Databricks from the context menu.

      Run File on Databricks context menu command
    • In the file editor’s title bar, click the drop-down arrow next to the play (Run or Debug) icon. Then in the drop-down list, click Upload and Run File on Databricks.

      Run File on Databricks editor command

The file runs on the cluster, and any output is printed to the Debug Console (View > Debug Console).

Run a Python file as a job

With the extension and your code project opened, and a Databricks configuration profile, cluster, and sync destination already set, do the following:

  1. In your code project, open the Python file that you want to run as a job.

  2. Do one of the following:

    • In Explorer view (View > Explorer), right-click the file, and then select Run File as Workflow on Databricks from the context menu.

      Run File as Workflow on Databricks context menu command 1
    • In the file editor’s title bar, click the drop-down arrow next to the play (Run or Debug) icon. Then in the drop-down list, click Run File as Workflow on Databricks.

      Run File as Workflow on Databricks editor command 1

A new editor tab appears, titled Databricks Job Run. The file runs as a job in the workspace, and any output is printed to the new editor tab’s Output area.

To view information about the job run, click the Task run ID link in the new Databricks Job Run editor tab. Your workspace opens and the job run’s details are displayed in the workspace.

Run a Python notebook as a job

With the extension and your code project opened, and a Databricks configuration profile, cluster, and sync destination already set, do the following:

  1. In your code project, open the Python notebook that you want to run as a job.

    Tip

    To create a Python notebook file in Visual Studio Code, click File > New File, select Python File, and save the new file with a .py file extension.

    To turn the .py file into a Databricks notebook, add the special comment # Databricks notebook source to the beginning of the file, and add the special comment # COMMAND ---------- before each cell. For more information, see Import a file and convert it to a notebook.

    A Python code file formatted as a Databricks notebook1
  2. Do one of the following:

    • In Explorer view (View > Explorer), right-click the notebook file, and then select Run File as Workflow on Databricks from the context menu.

      Run File as Workflow on Databricks context menu command 1
    • In the notebook file editor’s title bar, click the drop-down arrow next to the play (Run or Debug) icon. Then in the drop-down list, click Run File as Workflow on Databricks.

      Run File as Workflow on Databricks editor command 2

A new editor tab appears, titled Databricks Job Run. The notebook runs as a job in the workspace, and the notebook and its output are displayed in the new editor tab’s Output area.

To view information about the job run, click the Task run ID link in the Databricks Job Run editor tab. Your workspace opens and the job run’s details are displayed in the workspace.

Run an R, Scala, or SQL notebook as a job

With the extension and your code project opened, and a Databricks configuration profile, cluster, and sync destination already set, do the following:

  1. In your code project, open the R, Scala, or SQL notebook that you want to run as a job.

    Tip

    To create an R, Scala, or SQL notebook file in Visual Studio Code, click File > New File, select Python File, and save the new file with a .r, .scala, or .sql file extension, respectively.

    To turn the .r, .scala, or .sql file into a Databricks notebook, add the special comment Databricks notebook source to the beginning of the file and add the special comment COMMAND ---------- before each cell. Be sure to use the correct comment marker for each language (# for R, // for Scala, and -- for SQL). For more information, see Import a file and convert it to a notebook.

    This is similar to the pattern for Python notebooks:

    A Python code file formatted as a Databricks notebook 2
  2. In Run and Debug view (View > Run), select Run on Databricks as Workflow from the drop-down list, and then click the green play arrow (Start Debugging) icon.

    Run on Databricks as Workflow custom command

    Note

    If Run on Databricks as Workflow is not available, see Create a custom run configuration.

A new editor tab appears, titled Databricks Job Run. The notebook runs as a job in the workspace. The notebook and its output are displayed in the new editor tab’s Output area.

To view information about the job run, click the Task run ID link in the Databricks Job Run editor tab. Your workspace opens and the job run’s details are displayed in the workspace.

Advanced tasks

You can use the Databricks extension for Visual Studio Code to perform the following advanced tasks.

Run tests with pytest

You can run pytest on local code that does not need a connection to a cluster in a remote Databricks workspace. For example, you might use pytest to test your functions that accept and return PySpark DataFrames in local memory. To get started with pytest and run it locally, see Get Started in the pytest documentation.

To run pytest on code in a remote Databricks workspace, do the following in your Visual Studio Code project:

Step 1: Create the tests

Add a Python file with the following code, which contains your tests to run. This example assumes that this file is named spark_test.py and is at the root of your Visual Studio Code project. This file contains a pytest fixture, which makes the cluster’s SparkSession (the entry point to Spark functionality on the cluster) available to the tests. This file contains a single test that checks whether the specified cell in the table contains the specified value. You can add your own tests to this file as needed.

from pyspark.sql import SparkSession
import pytest

@pytest.fixture
def spark() -> SparkSession:
  # Create a SparkSession (the entry point to Spark functionality) on
  # the cluster in the remote Databricks workspace. Unit tests do not
  # have access to this SparkSession by default.
  return SparkSession.builder.getOrCreate()

# Now add your unit tests.

# For example, here is a unit test that must be run on the
# cluster in the remote Databricks workspace.
# This example determines whether the specified cell in the
# specified table contains the specified value. For example,
# the third column in the first row should contain the word "Ideal":
#
# +----+-------+-------+-------+---------+-------+-------+-------+------+-------+------+
# |_c0 | carat | cut   | color | clarity | depth | table | price | x    | y     | z    |
# +----+-------+-------+-------+---------+-------+-------+-------+------+-------+------+
# | 1  | 0.23  | Ideal | E     | SI2     | 61.5  | 55    | 326   | 3.95 | 3.98  | 2.43 |
# +----+-------+-------+-------+---------+-------+-------+-------+------+-------+------+
# ...
#
def test_spark(spark):
  spark.sql('USE default')
  data = spark.sql('SELECT * FROM diamonds')
  assert data.collect()[0][2] == 'Ideal'

Step 2: Create the pytest runner

Add a Python file with the following code, which instructs pytest to run your tests from the previous step. This example assumes that the file is named pytest_databricks.py and is at the root of your Visual Studio Code project.

import pytest
import os
import sys

# Run all tests in the connected repository in the remote Databricks workspace.
# By default, pytest searches through all files with filenames ending with
# "_test.py" for tests. Within each of these files, pytest runs each function
# with a function name beginning with "test_".

# Get the path to the repository for this file in the workspace.
repo_root = os.path.dirname(os.path.realpath(__file__))
# Switch to the repository's root directory.
os.chdir(repo_root)

# Skip writing .pyc files to the bytecode cache on the cluster.
sys.dont_write_bytecode = True

# Now run pytest from the repository's root directory, using the
# arguments that are supplied by your custom run configuration in
# your Visual Studio Code project. In this case, the custom run
# configuration JSON must contain these unique "program" and
# "args" objects:
#
# ...
# {
#   ...
#   "program": "${workspaceFolder}/path/to/this/file/in/workspace",
#   "args": ["/path/to/_test.py-files"]
# }
# ...
#
retcode = pytest.main(sys.argv[1:])

Step 3: Create a custom run configuration

To instruct pytest to run your tests, you must create a custom run configuration. Use the existing Databricks cluster-based run configuration to create your own custom run configuration, as follows:

  1. On the main menu, click Run > Add configuration.

  2. In the Command Palette, select Databricks.

    Visual Studio Code adds a .vscode/launch.json file to your project, if this file does not already exist.

  3. Change the starter run configuration as follows, and then save the file:

    • Change this run configuration’s name from Run on Databricks to some unique display name for this configuration, in this example Unit Tests (on Databricks).

    • Change program from ${file} to the path in the project that contains the test runner, in this example ${workspaceFolder}/pytest_databricks.py.

    • Change args from [] to the path in the project that contains the files with your tests, in this example ["."].

    Your launch.json file should look like this:

    {
      // Use IntelliSense to learn about possible attributes.
      // Hover to view descriptions of existing attributes.
      // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
      "version": "0.2.0",
      "configurations": [
        {
          "type": "databricks",
          "request": "launch",
          "name": "Unit Tests (on Databricks)",
          "program": "${workspaceFolder}/pytest_databricks.py",
          "args": ["."],
          "env": {}
        }
      ]
    }
    

Step 4: Run the tests

First, make sure that pytest is installed on the cluster. For example, with the cluster’s settings page open in your Databricks workspace, do the following:

  1. On the Libraries tab, if pytest is visible, then pytest is already installed. If pytest is not visible, click Install new.

  2. For Library Source, click PyPI.

  3. For Package, enter pytest.

  4. Click Install.

  5. Wait until Status changes from Pending to Installed.

To run the tests, do the following from your Visual Studio Code project:

  1. On the main menu, click View > Run.

  2. In the Run and Debug list, click Unit Tests (on Databricks), if it is not already selected.

  3. Click the green arrow (Start Debugging) icon.

The pytest results display in the Debug Console (View > Debug Console on the main menu). For example, these results show that at least one test was found in the spark_test.py file, and a dot (.) means that a single test was found and passed. (A failing test would show an F.)

<date>, <time> - Creating execution context on cluster <cluster-id> ...
<date>, <time> - Synchronizing code to /Repos/<someone@example.com>/<your-repository-name> ...
<date>, <time> - Running /pytest_databricks.py ...
============================= test session starts ==============================
platform linux -- Python <version>, pytest-<version>, pluggy-<version>
rootdir: /Workspace/Repos/<someone@example.com>/<your-repository-name>
collected 1 item

spark_test.py .                                                          [100%]

============================== 1 passed in 3.25s ===============================
<date>, <time> - Done (took 10818ms)

Use environment variable definitions files

Visual Studio Code supports environment variable definitions files for Python projects. This enables you to create a file with the extension .env somewhere on your development machine, and Visual Studio Code will then apply the environment variables within this .env file at run time. For more information, see Environment variable definitions file in the Visual Studio Code documentation.

To have the Databricks extension for Visual Studio Code use your .env file, set databricks.python.envFile within your settings.json file or Extensions > Databricks > Python: Env File within the Settings editor to the absolute path of your .env file.

Important

If you set settings.json, do not set python.envFile to the absolute path of your .env file as described in the Visual Studio Code documentation, as the Databricks extension for Visual Studio Code must override python.envFile for its internal use. Be sure to only set databricks.python.envFile instead.
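For illustration, the settings.json entry might look like the following sketch (the path is a placeholder):

{
  "databricks.python.envFile": "/absolute/path/to/.env"
}

A sample .env file simply lists environment variables, one per line (these names are placeholders):

MY_APP_SETTING=some-value
ANOTHER_APP_SETTING=another-value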

Create a custom run configuration

You can create custom run configurations in Visual Studio Code to do things such as passing custom arguments to a job or a notebook, or creating different run settings for different files. For example, the following custom run configuration passes the --prod argument to the job:

{
  "version": "0.2.0",
  "configurations": [
    {
      "type": "databricks-workflow",
      "request": "launch",
      "name": "Run on Databricks as Workflow",
      "program": "${file}",
      "parameters": {},
      "args": ["--prod"],
      "preLaunchTask": "databricks: sync"
    }
  ]
}

To create a custom run configuration, click Run > Add Configuration from the main menu in Visual Studio Code. Then select either Databricks for a cluster-based run configuration or Databricks: Workflow for a job-based run configuration.

By using custom run configurations, you can also pass in command-line arguments and run your code just by pressing F5. For more information, see Launch configurations in the Visual Studio Code documentation.

Uninstall the extension

You can uninstall the Databricks extension for Visual Studio Code if needed, as follows:

  1. In Visual Studio Code, click View > Extensions from the main menu.

  2. In the list of extensions, select the Databricks entry.

  3. Click Uninstall.

  4. Click Reload required, or restart Visual Studio Code.

Troubleshooting

Error when synchronizing through a proxy

Issue: When you try to run the Databricks extension for Visual Studio Code to synchronize your local code project through a proxy, an error message similar to the following appears, and the synchronization operation is unsuccessful: Get "https://<workspace-instance>/api/2.0/preview/scim/v2/Me": EOF.

Possible cause: Visual Studio Code does not know how to find the proxy.

Recommended solution: Restart Visual Studio Code from your terminal by running the following command, and then try synchronizing again:

env HTTPS_PROXY=<proxy-url>:<port> code

In the preceding command:

  • Replace <proxy-url> with the full URL to your proxy.

  • Replace <port> with the correct port on your proxy.

Error: “spawn unknown system error -86” when you try to synchronize local code

Issue: When you try to synchronize local code in a project to a remote Databricks workspace, the Terminal shows that synchronization has started but displays only the error message spawn unknown system error -86. Also, the Sync Destination section of the Configuration pane remains in a pending state.

Possible cause: The wrong version of the Databricks extension for Visual Studio Code is installed for your development machine’s operating system.

Recommended solution: Uninstall the extension, and then Install and open the extension for your development machine’s operating system from the beginning.

Send usage logs to Databricks

If you have issues synchronizing local code to a remote Databricks workspace, you can send usage logs and related information to Databricks Support by doing the following:

  1. Turn on verbose mode for the Databricks command-line interface (CLI) by checking the Bricks: Verbose Mode setting, or setting databricks.bricks.verboseMode to true, as described in Settings.

  2. Also turn on logging by checking the Logs: Enabled setting, or setting databricks.logs.enabled to true, as described in Settings. Be sure to restart Visual Studio Code after you turn on logging.

  3. Attempt to reproduce your issue.

  4. From the Command Palette (View > Command Palette from the main menu), run the Databricks: Open full logs command.

  5. Send the bricks-logs.json and sdk-and-extension-logs.json files that appear to Databricks Support.

  6. Also copy the contents of the Terminal (View > Terminal) in the context of the issue, and send this content to Databricks Support.

To send error logs that are not about code synchronization issues to Databricks Support:

  1. From the Command Palette (View > Command Palette), run the Databricks: Open full logs command.

  2. Send only the sdk-and-extension-logs.json file that appears to Databricks Support.

The Output view (View > Output, Databricks Logs) shows truncated information if Logs: Enabled is checked or databricks.logs.enabled is set to true. To show more information, change the following settings, as described in Settings:

  • Logs: Max Array Length or databricks.logs.maxArrayLength

  • Logs: Max Field Length or databricks.logs.maxFieldLength

  • Logs: Truncation Depth or databricks.logs.truncationDepth

Command Palette

The Databricks extension for Visual Studio Code adds the following commands to the Visual Studio Code Command Palette. See also Command Palette in the Visual Studio Code documentation.

Command

Description

Databricks: Configure autocomplete for Databricks globals

Enables IntelliSense in the Visual Studio Code code editor for PySpark, Databricks Utilities, and related globals such as spark and dbutils. See Enable PySpark and Databricks Utilities code completion.

Databricks: Configure cluster

Moves focus to the Command Palette to create, select, or change the Databricks cluster to use for the current project. See Set the cluster.

Databricks: Configure sync destination

Moves focus to the Command Palette to create, select, or change the sync destination (a workspace files location or a repository in Databricks Repos) to use for the current project. See Set the workspace files location and Set the repository.

Databricks: Configure workspace

Moves focus to the Command Palette to create, select, or change Databricks authentication details to use for the current project. See Set up authentication with a configuration profile.

Databricks: Create Folder

Creates a new sync destination.

Databricks: Detach cluster

Removes the reference to the Databricks cluster from the current project.

Databricks: Detach sync destination

Removes the reference to the repository in Databricks Repos from the current project.

Databricks: Focus on Clusters View

Moves focus in the Databricks view to the Clusters pane.

Databricks: Focus on Configuration View

Moves focus in the Databricks view to the Configuration pane.

Databricks: Focus on Workspace Explorer View

Moves focus in the Databricks view to the Workspace Explorer pane.

Databricks: Logout

Resets the Databricks view to show the Configure Databricks and Show Quickstart buttons in the Configuration pane. Any content in the current project’s .databricks/project.json file is also reset. See Configure the extension.

Databricks: Open Databricks configuration file

Opens the Databricks configuration profiles file, from the default location, for the current project. See Set up authentication with a configuration profile.

Databricks: Open full logs

Opens the folder that contains the application log files that the Databricks extension for Visual Studio Code writes to your development machine.

Databricks: Refresh workspace filesystem view

Refreshes the Workspace Explorer pane in the Databricks view.

Databricks: Run File as Workflow on Databricks

Runs a Python file or a notebook as an automated Databricks job within the workspace.

Databricks: Show Quickstart

Shows the Quickstart file in the editor.

Databricks: Start cluster

Starts the cluster if it is already stopped.

Databricks: Start synchronization

Starts synchronizing the current project’s code to the Databricks workspace. This command performs an incremental synchronization.

Databricks: Start synchronization (full sync)

Starts synchronizing the current project’s code to the Databricks workspace. This command performs a full synchronization, even if an incremental sync is possible.

Databricks: Stop cluster

Stops the cluster if it is already running.

Databricks: Stop synchronization

Stops synchronizing the current project’s code to the Databricks workspace.

Databricks: Upload and Run File on Databricks

Runs a Python file on the cluster.

Settings

The Databricks extension for Visual Studio Code adds the following settings to Visual Studio Code. See also Settings editor and settings.json in the Visual Studio Code documentation.

Settings editor (Extensions > Databricks)

settings.json

Description

Bricks: Verbose Mode

databricks.bricks.verboseMode

Checked or set to true to enable verbose logging for the Databricks command-line interface (CLI) when it synchronizes local code with code in your remote workspace. The default is unchecked or false (do not enable verbose logging for the Databricks CLI).

Clusters: Only Show Accessible Clusters

databricks.clusters.onlyShowAccessibleClusters

Checked or set to true to enable filtering for only those clusters that you can run code on. The default is unchecked or false (do not enable filtering for those clusters).

Logs: Enabled

databricks.logs.enabled

Checked or set to true (default) to enable logging. Reload your window for any change to take effect.

Logs: Max Array Length

databricks.logs.maxArrayLength

The maximum number of items to show for array fields. The default is 2.

Logs: Max Field Length

databricks.logs.maxFieldLength

The maximum length of each field displayed in the logs output panel. The default is 40.

Logs: Truncation Depth

databricks.logs.truncationDepth

The maximum depth of logs to show without truncation. The default is 2.

Override Databricks Config File

databricks.overrideDatabricksConfigFile

An alternate location for the .databrickscfg file that the extension uses for authentication.

Python: Env File

databricks.python.envFile

The absolute path to your custom Python environment variable definitions (.env) file.

Sync: Destination Type

databricks.sync.destinationType

Whether to use a folder in the workspace (workspace) or a repository in Databricks Repos in the workspace (repo, default) as the sync destination.

Setting this to workspace displays the Workspace Explorer pane, which enables you to browse available sync destinations within the workspace. This behavior works only with workspaces that are enabled with the ability to create arbitrary files within the workspace, and the selected cluster must have Databricks Runtime 11.2 or higher installed. See What are workspace files?.

Reload your window for any change to take effect.
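For reference, a few of these settings as they might appear in settings.json (the values shown are illustrative):

{
  "databricks.logs.enabled": true,
  "databricks.logs.maxFieldLength": 80,
  "databricks.clusters.onlyShowAccessibleClusters": true,
  "databricks.sync.destinationType": "workspace"
}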

Frequently asked questions (FAQs)

Do you have support for, or a timeline for support for, any of the following capabilities?

  • Other languages, such as Scala or SQL

  • Delta Live Tables

  • Databricks SQL warehouses

  • Other IDEs, such as PyCharm

  • Additional libraries

  • Full CI/CD integration

  • Authentication schemes in addition to Databricks personal access tokens

Databricks is aware of these requests and is prioritizing work to enable simple scenarios for local development and remote running of code. Please forward additional requests and scenarios to your Databricks representative. Databricks will incorporate your input into future planning.

How does the Databricks Terraform provider relate to the Databricks extension for Visual Studio Code?

Databricks continues to recommend the Databricks Terraform provider for managing your CI/CD pipelines in a predictable way. Please let your Databricks representative know how you might use an IDE to manage your deployments in the future. Databricks will incorporate your input into future planning.

How does dbx by Databricks Labs relate to the Databricks extension for Visual Studio Code?

The main features of dbx by Databricks Labs include:

  • Project scaffolding.

  • Limited local development through the dbx execute command.

  • CI/CD for Databricks jobs.

The Databricks extension for Visual Studio Code enables local development and remotely running Python code files on Databricks clusters, and remotely running Python code files and notebooks in Databricks jobs. dbx can continue to be used for project scaffolding and CI/CD for Databricks jobs.

What happens if I already have an existing Databricks configuration profile that I created through the Databricks CLI?

You can select your existing configuration profile when you configure the Databricks extension for Visual Studio Code. With the extension and your code project opened, do the following:

  1. In the Configuration pane, click the gear (Configure workspace) icon.

    Gear icon to configure workspace settings 4
  2. Enter your workspace instance URL, for example https://1234567890123456.7.gcp.databricks.com.

  3. In the Command Palette, select your existing configuration profile.

Which permissions do I need for a Databricks workspace to use the Databricks extension for Visual Studio Code?

You must have execute permissions for a Databricks cluster for running code, as well as permissions to create a repository in Databricks Repos.

Which settings must be enabled for a Databricks workspace to use the Databricks extension for Visual Studio Code?

The workspace must have the Files in Repos setting turned on. For instructions, see Configure support for Files in Repos. If you cannot turn on this setting yourself, contact your Databricks workspace administrator.

Can I use the Databricks extension for Visual Studio Code with an existing repository stored with a remote Git provider?

No. The Databricks extension for Visual Studio Code works only with repositories that it creates.