Databricks extension for Visual Studio Code
Preview
This feature is in Public Preview.
The Databricks extension for Visual Studio Code enables you to connect to your remote Databricks workspaces from the Visual Studio Code integrated development environment (IDE) running on your local development machine. Through these connections, you can:
Synchronize local code that you develop in Visual Studio Code with code in your remote workspaces.
Run local Python code files from Visual Studio Code on Databricks clusters in your remote workspaces.
Run local Python code files (.py) and Python, R, Scala, and SQL notebooks (.py, .ipynb, .r, .scala, and .sql) from Visual Studio Code as automated Databricks jobs in your remote workspaces.
Note
The Databricks extension for Visual Studio Code supports running R, Scala, and SQL notebooks as automated jobs but does not provide any deeper support for these languages within Visual Studio Code.
Before you begin
Before you can use the Databricks extension for Visual Studio Code, your Databricks workspace and your local development machine must meet the following requirements.
Workspace requirements
You must have at least one Databricks workspace available, and the workspace must meet the following requirements:
The workspace must contain at least one Databricks cluster. If you do not have a cluster available, you can create a cluster now or after you install the Databricks extension for Visual Studio Code.
Note
Databricks SQL warehouses are not supported by this extension.
The workspace must be enabled for Files in Repos, regardless of whether you use workspace files locations or files in Databricks Repos, as described in the next bullet.
The Databricks extension for Visual Studio Code relies primarily on workspace files locations. See Set the workspace files location.
Note
The Databricks extension for Visual Studio Code also supports files in Databricks Repos within the Databricks workspace. However, Databricks only recommends using this feature if workspace files locations are not available to you. See Set the repository.
Local development machine requirements
You must have the following on your local development machine:
Visual Studio Code version 1.69.1 or higher. To view your installed version, click Code > About Visual Studio Code from the main menu on macOS, or Help > About on Linux or Windows. To download, install, and configure Visual Studio Code, see Setting up Visual Studio Code.
Visual Studio Code must be configured for Python coding, including availability of a Python interpreter. For details, see Getting Started with Python in VS Code.
The Databricks extension for Visual Studio Code. See Install and open the extension.
Authentication requirements
The Databricks extension for Visual Studio Code implements portions of the Databricks client unified authentication standard, a consolidated and consistent architectural and programmatic approach to authentication. This approach helps make setting up and automating authentication with Databricks more centralized and predictable. It enables you to configure Databricks authentication once and then use that configuration across multiple Databricks tools and SDKs without further authentication configuration changes.
Before you can use the Databricks extension for Visual Studio Code, you must set up authentication between the Databricks extension for Visual Studio Code and your Databricks workspace. Depending on the type of authentication that you want to use, finish your setup by completing the following instructions in the specified order:
For Databricks personal access token authentication:
Create or identify an access token, as specified in Token authentication.
Create or identify a Databricks configuration profile with the required fields, as specified in Token authentication. (A sample profile appears after these steps.)
Continue with Getting started.
Finish setting up authentication by continuing with Set up authentication with a configuration profile.
Finish setting up your project by setting the cluster and setting the workspace files location.
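For reference, a Databricks configuration profile is an entry in a .databrickscfg file in your home directory. A minimal profile for personal access token authentication might look like the following sketch; the host shown is the placeholder workspace URL used elsewhere in this article, and the token value is a placeholder for your own access token:

[DEFAULT]
host  = https://1234567890123456.7.gcp.databricks.com
token = <your-personal-access-token>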
Google Cloud credentials authentication
The Databricks extension for Visual Studio Code does not support Google Cloud credentials authentication.
Note
Databricks Connect supports Google Cloud credentials authentication. However, you cannot use the Databricks Connect integration within the Databricks extension for Visual Studio Code to do Google Cloud credentials authentication. To use Databricks Connect with Visual Studio Code by itself, separate from the Databricks extension for Visual Studio Code, see Visual Studio Code with Python.
Google Cloud ID authentication
The Databricks extension for Visual Studio Code does not support Google Cloud ID authentication.
Note
Databricks Connect supports Google Cloud ID authentication. However, you cannot use the Databricks Connect integration within the Databricks extension for Visual Studio Code to do Google Cloud ID authentication. To use Databricks Connect with Visual Studio Code by itself, separate from the Databricks extension for Visual Studio Code, see Visual Studio Code with Python.
Getting started
Before you can use the Databricks extension for Visual Studio Code, you must download, install, open, and configure the extension, as follows.
Install and open the extension
In Visual Studio Code, open the Extensions view (View > Extensions from the main menu).
In Search Extensions in Marketplace, enter Databricks.
Click the Databricks entry.
Note
There are several entries with Databricks in their titles. Be sure to click the one with only Databricks in its title and a blue check mark icon next to Databricks.
Click Install.
Restart Visual Studio Code.
Open the extension: on the sidebar, click the Databricks icon.
Configure the project
With the extension opened, open your code project’s folder in Visual Studio Code (File > Open Folder). If you do not have a code project then use PowerShell, your terminal for Linux or macOS, or Command Prompt for Windows, to create a folder, switch to the new folder, and then open Visual Studio Code from that folder. For example:
# Linux and macOS
mkdir databricks-demo
cd databricks-demo
code .

# Windows
md databricks-demo
cd databricks-demo
code .
Tip
If you get the error command not found: code, see Launching from the command line in the Visual Studio Code documentation.
Configure the extension
To use the extension, you must set the Databricks configuration profile for Databricks authentication. You must also set the cluster and the sync destination (a workspace files location or a repository).
Set up authentication with a configuration profile
With your project and the extension opened, do the following:
In the Configuration pane, click Configure Databricks.
Note
If Configure Databricks is not visible, click the gear (Configure workspace) icon next to Configuration instead.
In the Command Palette, for Databricks Host, enter your workspace instance URL, for example https://1234567890123456.7.gcp.databricks.com. Then press Enter.
Do one of the following:
If the Databricks extension for Visual Studio Code detects an existing matching Databricks configuration profile for the URL, you can select it in the list.
Click Edit Databricks profiles to open your Databricks configuration profiles file and create a configuration profile manually.
The extension creates a hidden folder in your project named .databricks if it does not already exist. The extension also creates in this folder a file named project.json if it does not already exist. This file contains the URL that you entered, along with some Databricks authentication details that the Databricks extension for Visual Studio Code needs to operate.
The extension also adds a hidden .gitignore file to the project if the file does not exist or if an existing .gitignore cannot be found in any parent folders. If a new .gitignore file is created, the extension adds a .databricks/ entry to it. If the extension finds an existing .gitignore file, it adds a .databricks/ entry to the existing file.
Set the cluster
With the extension and your code project opened, and a Databricks configuration profile already set, select an existing Databricks cluster that you want to use, or create a new Databricks cluster and use it.
Use an existing cluster
If you have an existing Databricks cluster that you want to use, do one of the following:
In the Clusters pane, do the following:
Next to the cluster that you want to use, click the plug (Attach cluster) icon.
Tip
If the cluster is not visible in the Clusters pane, click the filter (Filter clusters) icon to see All clusters, clusters that are Created by me, or Running clusters. Or, click the arrowed circle (Refresh) icon next to the filter icon.
The extension adds the cluster’s ID to your code project’s .databricks/project.json file, for example "clusterId": "1234-567890-abcd12e3".
This procedure is complete.
In the Configuration pane, do the following:
Next to Cluster, click the gear (Configure cluster) icon.
In the Command Palette, click the cluster that you want to use.
The extension adds the cluster’s ID to your code project’s .databricks/project.json file, for example "clusterId": "1234-567890-abcd12e3".
This procedure is complete.
Create a new cluster
If you do not have an existing Databricks cluster, or you want to create a new one and use it, do the following:
In the Configuration pane, next to Cluster, click the gear (Configure cluster) icon.
In the Command Palette, click Create New Cluster.
When prompted to open the external website (your Databricks workspace), click Open.
If prompted, sign in to your Databricks workspace.
Follow the instructions to create a cluster.
After the cluster is created and is running, go back to Visual Studio Code.
Do one of the following:
In the Clusters pane, next to the cluster that you want to use, click the plug (Attach cluster) icon.
Tip
If the cluster is not visible, click the filter (Filter clusters) icon to see All clusters, clusters that are Created by me, or Running clusters. Or, click the arrowed circle (Refresh) icon.
The extension adds the cluster’s ID to the code project’s .databricks/project.json file, for example "clusterId": "1234-567890-abcd12e3".
This procedure is complete.
In the Configuration pane, next to Cluster, click the gear (Configure cluster) icon.
In the Command Palette, click the cluster that you want to use.
The extension adds the cluster’s ID to the code project’s .databricks/project.json file, for example "clusterId": "1234-567890-abcd12e3".
Set the workspace files location
With the extension and your code project opened, and a Databricks configuration profile already set, use the Databricks extension for Visual Studio Code to create a new workspace files location and use it, or select an existing workspace files location instead.
Note
The Databricks extension for Visual Studio Code works only with workspace file locations that it creates. You cannot use an existing workspace files location in your workspace unless it was created by the extension.
To use workspace files locations with the Databricks extension for Visual Studio Code, you must use version 0.3.5 or higher of the extension, and your Databricks cluster must have Databricks Runtime 11.2 or higher installed.
To enable the Databricks extension for Visual Studio Code to use workspace files locations within a Databricks workspace, you must first set the extension’s Sync: Destination Type setting to workspace as follows:
With the extension and your code project opened, and a Databricks configuration profile already set, in the Command Palette (View > Command Palette), type Preferences: Open User Settings, and then click Preferences: Open User Settings.
On the User tab, expand Extensions, and click Databricks.
For Sync: Destination Type, select workspace.
Quit and restart Visual Studio Code.
Create a new workspace files location
To create a new workspace files location, do the following:
In the Configuration pane, next to Sync Destination, click the gear (Configure sync destination) icon.
In the Command Palette, click Create New Sync Destination.
Type a folder name for the new workspace files location, and then press Enter.
The extension creates a folder with the specified folder name within /Users/<your-username>/.ide in the workspace and then adds the workspace files location’s path to the code project’s .databricks/project.json file, for example "workspacePath": "/Users/<your-username>/.ide/<your-folder-name>". (A combined project.json example appears at the end of this section.)
Note
If the remote workspace files location’s name does not match your local code project’s name, a warning icon appears with this message: The remote sync destination name does not match the current Visual Studio Code workspace name. You can ignore this warning if you do not require the names to match.
After you set the workspace files location, begin synchronizing with the workspace files location by clicking the arrowed circle (Start synchronization) icon next to Sync Destination.
Important
The Databricks extension for Visual Studio Code only performs one-way, automatic synchronization of file changes from your local Visual Studio Code project to the related workspace files location in your remote Databricks workspace. These remote workspace files are intended to be transient. Do not initiate changes to these files from within your remote workspace, as these changes will not be synchronized back to your local project.
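Once a cluster and a workspace files location are both set, the code project’s .databricks/project.json file contains entries along the lines of the following sketch, in addition to the authentication details mentioned earlier. The values shown are the placeholder examples used above:

{
  "clusterId": "1234-567890-abcd12e3",
  "workspacePath": "/Users/<your-username>/.ide/<your-folder-name>"
}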
Reuse an existing workspace files location
If you have an existing workspace files location that you created earlier with the Databricks extension for Visual Studio Code and want to reuse in your current Visual Studio Code project, then do the following:
In the Configuration pane, next to Sync Destination, click the gear (Configure sync destination) icon.
In the Command Palette, select the workspace files location’s name from the list.
The extension adds the workspace files location’s path to the code project’s .databricks/project.json file, for example "workspacePath": "/Users/<your-username>/.ide/<your-folder-name>".
Note
If the remote workspace files location’s name does not match your local code project’s name, a warning icon appears with this message: The remote sync destination name does not match the current Visual Studio Code workspace name. You can ignore this warning if you do not require the names to match.
After you set the workspace files location, begin synchronizing with the workspace files location by clicking the arrowed circle (Start synchronization) icon next to Sync Destination.

Important
The Databricks extension for Visual Studio Code only performs one-way, automatic synchronization of file changes from your local Visual Studio Code project to the related workspace files location in your remote Databricks workspace. These remote workspace files are intended to be transient. Do not initiate changes to these files from within your remote workspace, as these changes will not be synchronized back to your local project.
Set the repository
Note
Databricks does not recommend that you use Databricks Repos with the Databricks extension for Visual Studio Code unless workspace files locations are unavailable to you. See Set the workspace files location.
If you choose to use a Databricks Repo instead of a workspace file location in your Databricks workspace, then with the extension and your code project opened, and a Databricks configuration profile already set, use the Databricks extension for Visual Studio Code to create a new repository in Databricks Repos and use it, or select an existing repository in Databricks Repos that you created earlier with the Databricks extension for Visual Studio Code and want to reuse instead.
Note
The Databricks extension for Visual Studio Code works only with repositories that it creates. You cannot use an existing repository in your workspace.
To enable the Databricks extension for Visual Studio Code to use repositories in Databricks Repos within a Databricks workspace, you must first set the extension’s Sync: Destination Type setting to repo as follows:
With the extension and your code project opened, and a Databricks configuration profile already set, in the Command Palette (View > Command Palette), type Preferences: Open User Settings, and then click Preferences: Open User Settings.
On the User tab, expand Extensions, and click Databricks.
For Sync: Destination Type, select repo.
Quit and restart Visual Studio Code.
Create a new repo
Note
Databricks does not recommend that you use Databricks Repos with the Databricks extension for Visual Studio Code unless workspace files locations are unavailable to you. See Set the workspace files location.
To create a new repository, do the following:
In the Configuration pane, next to Sync Destination, click the gear (Configure sync destination) icon.
In the Command Palette, click Create New Sync Destination.
Type a name for the new repository in Databricks Repos, and then press Enter.
The extension appends the characters .ide to the end of the repo’s name and then adds the repo’s workspace path to the code project’s .databricks/project.json file, for example "workspacePath": "/Workspace/Repos/someone@example.com/my-repo.ide".
Note
If the remote repo’s name does not match your local code project’s name, a warning icon appears with this message: The remote sync destination name does not match the current Visual Studio Code workspace name. You can ignore this warning if you do not require the names to match.
After you set the repository, begin synchronizing with the repository by clicking the arrowed circle (Start synchronization) icon next to Sync Destination.
Important
The Databricks extension for Visual Studio Code only performs one-way, automatic synchronization of file changes from your local Visual Studio Code project to the related repository in your remote Databricks workspace. These remote repository files are intended to be transient. Do not initiate changes to these files from within your remote repository, as these changes will not be synchronized back to your local project.
Reuse an existing repo
Note
Databricks does not recommend that you use Databricks Repos with the Databricks extension for Visual Studio Code unless workspace files locations are unavailable to you. See Set the workspace files location.
If you have an existing repository in Databricks Repos that you created earlier with the Databricks extension for Visual Studio Code and want to reuse in your current Visual Studio Code project, then do the following:
In the Configuration pane, next to Sync Destination, click the gear (Configure sync destination) icon.
In the Command Palette, select the repository’s name from the list.
The extension adds the repo’s workspace path to the code project’s .databricks/project.json file, for example "workspacePath": "/Workspace/Repos/someone@example.com/my-repo.ide".
Note
If the remote repo’s name does not match your local code project’s name, a warning icon appears with this message: The remote sync destination name does not match the current Visual Studio Code workspace name. You can ignore this warning if you do not require the names to match.
After you set the repository, begin synchronizing with the repository by clicking the arrowed circle (Start synchronization) icon next to Sync Destination.
Important
The Databricks extension for Visual Studio Code only performs one-way, automatic synchronization of file changes from your local Visual Studio Code project to the related repository in your remote Databricks workspace. These remote repository files are intended to be transient. Do not initiate changes to these files from within your remote repository, as these changes will not be synchronized back to your local project.
Development tasks
After you configure the Databricks extension for Visual Studio Code, you can use the extension to run a local Python file on a cluster in a remote Databricks workspace, or run a local Python file or local Python, R, Scala, or SQL notebook as a job in a remote workspace, as follows.
If you do not have a local file or notebook available to test the Databricks extension for Visual Studio Code with, here is some basic code that you can add to your project. The following examples cover a Python file and Python, R, Scala, and SQL notebooks.
Python file:
from pyspark.sql import SparkSession
from pyspark.sql.types import *
spark = SparkSession.builder.getOrCreate()
schema = StructType([
StructField('CustomerID', IntegerType(), False),
StructField('FirstName', StringType(), False),
StructField('LastName', StringType(), False)
])
data = [
[ 1000, 'Mathijs', 'Oosterhout-Rijntjes' ],
[ 1001, 'Joost', 'van Brunswijk' ],
[ 1002, 'Stan', 'Bokenkamp' ]
]
customers = spark.createDataFrame(data, schema)
customers.show()
# Output:
#
# +----------+---------+-------------------+
# |CustomerID|FirstName| LastName|
# +----------+---------+-------------------+
# | 1000| Mathijs|Oosterhout-Rijntjes|
# | 1001| Joost| van Brunswijk|
# | 1002| Stan| Bokenkamp|
# +----------+---------+-------------------+
Python notebook:
# Databricks notebook source
from pyspark.sql.types import *
schema = StructType([
StructField('CustomerID', IntegerType(), False),
StructField('FirstName', StringType(), False),
StructField('LastName', StringType(), False)
])
data = [
[ 1000, 'Mathijs', 'Oosterhout-Rijntjes' ],
[ 1001, 'Joost', 'van Brunswijk' ],
[ 1002, 'Stan', 'Bokenkamp' ]
]
customers = spark.createDataFrame(data, schema)
customers.show()
# Output:
#
# +----------+---------+-------------------+
# |CustomerID|FirstName| LastName|
# +----------+---------+-------------------+
# | 1000| Mathijs|Oosterhout-Rijntjes|
# | 1001| Joost| van Brunswijk|
# | 1002| Stan| Bokenkamp|
# +----------+---------+-------------------+
R notebook:
# Databricks notebook source
library(SparkR)
sparkR.session()
data <- list(
list(1000L, "Mathijs", "Oosterhout-Rijntjes"),
list(1001L, "Joost", "van Brunswijk"),
list(1002L, "Stan", "Bokenkamp")
)
schema <- structType(
structField("CustomerID", "integer"),
structField("FirstName", "string"),
structField("LastName", "string")
)
df <- createDataFrame(
data = data,
schema = schema
)
showDF(df)
# Output:
#
# +----------+---------+-------------------+
# |CustomerID|FirstName| LastName|
# +----------+---------+-------------------+
# | 1000| Mathijs|Oosterhout-Rijntjes|
# | 1001| Joost| van Brunswijk|
# | 1002| Stan| Bokenkamp|
# +----------+---------+-------------------+
Scala notebook:
// Databricks notebook source
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
val schema = StructType(Array(
StructField("CustomerID", IntegerType, false),
StructField("FirstName", StringType, false),
StructField("LastName", StringType, false)
))
val data = List(
Row(1000, "Mathijs", "Oosterhout-Rijntjes"),
Row(1001, "Joost", "van Brunswijk"),
Row(1002, "Stan", "Bokenkamp"),
)
val rdd = spark.sparkContext.makeRDD(data)
val customers = spark.createDataFrame(rdd, schema)
display(customers)
// Output:
//
// +----------+---------+-------------------+
// |CustomerID|FirstName| LastName|
// +----------+---------+-------------------+
// | 1000| Mathijs|Oosterhout-Rijntjes|
// | 1001| Joost| van Brunswijk|
// | 1002| Stan| Bokenkamp|
// +----------+---------+-------------------+
SQL notebook:
-- Databricks notebook source
CREATE TABLE IF NOT EXISTS zzz_customers(
CustomerID INT,
FirstName STRING,
LastName STRING
);
-- COMMAND ----------
INSERT INTO zzz_customers VALUES
(1000, "Mathijs", "Oosterhout-Rijntjes"),
(1001, "Joost", "van Brunswijk"),
(1002, "Stan", "Bokenkamp");
-- COMMAND ----------
SELECT * FROM zzz_customers;
-- Output:
--
-- +----------+---------+-------------------+
-- |CustomerID|FirstName| LastName|
-- +----------+---------+-------------------+
-- | 1000| Mathijs|Oosterhout-Rijntjes|
-- | 1001| Joost| van Brunswijk|
-- | 1002| Stan| Bokenkamp|
-- +----------+---------+-------------------+
-- COMMAND ----------
DROP TABLE zzz_customers;
Enable PySpark and Databricks Utilities code completion
To enable IntelliSense (also known as code completion) in the Visual Studio Code code editor for PySpark, Databricks Utilities, and related globals such as spark and dbutils, do the following with your code project opened:
On the Command Palette (View > Command Palette), type Databricks: Configure autocomplete for Databricks globals and press Enter.
Follow the on-screen prompts to allow the Databricks extension for Visual Studio Code to install PySpark for your project, and to add or modify the __builtins__.pyi file for your project to enable Databricks Utilities.
You can now use globals such as spark and dbutils in your code without declaring any related import statements beforehand.
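For example, after you configure autocomplete, the editor can provide completion for code like the following sketch, which is intended to run on a Databricks cluster (for example, through Upload and Run File on Databricks):

# No import statements are needed for the spark and dbutils globals.
df = spark.sql("SELECT 1 AS test_value")
df.show()

# List the root of the Databricks File System by using Databricks Utilities.
print(dbutils.fs.ls("/"))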
Run or debug Python code with Databricks Connect
Note
This feature is Experimental.
Databricks Connect integration within the Databricks extension for Visual Studio Code supports only a portion of the Databricks client unified authentication standard. For more information, see Authentication requirements.
The Databricks extension for Visual Studio Code includes Databricks Connect. You can use Databricks Connect from within the Databricks extension for Visual Studio Code to run and do step-through debugging of individual Python (.py) files and Python Jupyter notebooks (.ipynb). The Databricks extension for Visual Studio Code includes Databricks Connect for Databricks Runtime 13.0 and higher. Earlier versions of Databricks Connect are not supported.
Databricks Connect requirements
Before you can use Databricks Connect from within the Databricks extension for Visual Studio Code, you must first meet the Databricks Connect requirements. These requirements include things such as a workspace enabled with Unity Catalog, a cluster running Databricks Runtime 13.0 or higher and with a cluster access mode of Single User or Shared, and a local version of Python installed with its major and minor versions matching those of Python installed on the cluster.
Step 1: Turn on the Databricks Connect feature
To enable the Databricks extension for Visual Studio Code to use Databricks Connect, you must turn on this feature in Visual Studio Code. To do this, open the Settings editor to the User tab, and then do the following:
Expand Extensions, and then click Databricks.
Next to Experiments: Opt Into, click Add Item.
In the drop-down list, select debugging.dbconnect.
Click OK.
Reload Visual Studio Code, for example by running the >Developer: Reload Window command within the Command Palette (View > Command Palette).
Step 2: Create a Python virtual environment
Create and activate a Python virtual environment for your Python code project. Python virtual environments help to make sure that your code project is using compatible versions of Python and Python packages (in this case, the Databricks Connect package). The instructions and examples in this article use venv for Python virtual environments. To create a Python virtual environment using venv:
From your Visual Studio Code terminal (View > Terminal) set to the root directory of your Python code project, instruct venv to use Python for the virtual environment, and then create the virtual environment’s supporting files in a hidden directory named .venv within the root directory of your Python code project, by running the following command:

# Linux and macOS
python3.10 -m venv ./.venv

# Windows
python3.10 -m venv .\.venv
The preceding command uses Python 3.10, which matches the major and minor version of Python that Databricks Runtime 13.0 uses. Be sure to use the major and minor version of Python that matches your cluster’s installed version of Python.
If Visual Studio Code displays the message “We noticed a new environment has been created. Do you want to select it for the workspace folder,” click Yes.
Use venv to activate the virtual environment. See the venv documentation for the correct command to use, based on your operating system and terminal type. For example, on macOS running zsh:

source ./.venv/bin/activate

You will know that your virtual environment is activated when the virtual environment’s name (for example, .venv) displays in parentheses just before your terminal prompt.
To deactivate the virtual environment at any time, run the command deactivate. You will know that your virtual environment is deactivated when the virtual environment’s name no longer displays in parentheses just before your terminal prompt.
Step 3: Update your Python code to establish a debugging context
To establish a debugging context between Databricks Connect and your cluster, your Python code must initialize the DatabricksSession class by calling DatabricksSession.builder.getOrCreate().
Note that you do not need to specify settings such as your workspace’s instance name, an access token, or your cluster’s ID and port number when you initialize the DatabricksSession class. Databricks Connect gets this information from the configuration details that you already provided through the Databricks extension for Visual Studio Code earlier in this article.
For additional information about initializing the DatabricksSession class, see the Databricks Connect code examples.
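For example, a minimal Python file that establishes the debugging context might look like the following sketch; the SELECT statement is only an illustration, and you can replace it with your own code:

from databricks.connect import DatabricksSession

# Databricks Connect reads the workspace and cluster details that the extension
# has already configured; no host, token, or cluster ID is specified here.
spark = DatabricksSession.builder.getOrCreate()

df = spark.sql("SELECT 1 AS test_value")
df.show()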
Step 4: Enable Databricks Connect
With the extension opened and the Workspace section configured for your code project, do the following:
In the Visual Studio Code status bar, click the red Databricks Connect disabled button.
If the Cluster section is not already configured in the extension, the following message appears: “Please attach a cluster to use Databricks Connect.” Click Attach Cluster and select a cluster that meets the Databricks Connect requirements.
If the Cluster section is configured but the cluster is not compatible with Databricks Connect, click the red Databricks Connect disabled button, click Attach Cluster, and select a compatible cluster.
If the Databricks Connect package is not already installed, the following message appears: “For interactive debugging and autocompletion you need Databricks Connect. Would you like to install it in the environment <environment-name>.” Click Install.
In the Visual Studio Code status bar, the blue Databricks Connect enabled button appears.
If the red Databricks Connect disabled button still appears, click it, and complete the on-screen instructions to get the blue Databricks Connect enabled button to appear.
After the blue Databricks Connect enabled button appears, you are now ready to use Databricks Connect.
Note
You do not need to configure the extension’s Sync Destination section in order for your code project to use Databricks Connect.
Step 5: Run or debug your Python code
After you enable Databricks Connect for your code project, run or debug your Python file or notebook as follows.
To run or debug a Python (.py) file:
In your code project, open the Python file that you want to run or debug.
Set any debugging breakpoints within the Python file.
In the file editor’s title bar, click the drop-down arrow next to the play (Run or Debug) icon. Then in the drop-down list, select Debug Python File. This choice supports step-through debugging, breakpoints, watch expressions, call stacks, and similar features. Other choices, which do not support debugging, include:
Run Python File to use Databricks Connect to run the file or notebook, but without debugging support.
Upload and Run File on Databricks to run the file on the cluster and display results within the IDE’s terminal. This choice does not use Databricks Connect to run the file.
Run File as Workflow on Databricks to run the file as an automated Databricks job within the workspace and display results within an editor in the IDE. This choice does not use Databricks Connect.

Note
The Run Current File in Interactive Window option, if available, attempts to run the file locally in a special Visual Studio Code interactive editor. Databricks does not recommend this option.
To run or debug a Python Jupyter notebook (.ipynb):
In your code project, open the Python Jupyter notebook that you want to run or debug. Make sure the Python file is in Jupyter notebook format and has the extension .ipynb.
Tip
You can create a new Python Jupyter notebook by running the >Create: New Jupyter Notebook command from within the Command Palette.
Click Run All Cells to run all cells without debugging, Execute Cell to run an individual corresponding cell without debugging, or Run by Line to run an individual cell line-by-line with limited debugging, with variable values displayed in the Jupyter panel (View > Open View > Jupyter).
For full debugging within an individual cell, set breakpoints, and then click Debug Cell in the menu next to the cell’s Run button.
After you click any of these options, you might be prompted to install missing Python Jupyter notebook package dependencies. Click to install.
For more information, see Jupyter Notebooks in VS Code.
Run a Python file on a cluster
With the extension and your code project opened, and a Databricks configuration profile, cluster, and sync destination already set, do the following:
In your code project, open the Python file that you want to run on the cluster.
Do one of the following:
In Explorer view (View > Explorer), right-click the file, and then select Upload and Run File on Databricks from the context menu.
In the file editor’s title bar, click the drop-down arrow next to the play (Run or Debug) icon. Then in the drop-down list, click Upload and Run File on Databricks.
The file runs on the cluster, and any output is printed to the Debug Console (View > Debug Console).
Run a Python file as a job
With the extension and your code project opened, and a Databricks configuration profile, cluster, and sync destination already set, do the following:
In your code project, open the Python file that you want to run as a job.
Do one of the following:
In Explorer view (View > Explorer), right-click the file, and then select Run File as Workflow on Databricks from the context menu.
In the file editor’s title bar, click the drop-down arrow next to the play (Run or Debug) icon. Then in the drop-down list, click Run File as Workflow on Databricks.
A new editor tab appears, titled Databricks Job Run. The file runs as a job in the workspace, and any output is printed to the new editor tab’s Output area.
To view information about the job run, click the Task run ID link in the new Databricks Job Run editor tab. Your workspace opens and the job run’s details are displayed in the workspace.
Run a Python notebook as a job
With the extension and your code project opened, and a Databricks configuration profile, cluster, and sync destination already set, do the following:
In your code project, open the Python notebook that you want to run as a job.
Tip
To create a Python notebook file in Visual Studio Code, begin by clicking File > New File, select Python File, and save the new file with a .py file extension.
To turn the .py file into a Databricks notebook, add the special comment # Databricks notebook source to the beginning of the file, and add the special comment # COMMAND ---------- before each cell. For more information, see Import a file and convert it to a notebook.
Do one of the following:
In Explorer view (View > Explorer), right-click the notebook file, and then select Run File as Workflow on Databricks from the context menu.
In the notebook file editor’s title bar, click the drop-down arrow next to the play (Run or Debug) icon. Then in the drop-down list, click Run File as Workflow on Databricks.
A new editor tab appears, titled Databricks Job Run. The notebook runs as a job in the workspace, and the notebook and its output are displayed in the new editor tab’s Output area.
To view information about the job run, click the Task run ID link in the Databricks Job Run editor tab. Your workspace opens and the job run’s details are displayed in the workspace.
Run an R, Scala, or SQL notebook as a job
With the extension and your code project opened, and a Databricks configuration profile, cluster, and sync destination already set, do the following:
In your code project, open the R, Scala, or SQL notebook that you want to run as a job.
Tip
To create an R, Scala, or SQL notebook file in Visual Studio Code, begin by clicking File > New File, select Python File, and save the new file with a .r, .scala, or .sql file extension, respectively.
To turn the .r, .scala, or .sql file into a Databricks notebook, add the special comment Databricks notebook source to the beginning of the file and add the special comment COMMAND ---------- before each cell. Be sure to use the correct comment marker for each language (# for R, // for Scala, and -- for SQL). This comment pattern is similar to the one used for Python notebooks. For more information, see Import a file and convert it to a notebook.
In Run and Debug view (View > Run), select Run on Databricks as Workflow from the drop-down list, and then click the green play arrow (Start Debugging) icon.
Note
If Run on Databricks as Workflow is not available, see Create a custom run configuration.
A new editor tab appears, titled Databricks Job Run. The notebook runs as a job in the workspace. The notebook and its output are displayed in the new editor tab’s Output area.
To view information about the job run, click the Task run ID link in the Databricks Job Run editor tab. Your workspace opens and the job run’s details are displayed in the workspace.
Advanced tasks
You can use the Databricks extension for Visual Studio Code to perform the following advanced tasks.
Run tests with pytest
You can run pytest on local code that does not need a connection to a cluster in a remote Databricks workspace. For example, you might use pytest to test your functions that accept and return PySpark DataFrames in local memory. To get started with pytest and run it locally, see Get Started in the pytest documentation.
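For instance, a local test along the following lines runs entirely on your machine without a cluster; it assumes that pyspark and pytest are installed in your local environment, and the file name and function under test are hypothetical:

# local_helpers_test.py (hypothetical file name)
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit
import pytest

@pytest.fixture(scope="session")
def spark():
    # Create a local SparkSession; no connection to a Databricks cluster is needed.
    return SparkSession.builder.master("local[1]").appName("local-tests").getOrCreate()

def add_status_column(df):
    # Hypothetical function under test: appends a constant Status column.
    return df.withColumn("Status", lit("active"))

def test_add_status_column(spark):
    df = spark.createDataFrame([(1000, "Mathijs")], ["CustomerID", "FirstName"])
    result = add_status_column(df)
    assert "Status" in result.columns
    assert result.collect()[0]["Status"] == "active"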
To run pytest on code in a remote Databricks workspace, do the following in your Visual Studio Code project:
Step 1: Create the tests
Add a Python file with the following code, which contains your tests to run. This example assumes that this file is named spark_test.py and is at the root of your Visual Studio Code project. This file contains a pytest fixture, which makes the cluster’s SparkSession (the entry point to Spark functionality on the cluster) available to the tests. This file contains a single test that checks whether the specified cell in the table contains the specified value. You can add your own tests to this file as needed.
from pyspark.sql import SparkSession
import pytest
@pytest.fixture
def spark() -> SparkSession:
# Create a SparkSession (the entry point to Spark functionality) on
# the cluster in the remote Databricks workspace. Unit tests do not
# have access to this SparkSession by default.
return SparkSession.builder.getOrCreate()
# Now add your unit tests.
# For example, here is a unit test that must be run on the
# cluster in the remote Databricks workspace.
# This example determines whether the specified cell in the
# specified table contains the specified value. For example,
# the third column in the first row should contain the word "Ideal":
#
# +----+-------+-------+-------+---------+-------+-------+-------+------+-------+------+
# |_c0 | carat | cut | color | clarity | depth | table | price | x | y | z |
# +----+-------+-------+-------+---------+-------+-------+-------+------+-------+------+
# | 1  | 0.23  | Ideal | E     | SI2     | 61.5  | 55    | 326   | 3.95 | 3.98  | 2.43 |
# +----+-------+-------+-------+---------+-------+-------+-------+------+-------+------+
# ...
#
def test_spark(spark):
spark.sql('USE default')
data = spark.sql('SELECT * FROM diamonds')
assert data.collect()[0][2] == 'Ideal'
Step 2: Create the pytest runner
Add a Python file with the following code, which instructs pytest to run your tests from the previous step. This example assumes that the file is named pytest_databricks.py and is at the root of your Visual Studio Code project.
import pytest
import os
import sys
# Run all tests in the connected repository in the remote Databricks workspace.
# By default, pytest searches through all files with filenames ending with
# "_test.py" for tests. Within each of these files, pytest runs each function
# with a function name beginning with "test_".
# Get the path to the repository for this file in the workspace.
repo_root = os.path.dirname(os.path.realpath(__file__))
# Switch to the repository's root directory.
os.chdir(repo_root)
# Skip writing .pyc files to the bytecode cache on the cluster.
sys.dont_write_bytecode = True
# Now run pytest from the repository's root directory, using the
# arguments that are supplied by your custom run configuration in
# your Visual Studio Code project. In this case, the custom run
# configuration JSON must contain these unique "program" and
# "args" objects:
#
# ...
# {
# ...
# "program": "${workspaceFolder}/path/to/this/file/in/workspace",
# "args": ["/path/to/_test.py-files"]
# }
# ...
#
retcode = pytest.main(sys.argv[1:])
Step 3: Create a custom run configuration
To instruct pytest to run your tests, you must create a custom run configuration. Use the existing Databricks cluster-based run configuration to create your own custom run configuration, as follows:
On the main menu, click Run > Add configuration.
In the Command Palette, select Databricks.
Visual Studio Code adds a .vscode/launch.json file to your project, if this file does not already exist.
Change the starter run configuration as follows, and then save the file:
Change this run configuration’s name from Run on Databricks to some unique display name for this configuration, in this example Unit Tests (on Databricks).
Change program from ${file} to the path in the project that contains the test runner, in this example ${workspaceFolder}/pytest_databricks.py.
Change args from [] to the path in the project that contains the files with your tests, in this example ["."].
Your launch.json file should look like this:

{
  // Use IntelliSense to learn about possible attributes.
  // Hover to view descriptions of existing attributes.
  // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
  "version": "0.2.0",
  "configurations": [
    {
      "type": "databricks",
      "request": "launch",
      "name": "Unit Tests (on Databricks)",
      "program": "${workspaceFolder}/pytest_databricks.py",
      "args": ["."],
      "env": {}
    }
  ]
}
Step 4: Run the tests
Make sure that pytest is already installed on the cluster first. For example, with the cluster’s settings page open in your Databricks workspace, do the following:
On the Libraries tab, if pytest is visible, then pytest is already installed. If pytest is not visible, click Install new.
For Library Source, click PyPI.
For Package, enter pytest.
Click Install.
To run the tests, do the following from your Visual Studio Code project:
On the main menu, click View > Run.
In the Run and Debug list, click Unit Tests (on Databricks), if it is not already selected.
Click the green arrow (Start Debugging) icon.
The pytest results display in the Debug Console (View > Debug Console on the main menu). For example, these results show that at least one test was found in the spark_test.py file, and a dot (.) means that a single test was found and passed. (A failing test would show an F.)
<date>, <time> - Creating execution context on cluster <cluster-id> ...
<date>, <time> - Synchronizing code to /Repos/<someone@example.com>/<your-repository-name> ...
<date>, <time> - Running /pytest_databricks.py ...
============================= test session starts ==============================
platform linux -- Python <version>, pytest-<version>, pluggy-<version>
rootdir: /Workspace/Repos/<someone@example.com>/<your-repository-name>
collected 1 item
spark_test.py . [100%]
============================== 1 passed in 3.25s ===============================
<date>, <time> - Done (took 10818ms)
Use environment variable definitions files
Visual Studio Code supports environment variable definitions files for Python projects. This enables you to create a file with the extension .env somewhere on your development machine, and Visual Studio Code will then apply the environment variables within this .env file at run time. For more information, see Environment variable definitions file in the Visual Studio Code documentation.
To have the Databricks extension for Visual Studio Code use your .env file, set databricks.python.envFile within your settings.json file or Extensions > Databricks > Python: Env File within the Settings editor to the absolute path of your .env file.
Important
If you use settings.json, do not set python.envFile to the absolute path of your .env file as described in the Visual Studio Code documentation, as the Databricks extension for Visual Studio Code must override python.envFile for its internal use. Be sure to only set databricks.python.envFile instead.
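For example, the relevant settings.json entry might look like the following sketch; the path is a placeholder for the absolute path to your own .env file:

{
  "databricks.python.envFile": "/Users/someone/databricks-demo/.env"
}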
Create a custom run configuration
You can create custom run configurations in Visual Studio Code to do things such as passing custom arguments to a job or a notebook, or creating different run settings for different files. For example, the following custom run configuration passes the --prod argument to the job:
{
"version": "0.2.0",
"configurations": [
{
"type": "databricks-workflow",
"request": "launch",
"name": "Run on Databricks as Workflow",
"program": "${file}",
"parameters": {},
"args": ["--prod"],
"preLaunchTask": "databricks: sync"
}
]
}
To create a custom run configuration, click Run > Add Configuration from the main menu in Visual Studio Code. Then select either Databricks for a cluster-based run configuration or Databricks: Workflow for a job-based run configuration.
By using custom run configurations, you can also pass in command-line arguments and run your code just by pressing F5. For more information, see Launch configurations in the Visual Studio Code documentation.
Uninstall the extension
You can uninstall the Databricks extension for Visual Studio Code if needed, as follows:
In Visual Studio Code, click View > Extensions from the main menu.
In the list of extensions, select the Databricks for Visual Studio Code entry.
Click Uninstall.
Click Reload required, or restart Visual Studio Code.
Troubleshooting
Error when synchronizing through a proxy
Issue: When you try to run the Databricks extension for Visual Studio Code to synchronize your local code project through a proxy, an error message similar to the following appears, and the synchronization operation is unsuccessful: Get "https://<workspace-instance>/api/2.0/preview/scim/v2/Me": EOF.
Possible cause: Visual Studio Code does not know how to find the proxy.
Recommended solution: Restart Visual Studio Code from your terminal by running the following command, and then try synchronizing again:
env HTTPS_PROXY=<proxy-url>:<port> code
In the preceding command:
Replace <proxy-url> with the full URL to your proxy.
Replace <port> with the correct port on your proxy.
Error: “spawn unknown system error -86” when you try to synchronize local code
Issue: When you try to synchronize local code in a project to a remote Databricks workspace, the Terminal shows that synchronization has started but displays only the error message spawn unknown system error -86. Also, the Sync Destination section of the Configuration pane remains in a pending state.
Possible cause: The wrong version of the Databricks extension for Visual Studio Code is installed for your development machine’s operating system.
Recommended solution: Uninstall the extension, and then Install and open the extension for your development machine’s operating system from the beginning.
Send usage logs to Databricks
If you have issues synchronizing local code to a remote Databricks workspace, you can send usage logs and related information to Databricks Support by doing the following:
Turn on verbose mode for the Databricks command-line interface (CLI) by checking the Bricks: Verbose Mode setting, or setting databricks.bricks.verboseMode to true, as described in Settings.
Also turn on logging by checking the Logs: Enabled setting, or setting databricks.logs.enabled to true, as described in Settings. Be sure to restart Visual Studio Code after you turn on logging.
Attempt to reproduce your issue.
From the Command Palette (View > Command Palette from the main menu), run the Databricks: Open full logs command.
Send the bricks-logs.json and sdk-and-extension-logs.json files that appear to Databricks Support.
Also copy the contents of the Terminal (View > Terminal) in the context of the issue, and send this content to Databricks Support.
To send error logs that are not about code synchronization issues to Databricks Support:
From the Command Palette (View > Command Palette), run the Databricks: Open full logs command.
Send only the sdk-and-extension-logs.json file that appears to Databricks Support.
The Output view (View > Output, Databricks Logs) shows truncated information if Logs: Enabled is checked or databricks.logs.enabled is set to true. To show more information, change the following settings, as described in Settings (an example follows the list):
Logs: Max Array Length or databricks.logs.maxArrayLength
Logs: Max Field Length or databricks.logs.maxFieldLength
Logs: Truncation Depth or databricks.logs.truncationDepth
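For example, a settings.json sketch that enables logging and adjusts the truncation limits might look like the following; the numeric values are illustrative only, not the extension's defaults:

{
  "databricks.logs.enabled": true,
  "databricks.logs.maxArrayLength": 10,
  "databricks.logs.maxFieldLength": 200,
  "databricks.logs.truncationDepth": 5
}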
Command Palette
The Databricks extension for Visual Studio Code adds the following commands to the Visual Studio Code Command Palette. See also Command Palette in the Visual Studio Code documentation.
Command | Description
---|---
 | Enables IntelliSense in the Visual Studio Code code editor for PySpark, Databricks Utilities, and related globals such as spark and dbutils.
 | Moves focus to the Command Palette to create, select, or change the Databricks cluster to use for the current project. See Set the cluster.
 | Moves focus to the Command Palette to create, select, or change the repository in Databricks Repos to use for the current project. See Set the repository.
 | Moves focus to the Command Palette to create, select, or change Databricks authentication details to use for the current project. See Set up authentication with a configuration profile.
 | Creates a new sync destination.
 | Removes the reference to the Databricks cluster from the current project.
 | Removes the reference to the repository in Databricks Repos from the current project.
 | Moves focus in the Databricks view to the Clusters pane.
 | Moves focus in the Databricks view to the Configuration pane.
 | Moves focus in the Databricks view to the Workspace Explorer pane.
 | Resets the Databricks view to show the Configure Databricks and Show Quickstart buttons in the Configuration pane. Any content in the current project’s .databricks/project.json file is also reset.
 | Opens the Databricks configuration profiles file, from the default location, for the current project. See Set up authentication with a configuration profile.
 | Opens the folder that contains the application log files that the Databricks extension for Visual Studio Code writes to your development machine.
 | Refreshes the Workspace Explorer pane in the Databricks view.
 | Runs a Python file on the cluster.
 | Shows the Quickstart file in the editor.
 | Starts the cluster if it is already stopped.
 | Starts synchronizing the current project’s code to the Databricks workspace. This command performs an incremental synchronization.
 | Starts synchronizing the current project’s code to the Databricks workspace. This command performs a full synchronization, even if an incremental sync is possible.
 | Stops the cluster if it is already running.
 | Stops synchronizing the current project’s code to the Databricks workspace.
 | Runs a Python file or a notebook as an automated Databricks job within the workspace.
Settings
The Databricks extension for Visual Studio Code adds the following settings to Visual Studio Code. See also Settings editor and settings.json in the Visual Studio Code documentation.
Settings editor (Extensions > Databricks) | settings.json | Description
---|---|---
Bricks: Verbose Mode | databricks.bricks.verboseMode | Checked or set to true to enable verbose output for the Databricks command-line interface (CLI) that the extension uses.
Clusters: Only Show Accessible Clusters | | Checked or set to true to show only those clusters that you can access.
Logs: Enabled | databricks.logs.enabled | Checked or set to true to enable logging. Restart Visual Studio Code after you change this setting.
Logs: Max Array Length | databricks.logs.maxArrayLength | The maximum number of items to show for array fields.
Logs: Max Field Length | databricks.logs.maxFieldLength | The maximum length of each field displayed in the logs output panel.
Logs: Truncation Depth | databricks.logs.truncationDepth | The maximum depth of logs to show without truncation.
Override Databricks Config File | | An alternate location for the Databricks configuration profiles file that the extension uses.
Python: Env File | databricks.python.envFile | The absolute path to your custom Python environment variable definitions (.env) file.
Sync: Destination Type | | Whether to use a folder in the workspace (workspace) or a repository in Databricks Repos (repo) as the sync destination. Reload your window for any change to take effect.
Frequently asked questions (FAQs)
Do you have support for, or a timeline for support for, any of the following capabilities?
Other languages, such as Scala or SQL
Delta Live Tables
Databricks SQL warehouses
Other IDEs, such as PyCharm
Additional libraries
Full CI/CD integration
Authentication schemes in addition to Databricks personal access tokens
Databricks is aware of these requests and is prioritizing work to enable simple scenarios for local development and remote running of code. Please forward additional requests and scenarios to your Databricks representative. Databricks will incorporate your input into future planning.
How does the Databricks Terraform provider relate to the Databricks extension for Visual Studio Code?
Databricks continues to recommend the Databricks Terraform provider for managing your CI/CD pipelines in a predictable way. Please let your Databricks representative know how you might use an IDE to manage your deployments in the future. Databricks will incorporate your input into future planning.
How does dbx by Databricks Labs relate to the Databricks extension for Visual Studio Code?
The main features of dbx by Databricks Labs include:
Project scaffolding.
Limited local development through the dbx execute command.
CI/CD for Databricks jobs.
The Databricks extension for Visual Studio Code enables local development and remotely running Python code files on Databricks clusters, and remotely running Python code files and notebooks in Databricks jobs. dbx can continue to be used for project scaffolding and CI/CD for Databricks jobs.
What happens if I already have an existing Databricks configuration profile that I created through the Databricks CLI?
You can select your existing configuration profile when you configure the Databricks extension for Visual Studio Code. With the extension and your code project opened, do the following:
In the Configuration pane, click the gear (Configure workspace) icon.
Enter your workspace instance URL, for example https://1234567890123456.7.gcp.databricks.com.
In the Command Palette, select your existing configuration profile.
Which permissions do I need for a Databricks workspace to use the Databricks extension for Visual Studio Code?
You must have execute permissions for a Databricks cluster for running code, as well as permissions to create a repository in Databricks Repos.
Which settings must be enabled for a Databricks workspace to use the Databricks extension for Visual Studio Code?
The workspace must have the Files in Repos setting turned on. For instructions, see Configure support for Files in Repos. If you cannot turn on this setting yourself, contact your Databricks workspace administrator.
Can I use the Databricks extension for Visual Studio Code with a proxy?
Yes. See the recommended solution in Error when synchronizing through a proxy.
Can I use the Databricks extension for Visual Studio Code with an existing repository stored with a remote Git provider?
No. The Databricks extension for Visual Studio Code works only with repositories that it creates.