Develop on Databricks

Databricks developer users include data scientists, data engineers, data analysts, and machine learning engineers, as well as DevOps and MLOps engineers, all building solutions and integrations to extend and customize Databricks for their specific needs. In addition to the many Databricks APIs and data engineering features available in the workspace, there are many tools for connecting to Databricks and developing locally that support these developer users.

This article provides an overview of APIs and tools available for Databricks developer users.

Start coding in the workspace

Developing in the workspace is a great way to quickly get familiar with Databricks APIs. The workspace supports Python, SQL, Scala, and R, along with other developer-focused features, including helpful tools and utilities.

Here are some ways to start:
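For example, after you attach a notebook to a cluster or SQL warehouse, you can query data and explore storage directly from a cell. The following is a minimal sketch; the table and volume path are placeholders (the samples catalog is available in many workspaces):

```python
# Run in a Databricks notebook cell: `spark`, `display`, and `dbutils`
# are preconfigured for you.
# The table below is a sample dataset; substitute any table you can access.
df = spark.sql("SELECT * FROM samples.nyctaxi.trips LIMIT 10")
display(df)

# List files in a Unity Catalog volume (this path is a placeholder).
dbutils.fs.ls("/Volumes/main/default/my_volume")
```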

Build custom apps and solutions

Databricks provides tools for both workspace and local development. In the workspace, you can create apps using the UI, access data easily in Unity Catalog volumes and workspace files, use workspace-only features such as the Databricks Assistant for debugging, work with fully featured notebooks, and manage source control with Git folders.

Alternatively, develop custom solutions using an IDE on your local machine to take advantage of the full functionality of a rich development environment. Local development supports a wider range of languages, which means language-dependent features such as debugging and test frameworks are available to support larger projects, along with direct access to source control.
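For example, with Databricks Connect installed and authentication to your workspace already configured, you can run unit tests against a remote cluster from your IDE. The following is a minimal pytest sketch; the table name is a placeholder and the fixture name is arbitrary:

```python
# test_trips.py: a minimal pytest sketch using Databricks Connect.
# Assumes `databricks-connect` and `pytest` are installed and that
# authentication to your workspace is already configured.
import pytest
from databricks.connect import DatabricksSession


@pytest.fixture(scope="session")
def spark():
    # Creates a Spark session backed by a remote Databricks cluster.
    return DatabricksSession.builder.getOrCreate()


def test_trips_table_is_not_empty(spark):
    # Placeholder table; point this at a table you have access to.
    df = spark.read.table("samples.nyctaxi.trips")
    assert df.count() > 0
```

Because this code runs as ordinary local Python against a remote Spark session, you can use your IDE's debugger and test runner with it.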

For tool usage recommendations, see Which developer tool should I use?.

Authenticate and authorize: Configure authentication and authorization for your tools, scripts, and apps to work with Databricks.

Databricks extension for Visual Studio Code: Connect to your remote Databricks workspaces from Visual Studio Code for easy configuration of your connection to the Databricks workspace and a UI for managing Databricks resources.

PyCharm Databricks plugin: Configure a connection to a remote Databricks workspace and run files on Databricks clusters from PyCharm. This plugin is developed and provided by JetBrains in partnership with Databricks.

Databricks SDKs: Automate your interactions with Databricks using an SDK instead of calling the REST APIs directly (see the sketch after this list).
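For example, the Databricks SDK for Python picks up credentials from environment variables such as DATABRICKS_HOST and DATABRICKS_TOKEN, or from a .databrickscfg configuration profile, and then lets you manage workspace resources programmatically. A minimal sketch, not a complete application:

```python
# A minimal sketch using the Databricks SDK for Python (`pip install databricks-sdk`).
# Assumes authentication is already configured, for example through the
# DATABRICKS_HOST and DATABRICKS_TOKEN environment variables or a
# .databrickscfg profile.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Show the identity the SDK authenticated as.
print(w.current_user.me().user_name)

# List clusters in the workspace.
for cluster in w.clusters.list():
    print(cluster.cluster_name, cluster.state)
```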

Connect to Databricks

Connecting to Databricks is a necessary component of many integrations and solutions, and Databricks provides a large selection of connection tools to choose from. The following tools connect your development environment and processes to your Databricks workspace and resources.

Databricks Connect: Connect to Databricks using popular integrated development environments (IDEs) such as PyCharm, IntelliJ IDEA, Eclipse, RStudio, and JupyterLab.

Databricks extension for Visual Studio Code: Easy configuration of your connection to the Databricks workspace, and a UI for managing Databricks resources.

SQL drivers and tools: Connect to Databricks to run SQL commands and scripts, interact programmatically with Databricks, and integrate Databricks SQL functionality into applications written in popular languages such as Python, Go, JavaScript, and TypeScript (see the sketch after this list).
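For example, the Databricks SQL Connector for Python (one of the available SQL drivers) runs SQL statements against a SQL warehouse from an application. A minimal sketch; the hostname, HTTP path, and token are placeholders read from environment variables, and in practice you copy these values from your SQL warehouse's connection details:

```python
# A minimal sketch using the Databricks SQL Connector for Python
# (`pip install databricks-sql-connector`). Connection values are read from
# environment variables as placeholders; supply your own from the SQL
# warehouse's connection details.
import os

from databricks import sql

with sql.connect(
    server_hostname=os.environ["DATABRICKS_SERVER_HOSTNAME"],
    http_path=os.environ["DATABRICKS_HTTP_PATH"],
    access_token=os.environ["DATABRICKS_TOKEN"],
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT current_catalog(), current_user()")
        for row in cursor.fetchall():
            print(row)
```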

Tip

You can also connect many additional popular third-party tools to clusters and SQL warehouses to access data in Databricks. See Technology partners.

Manage infrastructure and resources

Developers and data engineers building CI/CD pipelines to automate the provisioning and management of infrastructure and resources can choose from the following tools, which support both simple and more complex pipeline scenarios.

For tool usage recommendations, see Which developer tool should I use?.

Databricks CLI: Access Databricks functionality using the Databricks command-line interface (CLI). The CLI wraps the Databricks REST API, so instead of sending REST API calls directly using curl or Postman, you can use the Databricks CLI to interact with Databricks. Use the CLI from a local terminal or from the workspace web terminal.

Databricks Asset Bundles: Define and manage Databricks resources and your CI/CD pipelines using industry-standard development, testing, and deployment best practices for your data and AI projects. Databricks Asset Bundles are a feature of the Databricks CLI.

Databricks Terraform provider and Terraform CDKTF for Databricks: Provision Databricks infrastructure and resources using Terraform.

CI/CD tools: Integrate popular CI/CD systems and frameworks such as GitHub Actions, Jenkins, and Apache Airflow.

Collaborate and share code

Among the many collaboration features in the workspace, Databricks specifically supports developer users who want to collaborate and share code with the following features:

UDFs: Develop UDFs (user-defined functions) to reuse and share code (see the sketch after this list).

Git folders: Configure Git folders to provide source control and versioning for your Databricks project files.
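For example, a simple PySpark UDF can be defined once in a module, checked into a Git folder, and reused across notebooks and jobs. A minimal sketch; the function and column names are illustrative only:

```python
# A minimal PySpark UDF sketch. In a Databricks notebook, `spark` already
# exists and getOrCreate() returns it; with plain local PySpark, a local
# session is created instead.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()


@udf(returnType=StringType())
def normalize_region(code):
    # Trim whitespace and standardize casing; return None for missing values.
    return code.strip().upper() if code else None


df = spark.createDataFrame([(" us-east ",), ("eu-west",), (None,)], ["region"])
df.select(normalize_region("region").alias("region")).show()
```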

Engage with the Databricks developer community

Databricks has an active developer community, which is supported by the following programs and resources: