Develop on Databricks
Databricks developer users include data scientists, data engineers, data analysts, machine learning engineers, and DevOps and MLOps engineers, all building solutions and integrations to extend and customize Databricks for their specific needs. In addition to the many Databricks APIs and data engineering features available in the workspace, many tools for connecting to Databricks and developing locally also support these developer users.
This article provides an overview of APIs and tools available for Databricks developer users.
Start coding in the workspace
Developing in the workspace is a great way to quickly get familiar with Databricks APIs. The workspace supports Python, SQL, Scala, and R, along with other developer-focused features, including helpful tools and utilities.
Here are some ways to start:
Read an overview of developing in Python, Scala, and R, and find links to tutorials for various scenarios. For a table of the tools supported in each language, see Languages overview.
Browse the SQL language reference for a look at the depth and breadth of capabilities.
Work through the Tutorial: Load and transform data using Apache Spark DataFrames in Python, Scala, or R to get an introduction to the Spark APIs. Additional simple examples for PySpark are in PySpark basics, and a minimal PySpark sketch follows this list.
Browse the available reference documentation, including the REST API reference, which provides a good picture of the Databricks objects that can also be created and modified with other tools.
Install the Python SDK in a notebook and write a simple function (see the sketch after this list).
Move some files around using the Databricks Utilities `fs` commands to get familiar with using the `dbutils` utilities to manipulate the Databricks environment (a short example follows this list).
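For example, here is a minimal PySpark sketch you can run in a workspace notebook, where a SparkSession named `spark` is predefined; the sample names and values are illustrative:

```python
from pyspark.sql import functions as F

# Create a small DataFrame from in-memory data (illustrative values).
df = spark.createDataFrame(
    [("Alice", 34), ("Bob", 36), ("Carol", 30)],
    schema=["name", "age"],
)

# Transform: filter rows and add a derived column.
adults = df.filter(F.col("age") > 30).withColumn("age_next_year", F.col("age") + 1)
adults.show()

# The same data can be queried with SQL through a temporary view.
df.createOrReplaceTempView("people")
spark.sql("SELECT name, age FROM people WHERE age > 30").show()
```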
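Here is a minimal sketch of that Python SDK step; the install commands belong in their own notebook cell, and `list_running_clusters` is a hypothetical helper name, not part of the SDK:

```python
# In a separate notebook cell, install the SDK and restart Python first:
#   %pip install databricks-sdk
#   dbutils.library.restartPython()

from databricks.sdk import WorkspaceClient

def list_running_clusters():
    """Hypothetical helper: return the names of clusters that are currently running."""
    # In a Databricks notebook, WorkspaceClient() picks up workspace
    # authentication automatically.
    w = WorkspaceClient()
    return [
        c.cluster_name
        for c in w.clusters.list()
        if c.state is not None and c.state.name == "RUNNING"
    ]

print(list_running_clusters())
```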
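And a short sketch of the `dbutils` `fs` commands in a notebook, where `dbutils` is predefined; the paths are illustrative:

```python
# Create a directory, write a file, and copy it (illustrative paths).
dbutils.fs.mkdirs("/tmp/demo")
dbutils.fs.put("/tmp/demo/hello.txt", "Hello, Databricks!", overwrite=True)
dbutils.fs.cp("/tmp/demo/hello.txt", "/tmp/demo/hello-copy.txt")

# List the directory contents, then clean up.
for f in dbutils.fs.ls("/tmp/demo"):
    print(f.path, f.size)
dbutils.fs.rm("/tmp/demo", recurse=True)
```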
Build custom apps and solutions
Databricks provides tools for both workspace and local development. In the workspace, you can create apps using the UI, data is easily accessible in Unity Catalog volumes and workspace files, workspace-only features such as the Databricks Assistant for debugging are available, notebooks are fully featured, and source control is available through Git folders.
Alternatively, develop custom solutions using an IDE on your local machine to take advantage of the full functionality of a rich development environment. Local development supports a wider range of languages, so language-dependent features such as debuggers and test frameworks are available to support larger projects, along with direct access to source control.
For tool usage recommendations, see Which developer tool should I use?.
Feature | Description
---|---
Authentication and authorization | Configure authentication and authorization for your tools, scripts, and apps to work with Databricks (see the SDK authentication sketch after this table).
Databricks extension for Visual Studio Code | Connect to your remote Databricks workspaces from Visual Studio Code for easy configuration of your connection to the Databricks workspace, and a UI for managing Databricks resources.
PyCharm Databricks plugin | Configure a connection to a remote Databricks workspace and run files on Databricks clusters from PyCharm. This plugin is developed and provided by JetBrains in partnership with Databricks.
Databricks SDKs | Automate your interactions with Databricks using an SDK instead of calling the REST APIs directly.
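For example, here is a minimal sketch of authenticating with the Databricks SDK for Python through a named configuration profile; the profile name `DEV` and the host and token values are placeholders:

```python
from databricks.sdk import WorkspaceClient

# Assumes a profile named DEV in ~/.databrickscfg, for example:
#   [DEV]
#   host  = https://<your-workspace>.cloud.databricks.com
#   token = <personal-access-token>
w = WorkspaceClient(profile="DEV")

# Verify the connection by fetching the authenticated user.
me = w.current_user.me()
print(me.user_name)
```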
Connect to Databricks
Connecting to Databricks is a necessary part of many integrations and solutions, and Databricks provides a large selection of connection tools to choose from. The following table describes tools for connecting your development environment and processes to your Databricks workspace and resources.
Feature | Description
---|---
Databricks Connect | Connect to Databricks using popular integrated development environments (IDEs) such as PyCharm, IntelliJ IDEA, Eclipse, RStudio, and JupyterLab (see the Databricks Connect sketch after this table).
Databricks extension for Visual Studio Code | Easy configuration of your connection to the Databricks workspace, and a UI for managing Databricks resources.
SQL drivers and tools | Connect to Databricks to run SQL commands and scripts, interact programmatically with Databricks, and integrate Databricks SQL functionality into applications written in popular languages such as Python, Go, JavaScript, and TypeScript (see the SQL connector sketch after this table).
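For example, here is a minimal Databricks Connect sketch for local Python development (`pip install databricks-connect`); it assumes your connection details are already configured, for example in a DEFAULT profile, and that your workspace can read the Databricks sample datasets:

```python
from databricks.connect import DatabricksSession

# Build a Spark session that executes on remote Databricks compute.
# getOrCreate() picks up connection details from the environment or
# a configured profile.
spark = DatabricksSession.builder.getOrCreate()

# Read a sample table and run a simple query remotely.
df = spark.read.table("samples.nyctaxi.trips")
df.select("pickup_zip", "fare_amount").show(5)
```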
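And a minimal sketch using the Databricks SQL Connector for Python (`pip install databricks-sql-connector`); the hostname, HTTP path, and access token are placeholders for your SQL warehouse's connection details:

```python
from databricks import sql

# Placeholder connection details for a SQL warehouse.
with sql.connect(
    server_hostname="<your-workspace>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal-access-token>",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT current_catalog(), current_date()")
        for row in cursor.fetchall():
            print(row)
```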
Tip
You can also connect many popular third-party tools to clusters and SQL warehouses to access data in Databricks. See Technology partners.
Manage infrastructure and resources
Developers and data engineers building CI/CD pipelines to automate the provisioning and management of infrastructure and resources can choose from the following tools, which support both simple and more complex pipeline scenarios.
For tool usage recommendations, see Which developer tool should I use?.
Feature | Description
---|---
Databricks CLI | Access Databricks functionality using the Databricks command-line interface (CLI). The CLI wraps the Databricks REST API, so instead of sending REST API calls directly using curl or Postman, you can use the Databricks CLI to interact with Databricks. Use the CLI from a local terminal or from the workspace web terminal.
Databricks Asset Bundles | Define and manage Databricks resources and your CI/CD pipeline using industry-standard development, testing, and deployment best practices for your data and AI projects. Databricks Asset Bundles are a feature of the Databricks CLI.
Databricks Terraform provider and Terraform CDKTF for Databricks | Provision Databricks infrastructure and resources using Terraform.
CI/CD integrations | Integrate popular CI/CD systems and frameworks such as GitHub Actions, Jenkins, and Apache Airflow (see the example Airflow DAG after this table).
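As one example of such an integration, here is a minimal sketch of an Apache Airflow DAG that runs a notebook on Databricks using the apache-airflow-providers-databricks package; the connection ID, notebook path, and cluster settings are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

with DAG(
    dag_id="databricks_notebook_example",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # trigger manually; the `schedule` parameter requires Airflow 2.4+
) as dag:
    run_notebook = DatabricksSubmitRunOperator(
        task_id="run_notebook",
        databricks_conn_id="databricks_default",  # an Airflow connection you define
        new_cluster={
            "spark_version": "15.4.x-scala2.12",  # placeholder runtime version
            "node_type_id": "i3.xlarge",          # placeholder node type
            "num_workers": 1,
        },
        notebook_task={"notebook_path": "/Users/someone@example.com/my-notebook"},
    )
```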
Engage with the Databricks developer community
Databricks has an active developer community, which is supported by the following programs and resources:
Databricks MVPs: This program recognizes community members, data scientists, data engineers, developers, and open source enthusiasts who go above and beyond in the data and AI community. For more information, see Databricks MVPs.
Training: Databricks provides learning modules for Apache Spark developers, generative AI engineers, data engineers, and more.
Community: A wealth of knowledge is available from the Databricks community and the Apache Spark community.