Developer tools and guidance
Learn about tools and guidance you can use to work with Databricks assets and data and to develop Databricks applications.
Use an IDE
You can connect many popular third-party IDEs to a Databricks cluster. This allows you to write code on your local development machine by using the Spark APIs and then run that code as jobs remotely on a Databricks cluster.
Databricks recommends that you use dbx by Databricks Labs for local development.
Databricks also provides a code sample that you can explore to use an IDE with dbx.
Note
Databricks also supports a tool named Databricks Connect. However, Databricks plans no new feature development for Databricks Connect at this time. Also, Databricks Connect has several limitations.
Use a connector or driver
You can use connectors and drivers to connect your code to a Databricks cluster or a Databricks SQL warehouse.
For additional information about connecting your code through JDBC or ODBC, see the JDBC and ODBC configuration guidance.
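As an illustration of the JDBC side of that configuration, the following sketch assembles a connection URL of the general form the Databricks JDBC driver documents. The host and HTTP path values are placeholders; the real values for your workspace appear on the cluster's or warehouse's connection details page.

```python
# A minimal sketch, assuming the Databricks JDBC driver's documented URL
# form; the host and HTTP path below are placeholder values.

def databricks_jdbc_url(host: str, http_path: str) -> str:
    """Assemble a JDBC URL for a Databricks cluster or SQL warehouse."""
    return (
        f"jdbc:databricks://{host}:443/default;"
        "transportMode=http;ssl=1;AuthMech=3;"  # AuthMech=3: personal access token auth
        f"httpPath={http_path}"
    )

url = databricks_jdbc_url(
    "adb-1234567890123456.7.azuredatabricks.net",  # placeholder workspace host
    "/sql/1.0/warehouses/abc123",                  # placeholder HTTP path
)
print(url)
```

You would pass a URL like this to your JDBC client along with a personal access token as the credential.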
Use a notebook
To run Python, R, or Scala code in a notebook to work with file systems, libraries, and secrets from a Databricks cluster, see Databricks Utilities.
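Because `dbutils` is predefined only inside a Databricks notebook, code that uses it is easiest to reuse and test when the handle is passed in explicitly. A minimal sketch, assuming an object with the documented `dbutils.secrets.get(scope=..., key=...)` call; the scope and key names are placeholders:

```python
# A minimal sketch: `dbutils` exists only inside a Databricks notebook,
# so it is passed in as a parameter here. Scope and key names below are
# placeholders, not real secrets.

def read_secret(dbutils, scope: str, key: str, default: str = "") -> str:
    """Fetch a secret with dbutils.secrets.get, falling back to a default."""
    try:
        return dbutils.secrets.get(scope=scope, key=key)
    except Exception:
        return default

# Inside a notebook you would call, for example:
#   token = read_secret(dbutils, scope="my-scope", key="api-token")
```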
Call Databricks REST APIs
You can use popular third-party tools such as curl and Postman to work with Databricks resources directly through the Databricks REST APIs.
| Category | Use this API to work with… |
|---|---|
| Databricks REST API (latest) | Data Science & Engineering workspace assets such as clusters, global init scripts, groups, pools, jobs, libraries, permissions, secrets, and tokens. |
| Databricks REST API 2.1 | Data Science & Engineering workspace assets such as jobs. |
| Databricks REST API 2.0 | Data Science & Engineering workspace assets such as clusters, global init scripts, groups, pools, jobs, libraries, permissions, secrets, and tokens. |
| Databricks REST API 1.2 | Command executions and execution contexts. |
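As a sketch of calling these APIs directly, the following uses only the Python standard library to call the documented `GET /api/2.0/clusters/list` endpoint. The workspace host and personal access token are placeholders you would supply yourself.

```python
# A minimal sketch of calling the Databricks REST API 2.0 with the
# Python standard library. Host and token values are placeholders.
import json
import urllib.request

def build_request(host: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for /api/2.0/clusters/list."""
    return urllib.request.Request(
        f"https://{host}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
    )

def list_clusters(host: str, token: str) -> list:
    """Call the endpoint and return the `clusters` array from the response."""
    with urllib.request.urlopen(build_request(host, token), timeout=30) as resp:
        return json.load(resp).get("clusters", [])

# Example (requires a real workspace):
#   for c in list_clusters("adb-1234567890123456.7.azuredatabricks.net", "dapi..."):
#       print(c["cluster_id"], c["state"])
```

The same request shape (base URL, `/api/<version>/<path>`, and a bearer token header) applies whether you use curl, Postman, or a script like this one.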
Provision infrastructure
You can use an infrastructure-as-code (IaC) approach to programmatically provision Databricks infrastructure and assets such as workspaces, clusters, jobs, groups, and users.
Use CI/CD
To manage the lifecycle of Databricks assets and data, you can use continuous integration and continuous delivery (CI/CD) and data pipeline tools.
| Area | Use these tools when you want to… |
|---|---|
| Continuous integration and delivery on Databricks using GitHub Actions | Build a CI/CD workflow on GitHub that uses GitHub Actions developed for Databricks. |
| Continuous integration and delivery on Databricks using Jenkins | Develop a CI/CD pipeline for Databricks that uses Jenkins. |
| Apache Airflow | Manage and schedule a data pipeline that uses Apache Airflow. |
Use a SQL database tool
You can use these tools to run SQL commands and scripts and to browse database objects in Databricks.
- Use a command line to run SQL commands and scripts on a Databricks SQL warehouse.
- Use a query console, schema navigation, smart code completion, and other features to run SQL commands and scripts and to browse database objects in Databricks.
- Run SQL commands and browse database objects in Databricks by using a client software application and database administration tool.
- Run SQL scripts (either interactively or as a batch) in Databricks by using a SQL query tool.
Use other tools
You can connect many popular third-party tools to clusters to access data in Databricks. See the Databricks integrations.