Call the Databricks REST API with Python
Important
This documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported. See the Databricks SDK for Python.
Note
Databricks recommends that you use the Databricks SDK for Python instead of the approach described in this article. Specifically:
The legacy Databricks CLI package that is described in this article has incomplete coverage of the Databricks REST API (only about 15%).
The legacy Databricks CLI package does not support all Databricks authentication mechanisms.
Databricks plans no new feature work for the legacy Databricks CLI package at this time.
The legacy Databricks CLI package is not supported through Databricks Support channels.
This article relies on the legacy Databricks CLI versions 0.99 and lower, which are in an Experimental state.
You can call the Databricks REST API to automate Databricks with Python code, instead of using non-Python command-line tools such as curl or API clients such as Postman. To call the Databricks REST API with Python, you can use the legacy Databricks CLI package as a library. This library is written in Python and enables you to call the Databricks REST API through Python classes that closely model the Databricks REST API request and response payloads.
Note
Direct use of the Python requests library is another approach. However, you would then need to work at a lower level, manually providing the necessary headers, handling errors, and performing other low-level coding tasks.
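To make that tradeoff concrete, here is a minimal sketch of the lower-level approach using only the standard library's urllib (requests would look similar). The build_request helper is hypothetical; the /api/2.0/clusters/list endpoint and the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables are the ones used throughout this article.

```python
import json
import os
import urllib.request

def build_request(host, token, endpoint):
    """Construct an authenticated GET request for a Databricks REST endpoint."""
    url = host.rstrip('/') + endpoint
    return urllib.request.Request(
        url,
        headers = {'Authorization': f'Bearer {token}'}
    )

request = build_request(
    os.getenv('DATABRICKS_HOST', 'https://1234567890123456.7.gcp.databricks.com'),
    os.getenv('DATABRICKS_TOKEN', ''),
    '/api/2.0/clusters/list'
)

# Sending the request requires a reachable workspace:
# with urllib.request.urlopen(request) as response:
#     print(json.dumps(json.load(response), indent = 2))
```

The legacy Databricks CLI package wraps this kind of plumbing for you, which is why this article uses it instead.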
The legacy Databricks CLI supports calling the following Databricks REST APIs:
Cluster Policies API 2.0
Clusters API 2.0
DBFS API 2.0
Groups API 2.0
Instance Pools API 2.0
Jobs API 2.1, 2.0
Libraries API 2.0
Delta Live Tables API 2.0
Repos API 2.0
Secrets API 2.0
Token API 2.0
Unity Catalog API 2.1
Workspace API 2.0
The legacy Databricks CLI does not support calling the following Databricks REST APIs:
Account API 2.0
Databricks SQL Queries, Dashboards, and Alerts API 2.0
Databricks SQL Query History API 2.0
Databricks SQL Warehouses API 2.0
Git Credentials API 2.0
Global Init Scripts API 2.0
IP Access List API 2.0
MLflow API 2.0
Permissions API 2.0
SCIM API 2.0
Token Management API 2.0
API 1.2
For detailed information, see the Databricks REST API Reference.
Requirements
Python version 3.6 or above. To check whether Python is installed, and if so to check the installed version, run
python --version
from your terminal or PowerShell. Install Python, if it is not already installed.
Note
Some installations of Python require python3 instead of python. If so, replace python with python3 throughout this article.
The legacy Databricks CLI version 0.99 or lower. To check whether the legacy Databricks CLI is installed, and if so to check the installed version, run databricks --version. To install the legacy Databricks CLI, run pip install databricks-cli or python -m pip install databricks-cli.
# Check whether the legacy Databricks CLI is installed, and if so check the installed version.
# The legacy Databricks CLI must be version 0.99 or lower.
databricks --version

# Install the legacy Databricks CLI.
pip install databricks-cli
# Or...
python -m pip install databricks-cli
Note
Some installations of pip require pip3 instead of pip. If so, replace pip with pip3 throughout this article.
Your workspace instance URL, for example https://1234567890123456.7.gcp.databricks.com.
A Databricks personal access token for your Databricks workspace. To create a Databricks personal access token, see Databricks personal access tokens; see also Manage personal access tokens.
Step 1: Set up authentication
To authenticate with the Databricks REST API through the legacy Databricks CLI package library, your Python code requires two pieces of information at minimum:
Your workspace instance URL, for example https://1234567890123456.7.gcp.databricks.com.
A Databricks personal access token for your Databricks workspace. To create a Databricks personal access token, see Databricks personal access tokens; see also Manage personal access tokens.
For code modularity, portability, and security, you should not hard-code this information into your Python code. Instead, you should retrieve this information from a secure location at run time. For example, the code in this article uses the following environment variables:
DATABRICKS_HOST, which represents your workspace instance URL, for example https://1234567890123456.7.gcp.databricks.com.
DATABRICKS_TOKEN, which represents a Databricks personal access token for your Databricks workspace. To create a Databricks personal access token, see Databricks personal access tokens; see also Manage personal access tokens.
You can set these environment variables as follows:
To set the environment variables for only the current terminal session, run the following commands. To set the environment variables for all terminal sessions, enter the following commands into your shell’s startup file and then restart your terminal. Replace the example values here with your own values.
export DATABRICKS_HOST="https://1234567890123456.7.gcp.databricks.com"
export DATABRICKS_TOKEN="dapi1234567890b2cd34ef5a67bc8de90fa12b"
To set the environment variables for only the current Command Prompt session, run the following commands. Replace the example values here with your own values.
set DATABRICKS_HOST=https://1234567890123456.7.gcp.databricks.com
set DATABRICKS_TOKEN=dapi1234567890b2cd34ef5a67bc8de90fa12b
To set the environment variables for all Command Prompt sessions, run the following commands and then restart your Command Prompt. Replace the example values here with your own values.
setx DATABRICKS_HOST "https://1234567890123456.7.gcp.databricks.com"
setx DATABRICKS_TOKEN "dapi1234567890b2cd34ef5a67bc8de90fa12b"
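Your Python code then reads these environment variables at run time with os.getenv. As a sanity check before making any API calls, a sketch like the following fails fast when either variable is missing; the get_databricks_config helper name is hypothetical:

```python
import os

def get_databricks_config():
    """Read the workspace URL and token from the environment,
    raising immediately if either is missing or empty."""
    host = os.getenv('DATABRICKS_HOST')
    token = os.getenv('DATABRICKS_TOKEN')
    missing = [name for name, value in
               [('DATABRICKS_HOST', host), ('DATABRICKS_TOKEN', token)]
               if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return host, token
```

Failing fast here produces a clearer error than the authentication failure you would otherwise get from the first API call.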
Step 2: Write your code
In your Python code file, import the os library to enable your code to get the environment variable values.
import os
Import the ApiClient class from the databricks_cli.sdk.api_client module to enable your code to authenticate with the Databricks REST API.
from databricks_cli.sdk.api_client import ApiClient
Import additional classes as needed to enable your code to call the Databricks REST API after authenticating, as follows.
REST API
import statements
Cluster Policies API 2.0
from databricks_cli.cluster_policies.api import ClusterPolicyApi
Clusters API 2.0
from databricks_cli.clusters.api import ClusterApi
DBFS API 2.0
from databricks_cli.dbfs.api import DbfsApi
from databricks_cli.dbfs.dbfs_path import DbfsPath
Groups API 2.0
from databricks_cli.groups.api import GroupsApi
Instance Pools API 2.0
from databricks_cli.instance_pools.api import InstancePoolsApi
Jobs API 2.1
from databricks_cli.jobs.api import JobsApi ( 1 )
from databricks_cli.runs.api import RunsApi ( 2 )
Libraries API 2.0
from databricks_cli.libraries.api import LibrariesApi
Delta Live Tables API 2.0
from databricks_cli.pipelines.api import PipelinesApi, LibraryObject
Repos API 2.0
from databricks_cli.repos.api import ReposApi
Secrets API 2.0
from databricks_cli.secrets.api import SecretApi
Token API 2.0
from databricks_cli.tokens.api import TokensApi
Unity Catalog API 2.1
from databricks_cli.unity_catalog.api import UnityCatalogApi
Workspace API 2.0
from databricks_cli.workspace.api import WorkspaceApi
( 1 ) Required only for working with jobs.
( 2 ) Required only for working with job runs.
For example, to call the Clusters API 2.0, add the following code:
from databricks_cli.clusters.api import ClusterApi
Use the ApiClient class to authenticate with the Databricks REST API. Use the os library's getenv function to get the workspace instance URL, for example https://1234567890123456.7.gcp.databricks.com, and the token value. The following example uses the variable name api_client to represent an instance of the ApiClient class.
api_client = ApiClient(
    host = os.getenv('DATABRICKS_HOST'),
    token = os.getenv('DATABRICKS_TOKEN')
)
Initialize instances of the classes as needed to call the Databricks REST API after authenticating, for example:
REST API
Suggested class initialization statements
Cluster Policies API 2.0
cluster_policies_api = ClusterPolicyApi(api_client)
Clusters API 2.0
clusters_api = ClusterApi(api_client)
DBFS API 2.0
dbfs_api = DbfsApi(api_client)
Groups API 2.0
groups_api = GroupsApi(api_client)
Instance Pools API 2.0
instance_pools_api = InstancePoolsApi(api_client)
Jobs API 2.1
jobs_api = JobsApi(api_client) ( 1 )
runs_api = RunsApi(api_client) ( 2 )
Libraries API 2.0
libraries_api = LibrariesApi(api_client)
Delta Live Tables API 2.0
pipelines_api = PipelinesApi(api_client)
Repos API 2.0
repos_api = ReposApi(api_client)
Secrets API 2.0
secrets_api = SecretApi(api_client)
Token API 2.0
tokens_api = TokensApi(api_client)
Unity Catalog API 2.1
unity_catalog_api = UnityCatalogApi(api_client)
Workspace API 2.0
workspace_api = WorkspaceApi(api_client)
( 1 ) Required only for working with jobs.
( 2 ) Required only for working with job runs.
For example, to initialize an instance of the Clusters API 2.0, add the following code:
clusters_api = ClusterApi(api_client)
Call the class method that corresponds to the Databricks REST API operation you need. To find the calling signature and usage notes for the method, see the documentation for the following modules in the legacy Databricks CLI source code.
REST API
Module documentation
Cluster Policies API 2.0
databricks_cli.cluster_policies.api
Clusters API 2.0
databricks_cli.clusters.api
DBFS API 2.0
databricks_cli.dbfs.api
Groups API 2.0
databricks_cli.groups.api
Instance Pools API 2.0
databricks_cli.instance_pools.api
Jobs API 2.1
databricks_cli.jobs.api ( 1 )
databricks_cli.runs.api ( 2 )
Libraries API 2.0
databricks_cli.libraries.api
Delta Live Tables API 2.0
databricks_cli.pipelines.api
Repos API 2.0
databricks_cli.repos.api
Secrets API 2.0
databricks_cli.secrets.api
Token API 2.0
databricks_cli.tokens.api
Unity Catalog API 2.1
databricks_cli.unity_catalog.api
Workspace API 2.0
databricks_cli.workspace.api
( 1 ) Required only for working with jobs.
( 2 ) Required only for working with job runs.
For example, to use the Clusters API 2.0 to list available cluster names and their IDs in the workspace, add the following code:
clusters_list = clusters_api.list_clusters()

print("Cluster name, cluster ID")
for cluster in clusters_list['clusters']:
    print(f"{cluster['cluster_name']}, {cluster['cluster_id']}")
The full code for the preceding instructions is as follows:
import os
from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.clusters.api import ClusterApi
api_client = ApiClient(
    host = os.getenv('DATABRICKS_HOST'),
    token = os.getenv('DATABRICKS_TOKEN')
)
clusters_api = ClusterApi(api_client)
clusters_list = clusters_api.list_clusters()
print("Cluster name, cluster ID")
for cluster in clusters_list['clusters']:
    print(f"{cluster['cluster_name']}, {cluster['cluster_id']}")
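Note that indexing the response with clusters_list['clusters'] can raise a KeyError in a workspace that has no clusters, because the list response may omit that field when it is empty. A more defensive sketch, written as a pure function over the response dictionary (the format_clusters name is hypothetical):

```python
def format_clusters(clusters_list):
    """Return 'name, id' rows from a Clusters API list response,
    tolerating workspaces that have no clusters."""
    return [f"{cluster['cluster_name']}, {cluster['cluster_id']}"
            for cluster in clusters_list.get('clusters', [])]
```

A real call would pass the result of clusters_api.list_clusters() as the argument.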
Examples
The following examples show how to use the source code in the legacy Databricks CLI and Python to automate the Databricks REST API for some basic usage scenarios.
Download a file from a DBFS path
import os
from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.dbfs.api import DbfsApi
from databricks_cli.dbfs.dbfs_path import DbfsPath
api_client = ApiClient(
    host = os.getenv('DATABRICKS_HOST'),
    token = os.getenv('DATABRICKS_TOKEN')
)
dbfs_source_file_path = 'dbfs:/tmp/users/someone@example.com/hello-world.txt'
local_file_download_path = './hello-world.txt'
dbfs_api = DbfsApi(api_client)
dbfs_path = DbfsPath(dbfs_source_file_path)
# Download the workspace file locally.
dbfs_api.get_file(
    dbfs_path,
    local_file_download_path,
    overwrite = True
)
# Print the downloaded file's contents.
print(open(local_file_download_path, 'r').read())
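Uploading works in the opposite direction through the DbfsApi class's put_file method. The sketch below assumes that method's (src_path, dbfs_path, overwrite) calling convention from the legacy Databricks CLI source; the dbfs_join and upload_file_to_dbfs helper names are hypothetical, and the databricks_cli imports are deferred into the function so the small pure helper can be used on its own.

```python
import os
import posixpath

def dbfs_join(*parts):
    """Join path components under the dbfs:/ scheme."""
    return 'dbfs:/' + posixpath.join(*parts).lstrip('/')

def upload_file_to_dbfs(local_path, dbfs_target_path):
    """Upload a local file to DBFS, replacing any existing file
    at the target path."""
    from databricks_cli.sdk.api_client import ApiClient
    from databricks_cli.dbfs.api import DbfsApi
    from databricks_cli.dbfs.dbfs_path import DbfsPath

    api_client = ApiClient(
        host = os.getenv('DATABRICKS_HOST'),
        token = os.getenv('DATABRICKS_TOKEN')
    )
    DbfsApi(api_client).put_file(
        local_path,
        DbfsPath(dbfs_target_path),
        overwrite = True
    )

# For example (requires a reachable workspace):
# upload_file_to_dbfs('./hello-world.txt',
#                     dbfs_join('tmp', 'users', 'someone@example.com', 'hello-world.txt'))
```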
Get information about a Delta Live Tables pipeline
import os
from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.pipelines.api import PipelinesApi
api_client = ApiClient(
    host = os.getenv('DATABRICKS_HOST'),
    token = os.getenv('DATABRICKS_TOKEN')
)
pipelines_api = PipelinesApi(api_client)
pipelines_get = pipelines_api.get('1234a56b-c789-0123-d456-78901234e5f6')
print(f"Name: {pipelines_get['name']}\n" \
      f"ID: {pipelines_get['pipeline_id']}\n" \
      f"State: {pipelines_get['state']}\n" \
      f"Creator: {pipelines_get['creator_user_name']}"
)
Create a schema in Unity Catalog
import os
from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.unity_catalog.api import UnityCatalogApi
api_client = ApiClient(
    host = os.getenv('DATABRICKS_HOST'),
    token = os.getenv('DATABRICKS_TOKEN')
)
unity_catalog_api = UnityCatalogApi(api_client)
catalog = "main"
schema = "my_schema"
# Create the schema (also known as a database) in the specified catalog.
unity_catalog_create_schema = unity_catalog_api.create_schema(
    catalog_name = catalog,
    schema_name = schema,
    comment = "This is my schema"
)
print(f"Schema: {unity_catalog_create_schema['name']}\n" \
      f"Owner: {unity_catalog_create_schema['owner']}\n" \
      f"Metastore ID: {unity_catalog_create_schema['metastore_id']}"
)
# Delete the schema.
unity_catalog_api.delete_schema(f"{catalog}.{schema}")
List objects in a workspace path
import os
from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.workspace.api import WorkspaceApi, WorkspaceFileInfo
api_client = ApiClient(
    host = os.getenv('DATABRICKS_HOST'),
    token = os.getenv('DATABRICKS_TOKEN')
)
workspace_api = WorkspaceApi(api_client)
workspace_list_objects = workspace_api.list_objects('/Users/someone@example.com/')
for obj in workspace_list_objects:
    print(
        obj.to_row(
            is_long_form = True,
            is_absolute = True
        )
    )
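The entries that list_objects returns describe folders and notebooks alike. A hedged sketch of filtering such a listing down to notebook paths, exercised here with stand-in objects (only the path and object_type attributes are assumed, matching WorkspaceFileInfo in the legacy Databricks CLI; the notebooks_only name is hypothetical):

```python
from types import SimpleNamespace

def notebooks_only(objects):
    """Keep only the paths of NOTEBOOK entries from a workspace listing."""
    return [o.path for o in objects if o.object_type == 'NOTEBOOK']

# Stand-in objects; a real call would pass workspace_api.list_objects(...):
listing = [
    SimpleNamespace(path = '/Users/someone@example.com/notebook1', object_type = 'NOTEBOOK'),
    SimpleNamespace(path = '/Users/someone@example.com/folder',    object_type = 'DIRECTORY'),
]
print(notebooks_only(listing))
```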