Call the Databricks REST API with Python

Important

This documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported. See the Databricks SDK for Python.

Note

Databricks recommends that you use the Databricks SDK for Python instead of the approach described in this article. Specifically:

  • The legacy Databricks CLI package that is described in this article has incomplete coverage of the Databricks REST API (only about 15%).

  • The legacy Databricks CLI package does not support all Databricks authentication mechanisms.

  • Databricks plans no new feature work for the legacy Databricks CLI package at this time.

  • The legacy Databricks CLI package is not supported through Databricks Support channels.

This article covers legacy Databricks CLI versions 0.99 and lower, which are in an Experimental state.

You can call the Databricks REST API to automate Databricks with Python code, instead of using non-Python command-line tools such as curl or API clients such as Postman. To call the Databricks REST API with Python, you can use the legacy Databricks CLI package as a library. This library is written in Python and enables you to call the Databricks REST API through Python classes that closely model the Databricks REST API request and response payloads.

Note

Direct use of the Python requests library is another approach. However, you would need to work at a lower level, manually providing the necessary headers, handling errors, and performing other related low-level coding tasks.
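
For illustration, here is a minimal sketch of that lower-level approach. It assumes the same DATABRICKS_HOST and DATABRICKS_TOKEN environment variables that the rest of this article uses, and it calls the Clusters API 2.0 endpoint directly with a bearer-token header:

import os
import requests

# Call the Clusters API 2.0 endpoint directly, providing the
# Authorization header and error handling yourself.
response = requests.get(
  f"{os.getenv('DATABRICKS_HOST')}/api/2.0/clusters/list",
  headers = { 'Authorization': f"Bearer {os.getenv('DATABRICKS_TOKEN')}" }
)
response.raise_for_status()

for cluster in response.json().get('clusters', []):
  print(cluster['cluster_name'])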

The legacy Databricks CLI supports calling the following Databricks REST APIs:

  • Cluster Policies API 2.0

  • Clusters API 2.0

  • DBFS API 2.0

  • Groups API 2.0

  • Instance Pools API 2.0

  • Jobs API 2.1, 2.0

  • Libraries API 2.0

  • Delta Live Tables API 2.0

  • Repos API 2.0

  • Secrets API 2.0

  • Token API 2.0

  • Unity Catalog API 2.1

  • Workspace API 2.0

The legacy Databricks CLI does not support calling the following Databricks REST APIs:

  • Account API 2.0

  • Databricks SQL Queries, Dashboards, and Alerts API 2.0

  • Databricks SQL Query History API 2.0

  • Databricks SQL Warehouses API 2.0

  • Git Credentials API 2.0

  • Global Init Scripts API 2.0

  • IP Access List API 2.0

  • MLflow API 2.0

  • Permissions API 2.0

  • SCIM API 2.0

  • Token Management API 2.0

  • API 1.2

For detailed information, see the Databricks REST API Reference.

Requirements

  • Python version 3.6 or above. To check whether Python is installed, and if so to check the installed version, run python --version from your terminal or PowerShell. If Python is not already installed, install it.

    python --version
    

    Note

    Some installations of Python require python3 instead of python. If so, replace python with python3 throughout this article.

  • The legacy Databricks CLI version 0.99 or lower. To check whether the legacy Databricks CLI is installed, and if so to check the installed version, run databricks --version. To install the legacy Databricks CLI, run pip install databricks-cli or python -m pip install databricks-cli. (To confirm from Python that the package is importable, see the sketch after this list.)

    # Check whether the legacy Databricks CLI is installed, and if so check the installed version.
    # The legacy Databricks CLI must be version 0.99 or lower.
    databricks --version
    
    # Install the legacy Databricks CLI.
    pip install databricks-cli
    
    # Or...
    
    python -m pip install databricks-cli
    

    Note

    Some installations of pip require pip3 instead of pip. If so, replace pip with pip3 throughout this article.

  • Your workspace instance URL, for example https://1234567890123456.7.gcp.databricks.com

  • A Databricks personal access token for your Databricks workspace. To create a Databricks personal access token, see Databricks personal access tokens; see also Manage personal access tokens.
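
If you want to confirm from Python that the legacy Databricks CLI package is importable, the following minimal check should work; it assumes the package exposes its version string in the databricks_cli.version module, as the package’s source code does:

# Import the legacy Databricks CLI package and print its version.
# The version must be 0.99 or lower.
from databricks_cli.version import version

print(version)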

Step 1: Set up authentication

To authenticate with the Databricks REST API through the legacy Databricks CLI package library, your Python code requires two pieces of information at minimum:

  • Your workspace instance URL, for example https://1234567890123456.7.gcp.databricks.com.

  • A Databricks personal access token for your workspace.

For code modularity, portability, and security, you should not hard-code this information into your Python code. Instead, retrieve it from a secure location at run time. For example, the code in this article uses the following environment variables:

  • DATABRICKS_HOST, set to your workspace instance URL.

  • DATABRICKS_TOKEN, set to your Databricks personal access token.

You can set these environment variables as follows:

To set the environment variables for only the current terminal session, run the following commands. To set the environment variables for all terminal sessions, enter the following commands into your shell’s startup file and then restart your terminal. Replace the example values here with your own values.

export DATABRICKS_HOST="https://1234567890123456.7.gcp.databricks.com"
export DATABRICKS_TOKEN="dapi1234567890b2cd34ef5a67bc8de90fa12b"

To set the environment variables for only the current Command Prompt session, run the following commands. Replace the example values here with your own values.

set DATABRICKS_HOST="https://1234567890123456.7.gcp.databricks.com"
set DATABRICKS_TOKEN="dapi1234567890b2cd34ef5a67bc8de90fa12b"

To set the environment variables for all Command Prompt sessions, run the following commands and then restart your Command Prompt. Replace the example values here with your own values.

setx DATABRICKS_HOST "https://1234567890123456.7.gcp.databricks.com"
setx DATABRICKS_TOKEN "dapi1234567890b2cd34ef5a67bc8de90fa12b"
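
Whichever approach you use, your Python code can fail fast at run time if either variable is missing, instead of sending an unauthenticated request. A minimal sketch, using only the standard library:

import os

# Stop early with a clear message if either environment variable is not set.
for variable_name in ('DATABRICKS_HOST', 'DATABRICKS_TOKEN'):
  if not os.getenv(variable_name):
    raise ValueError(f"The environment variable {variable_name} is not set.")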

Step 2: Write your code

  1. In your Python code file, import the os library to enable your code to get the environment variable values.

    import os
    
  2. Import the ApiClient class from the databricks_cli.sdk.api_client module to enable your code to authenticate with the Databricks REST API.

    from databricks_cli.sdk.api_client import ApiClient
    
  3. Import additional classes as needed to enable your code to call the Databricks REST API after authenticating, as follows.

    • Cluster Policies API 2.0:
      from databricks_cli.cluster_policies.api import ClusterPolicyApi

    • Clusters API 2.0:
      from databricks_cli.clusters.api import ClusterApi

    • DBFS API 2.0:
      from databricks_cli.dbfs.api import DbfsApi
      from databricks_cli.dbfs.dbfs_path import DbfsPath

    • Groups API 2.0:
      from databricks_cli.groups.api import GroupsApi

    • Instance Pools API 2.0:
      from databricks_cli.instance_pools.api import InstancePoolsApi

    • Jobs API 2.1:
      from databricks_cli.jobs.api import JobsApi (1)
      from databricks_cli.runs.api import RunsApi (2)

    • Libraries API 2.0:
      from databricks_cli.libraries.api import LibrariesApi

    • Delta Live Tables API 2.0:
      from databricks_cli.pipelines.api import PipelinesApi, LibraryObject

    • Repos API 2.0:
      from databricks_cli.repos.api import ReposApi

    • Secrets API 2.0:
      from databricks_cli.secrets.api import SecretApi

    • Token API 2.0:
      from databricks_cli.tokens.api import TokensApi

    • Unity Catalog API 2.1:
      from databricks_cli.unity_catalog.api import UnityCatalogApi

    • Workspace API 2.0:
      from databricks_cli.workspace.api import WorkspaceApi

    • (1) Required only for working with jobs.

    • (2) Required only for working with job runs.

    For example, to call the Clusters API 2.0, add the following code:

    from databricks_cli.clusters.api import ClusterApi
    
  4. Use the ApiClient class to authenticate with the Databricks REST API. Use the os library’s getenv function to get the workspace instance URL (for example https://1234567890123456.7.gcp.databricks.com) and Databricks personal access token values. The following example uses the variable name api_client to represent an instance of the ApiClient class.

    api_client = ApiClient(
      host  = os.getenv('DATABRICKS_HOST'),
      token = os.getenv('DATABRICKS_TOKEN')
    )
    
  5. Initialize instances of the classes as needed to call the Databricks REST API after authenticating, for example:

    • Cluster Policies API 2.0:
      cluster_policies_api = ClusterPolicyApi(api_client)

    • Clusters API 2.0:
      clusters_api = ClusterApi(api_client)

    • DBFS API 2.0:
      dbfs_api = DbfsApi(api_client)

    • Groups API 2.0:
      groups_api = GroupsApi(api_client)

    • Instance Pools API 2.0:
      instance_pools_api = InstancePoolsApi(api_client)

    • Jobs API 2.1:
      jobs_api = JobsApi(api_client) (1)
      runs_api = RunsApi(api_client) (2)

    • Libraries API 2.0:
      libraries_api = LibrariesApi(api_client)

    • Delta Live Tables API 2.0:
      pipelines_api = PipelinesApi(api_client)

    • Repos API 2.0:
      repos_api = ReposApi(api_client)

    • Secrets API 2.0:
      secrets_api = SecretApi(api_client)

    • Token API 2.0:
      tokens_api = TokensApi(api_client)

    • Unity Catalog API 2.1:
      unity_catalog_api = UnityCatalogApi(api_client)

    • Workspace API 2.0:
      workspace_api = WorkspaceApi(api_client)

    • (1) Required only for working with jobs.

    • (2) Required only for working with job runs.

    For example, to initialize an instance of the Clusters API 2.0, add the following code:

    clusters_api = ClusterApi(api_client)
    
  6. Call the class methods that correspond to the Databricks REST API operations you need. To find a method’s calling signature and usage notes, see the documentation for the following modules in the legacy Databricks CLI source code:

    • Cluster Policies API 2.0: databricks_cli.cluster_policies.api

    • Clusters API 2.0: databricks_cli.clusters.api

    • DBFS API 2.0: databricks_cli.dbfs.api and databricks_cli.dbfs.dbfs_path

    • Groups API 2.0: databricks_cli.groups.api

    • Instance Pools API 2.0: databricks_cli.instance_pools.api

    • Jobs API 2.1: databricks_cli.jobs.api (1) and databricks_cli.runs.api (2)

    • Libraries API 2.0: databricks_cli.libraries.api

    • Delta Live Tables API 2.0: databricks_cli.pipelines.api

    • Repos API 2.0: databricks_cli.repos.api

    • Secrets API 2.0: databricks_cli.secrets.api

    • Token API 2.0: databricks_cli.tokens.api

    • Unity Catalog API 2.1: databricks_cli.unity_catalog.api

    • Workspace API 2.0: databricks_cli.workspace.api

    • (1) Required only for working with jobs.

    • (2) Required only for working with job runs.

    For example, to use the Clusters API 2.0 to list available cluster names and their IDs in the workspace, add the following code:

    clusters_list = clusters_api.list_clusters()
    
    print("Cluster name, cluster ID")
    
    for cluster in clusters_list['clusters']:
      print(f"{cluster['cluster_name']}, {cluster['cluster_id']}")
    

The full code for the preceding instructions is as follows:

import os

from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.clusters.api import ClusterApi

api_client = ApiClient(
  host  = os.getenv('DATABRICKS_HOST'),
  token = os.getenv('DATABRICKS_TOKEN')
)

clusters_api   = ClusterApi(api_client)
clusters_list  = clusters_api.list_clusters()

print("Cluster name, cluster ID")

for cluster in clusters_list['clusters']:
  print(f"{cluster['cluster_name']}, {cluster['cluster_id']}")
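
Note that if the workspace contains no clusters, the response dictionary might not include a clusters key, and the loop above would raise a KeyError. A slightly more defensive variant of the loop:

# Use dict.get with a default so the loop does nothing
# when the workspace has no clusters.
for cluster in clusters_list.get('clusters', []):
  print(f"{cluster['cluster_name']}, {cluster['cluster_id']}")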

Examples

The following examples show how to use the source code in the legacy Databricks CLI and Python to automate the Databricks REST API for some basic usage scenarios.

Download a file from a DBFS path

import os

from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.dbfs.api import DbfsApi
from databricks_cli.dbfs.dbfs_path import DbfsPath

api_client = ApiClient(
  host  = os.getenv('DATABRICKS_HOST'),
  token = os.getenv('DATABRICKS_TOKEN')
)

dbfs_source_file_path      = 'dbfs:/tmp/users/someone@example.com/hello-world.txt'
local_file_download_path   = './hello-world.txt'

dbfs_api  = DbfsApi(api_client)
dbfs_path = DbfsPath(dbfs_source_file_path)

# Download the workspace file locally.
dbfs_api.get_file(
  dbfs_path,
  local_file_download_path,
  overwrite = True
)

# Print the downloaded file's contents.
print(open(local_file_download_path, 'r').read())
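
The DbfsApi class also supports the reverse direction through its put_file method. As a sketch, the following continues the preceding example and uploads the downloaded file to a new DBFS path (the target path here is just an example):

# Upload the local file to a new DBFS path.
dbfs_target_file_path = 'dbfs:/tmp/users/someone@example.com/hello-world-copy.txt'

dbfs_api.put_file(
  local_file_download_path,
  DbfsPath(dbfs_target_file_path),
  overwrite = True
)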

Get information about a Delta Live Tables pipeline

import os

from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.pipelines.api import PipelinesApi

api_client = ApiClient(
  host  = os.getenv('DATABRICKS_HOST'),
  token = os.getenv('DATABRICKS_TOKEN')
)

pipelines_api = PipelinesApi(api_client)
pipelines_get = pipelines_api.get('1234a56b-c789-0123-d456-78901234e5f6')

print(f"Name:    {pipelines_get['name']}\n" \
      f"ID:      {pipelines_get['pipeline_id']}\n" \
      f"State:   {pipelines_get['state']}\n" \
      f"Creator: {pipelines_get['creator_user_name']}"
     )
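
The legacy Databricks CLI package makes its HTTP calls through the requests library and raises a requests exception when a call fails, for example when the pipeline ID does not exist. A hedged sketch of catching such a failure:

import requests

# If the pipeline ID does not exist, the underlying HTTP call fails;
# catch the requests exception to report the error cleanly.
try:
  pipelines_get = pipelines_api.get('1234a56b-c789-0123-d456-78901234e5f6')
except requests.exceptions.HTTPError as e:
  print(f"Could not get the pipeline: {e}")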

Create a schema in Unity Catalog

import os

from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.unity_catalog.api import UnityCatalogApi

api_client = ApiClient(
  host  = os.getenv('DATABRICKS_HOST'),
  token = os.getenv('DATABRICKS_TOKEN')
)

unity_catalog_api = UnityCatalogApi(api_client)
catalog           = "main"
schema            = "my_schema"

# Create the schema (also known as a database) in the specified catalog.
unity_catalog_create_schema = unity_catalog_api.create_schema(
  catalog_name = catalog,
  schema_name  = schema,
  comment      = "This is my schema"
)

print(f"Schema:       {unity_catalog_create_schema['name']}\n" \
      f"Owner:        {unity_catalog_create_schema['owner']}\n" \
      f"Metastore ID: {unity_catalog_create_schema['metastore_id']}"
     )

# Delete the schema.
unity_catalog_api.delete_schema(f"{catalog}.{schema}")
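
Between the create and delete calls, you could fetch the schema back by its full name. The get_schema call below is an assumption: it mirrors the full-name argument that delete_schema takes, so verify the method and its signature against the legacy package’s databricks_cli.unity_catalog.api module before relying on it:

# ASSUMPTION: get_schema accepts the same catalog.schema full name
# as delete_schema; check the databricks_cli source to confirm.
unity_catalog_get_schema = unity_catalog_api.get_schema(f"{catalog}.{schema}")

print(f"Full name: {unity_catalog_get_schema['full_name']}")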

List objects in a workspace path

import os

from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.workspace.api import WorkspaceApi, WorkspaceFileInfo

api_client = ApiClient(
  host  = os.getenv('DATABRICKS_HOST'),
  token = os.getenv('DATABRICKS_TOKEN')
)

workspace_api          = WorkspaceApi(api_client)
workspace_list_objects = workspace_api.list_objects('/Users/someone@example.com/')

for workspace_object in workspace_list_objects:
  print(
    workspace_object.to_row(
      is_long_form = True,
      is_absolute  = True
    )
  )
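
List jobs in a workspace

The Jobs API 2.1 classes imported in Step 2 work the same way. The following is a minimal sketch; it assumes that list_jobs can be called with no arguments and that, as in the Jobs API 2.1 response payload, each returned job keeps its display name under settings:

import os

from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.jobs.api import JobsApi

api_client = ApiClient(
  host  = os.getenv('DATABRICKS_HOST'),
  token = os.getenv('DATABRICKS_TOKEN')
)

jobs_api  = JobsApi(api_client)
jobs_list = jobs_api.list_jobs()

print("Job name, job ID")

# Each job's display name lives under its settings in the Jobs API 2.1 payload.
for job in jobs_list.get('jobs', []):
  print(f"{job['settings']['name']}, {job['job_id']}")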