API examples

Note

The CLI feature is unavailable on Databricks on Google Cloud as of this release.

This article contains examples that demonstrate how to use the Databricks REST API.

In the following examples, replace <databricks-instance> with the workspace URL of your Databricks deployment.

Authentication

To learn how to authenticate to the REST API, review Authentication using Databricks personal access tokens.

The examples in this article assume you are using Databricks personal access tokens. In the following examples, replace <your-token> with your personal access token. The curl examples assume that you store Databricks API credentials under .netrc. The Python examples use Bearer authentication. Although the examples show storing the token in the code, to use credentials safely in Databricks, we recommend that you follow the Secret management user guide.
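
For the cURL examples, the -n flag tells curl to read credentials from your .netrc file. A .netrc entry for a Databricks personal access token generally looks like the following sketch; replace the placeholders with your workspace URL and token:

machine <databricks-instance>
login token
password <your-token>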

Get a gzipped list of clusters

This example uses Databricks REST API version 2.0.

curl -n -H "Accept-Encoding: gzip" https://<databricks-instance>/api/2.0/clusters/list > clusters.gz

Create a Python 3 cluster

Note

Python 3 is the default version of Python in Databricks Runtime 6.0 and above.

The following example shows how to launch a Python 3 cluster using the Databricks REST API and the requests Python HTTP library. This example uses Databricks REST API version 2.0.

import requests

DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'

response = requests.post(
  'https://%s/api/2.0/clusters/create' % (DOMAIN),
  headers={'Authorization': 'Bearer %s' % TOKEN},
  json={
    "cluster_name": "my-cluster",
    "spark_version": "7.5.x-scala2.12",
    "node_type_id": "n1-highmem-4",
    "spark_env_vars": {
      "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
    },
    "num_workers": 25
  }
)

if response.status_code == 200:
  print(response.json()['cluster_id'])
else:
  print("Error launching cluster: %s: %s" % (response.json()["error_code"], response.json()["message"]))

Jobs API examples

This section shows how to create Python, spark-submit, and JAR jobs, and how to run the JAR job and view its output.

Create a Python job

This example shows how to create a Python job. It uses the Apache Spark Python Pi estimation example. This example uses Databricks REST API version 2.0.

  1. Download the Python file containing the example and upload it to Databricks File System (DBFS) using the Databricks CLI.

    dbfs cp pi.py dbfs:/docs/pi.py
    
  2. Create the job.

    curl -n -X POST -H 'Content-Type: application/json' -d \
    '{
      "name": "SparkPi Python job",
      "new_cluster": {
        "spark_version": "7.5.x-scala2.12",
        "node_type_id": "n1-highmem-4",
        "num_workers": 2
      },
      "spark_python_task": {
        "python_file": "dbfs:/docs/pi.py",
        "parameters": [
          "10"
        ]
      }
    }' https://<databricks-instance>/api/2.0/jobs/create
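
The same job can be created from Python. The following sketch (assuming the requests library and the dbfs:/docs/pi.py path uploaded above) mirrors the cURL request and prints the job_id returned by the Jobs API:

import requests

DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'

response = requests.post(
  'https://%s/api/2.0/jobs/create' % DOMAIN,
  headers={'Authorization': 'Bearer %s' % TOKEN},
  json={
    "name": "SparkPi Python job",
    "new_cluster": {
      "spark_version": "7.5.x-scala2.12",
      "node_type_id": "n1-highmem-4",
      "num_workers": 2
    },
    "spark_python_task": {
      "python_file": "dbfs:/docs/pi.py",
      "parameters": ["10"]
    }
  }
)
response.raise_for_status()
# jobs/create returns the job_id of the new job.
print(response.json()['job_id'])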
    

Create a spark-submit job

This example shows how to create a spark-submit job. It uses the Apache Spark SparkPi example and Databricks REST API version 2.0.

  1. Download the JAR containing the example and upload the JAR to Databricks File System (DBFS) using the Databricks CLI.

    dbfs cp SparkPi-assembly-0.1.jar dbfs:/docs/sparkpi.jar
    
  2. Create the job.

    curl -n \
    -X POST -H 'Content-Type: application/json' -d \
    '{
         "name": "SparkPi spark-submit job",
         "new_cluster": {
           "spark_version": "7.5.x-scala2.12",
           "node_type_id": "n1-highmem-4",
           "num_workers": 2
           },
        "spark_submit_task": {
           "parameters": [
             "--class",
             "org.apache.spark.examples.SparkPi",
             "dbfs:/docs/sparkpi.jar",
             "10"
             ]
           }
    }' https://<databricks-instance>/api/2.0/jobs/create
    

Create and run a spark-submit job for R scripts

This example shows how to create a spark-submit job to run R scripts. This example uses Databricks REST API version 2.0.

  1. Upload the R file to Databricks File System (DBFS) using the Databricks CLI.

    dbfs cp your_code.R dbfs:/path/to/your_code.R
    

    If the code uses SparkR, the script must first install the package. Databricks Runtime contains the SparkR source code, so install the SparkR package from its local directory as shown in the following example:

    install.packages("/databricks/spark/R/pkg", repos = NULL)
    library(SparkR)
    
    sparkR.session()
    n <- nrow(createDataFrame(iris))
    write.csv(n, "/dbfs/path/to/num_rows.csv")
    

    Databricks Runtime installs the latest version of sparklyr from CRAN. If the code uses sparklyr, you must specify the Spark master URL in spark_connect. To form the Spark master URL, use the SPARK_LOCAL_IP environment variable to get the IP, and use the default port 7077. For example:

    library(sparklyr)
    
    master <- paste("spark://", Sys.getenv("SPARK_LOCAL_IP"), ":7077", sep="")
    sc <- spark_connect(master)
    iris_tbl <- copy_to(sc, iris)
    write.csv(iris_tbl, "/dbfs/path/to/sparklyr_iris.csv")
    
  2. Create the job.

    curl -n \
    -X POST -H 'Content-Type: application/json' \
    -d '{
         "name": "R script spark-submit job",
         "new_cluster": {
           "spark_version": "7.5.x-scala2.12",
           "node_type_id": "n1-highmem-4",
           "num_workers": 2
           },
        "spark_submit_task": {
           "parameters": [ "dbfs:/path/to/your_code.R" ]
           }
    }' https://<databricks-instance>/api/2.0/jobs/create
    

    This returns a job-id that you can then use to run the job.

  3. Run the job using the job-id.

    curl -n \
    -X POST -H 'Content-Type: application/json' \
    -d '{ "job_id": <job-id> }' https://<databricks-instance>/api/2.0/jobs/run-now
    

Create and run a JAR job

This example shows how to create and run a JAR job. It uses the Apache Spark SparkPi example and Databricks REST API version 2.0.

  1. Download the JAR containing the example.

  2. Upload the JAR to your Databricks instance using the API:

    curl -n \
    -F filedata=@"SparkPi-assembly-0.1.jar" \
    -F path="/docs/sparkpi.jar" \
    -F overwrite=true \
    https://<databricks-instance>/api/2.0/dbfs/put
    

    A successful call returns {}. Otherwise you will see an error message.
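
    If you prefer Python over cURL for the upload, a minimal sketch (assuming the requests library) sends the same multipart form post:

    import requests

    DOMAIN = '<databricks-instance>'
    TOKEN = '<your-token>'

    # Multipart form post mirroring the -F flags in the cURL example above.
    with open('SparkPi-assembly-0.1.jar', 'rb') as f:
      response = requests.post(
        'https://%s/api/2.0/dbfs/put' % DOMAIN,
        headers={'Authorization': 'Bearer %s' % TOKEN},
        files={'filedata': f},
        data={'path': '/docs/sparkpi.jar', 'overwrite': 'true'}
      )
    response.raise_for_status()
    print(response.json())  # {} on success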

  3. Get a list of all Spark versions prior to creating your job.

    curl -n https://<databricks-instance>/api/2.0/clusters/spark-versions
    

    This example uses 7.5.x-scala2.12. See Runtime version strings for more information about Spark cluster versions.

  4. Create the job. The JAR is specified as a library and the main class name is referenced in the Spark JAR task.

    curl -n -X POST -H 'Content-Type: application/json' \
    -d '{
          "name": "SparkPi JAR job",
          "new_cluster": {
            "spark_version": "7.5.x-scala2.12",
            "node_type_id": "n1-highmem-4",
            "aws_attributes": {"availability": "ON_DEMAND"},
            "num_workers": 2
            },
         "libraries": [{"jar": "dbfs:/docs/sparkpi.jar"}],
         "spark_jar_task": {
            "main_class_name":"org.apache.spark.examples.SparkPi",
            "parameters": "10"
            }
    }' https://<databricks-instance>/api/2.0/jobs/create
    

    This returns a job-id that you can then use to run the job.

  5. Run the job using run now:

    curl -n \
    -X POST -H 'Content-Type: application/json' \
    -d '{ "job_id": <job-id> }' https://<databricks-instance>/api/2.0/jobs/run-now
    
  6. Navigate to https://<databricks-instance>/#job/<job-id> and you’ll be able to see your job running.

  7. You can also check on it from the API using the information returned from the previous request (see the polling sketch after this list).

    curl -n https://<databricks-instance>/api/2.0/jobs/runs/get?run_id=<run-id> | jq
    

    This should return something like:

    {
      "job_id": 35,
      "run_id": 30,
      "number_in_job": 1,
      "original_attempt_run_id": 30,
      "state": {
        "life_cycle_state": "TERMINATED",
        "result_state": "SUCCESS",
        "state_message": ""
      },
      "task": {
        "spark_jar_task": {
          "jar_uri": "",
          "main_class_name": "org.apache.spark.examples.SparkPi",
          "parameters": [
            "10"
          ],
          "run_as_repl": true
        }
      },
      "cluster_spec": {
        "new_cluster": {
          "spark_version": "7.3.x-scala2.12",
          "node_type_id": "<node-type>",
          "enable_elastic_disk": false,
          "num_workers": 1
        },
        "libraries": [
          {
            "jar": "dbfs:/docs/sparkpi.jar"
          }
        ]
      },
      "cluster_instance": {
        "cluster_id": "0412-165350-type465",
        "spark_context_id": "5998195893958609953"
      },
      "start_time": 1523552029282,
      "setup_duration": 211000,
      "execution_duration": 33000,
      "cleanup_duration": 2000,
      "trigger": "ONE_TIME",
      "creator_user_name": "...",
      "run_name": "SparkPi JAR job",
      "run_page_url": "<databricks-instance>/?o=3901135158661429#job/35/run/1",
      "run_type": "JOB_RUN"
    }
    
  8. To view the job output, visit the job run details page.

    Executing command, time = 1523552263909.
    Pi is roughly 3.13973913973914
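
If you prefer to trigger the run and wait for its result programmatically instead of watching the UI, the following minimal Python sketch (assuming the requests library and the job-id returned by jobs/create) calls run-now and then polls runs/get until the run reaches a terminal state:

import time
import requests

DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'
JOB_ID = 123  # replace with the job-id returned by jobs/create
HEADERS = {'Authorization': 'Bearer %s' % TOKEN}

# Trigger the run and capture its run_id.
run = requests.post(
  'https://%s/api/2.0/jobs/run-now' % DOMAIN,
  headers=HEADERS,
  json={'job_id': JOB_ID}
)
run.raise_for_status()
run_id = run.json()['run_id']

# Poll until the run reaches a terminal life cycle state.
while True:
  status = requests.get(
    'https://%s/api/2.0/jobs/runs/get' % DOMAIN,
    headers=HEADERS,
    params={'run_id': run_id}
  )
  status.raise_for_status()
  state = status.json()['state']
  if state['life_cycle_state'] in ('TERMINATED', 'SKIPPED', 'INTERNAL_ERROR'):
    print(state.get('result_state'), state.get('state_message'))
    break
  time.sleep(30)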
    

Create cluster enabled for table access control example

To create a cluster enabled for table access control, specify the following spark_conf property in your request body. This example uses Databricks REST API version 2.0.

curl -n -X POST -H 'Content-Type: application/json' https://<databricks-instance>/api/2.0/clusters/create -d'
{
  "cluster_name": "my-cluster-from-api",
  "spark_version": "7.5.x-scala2.12",
  "node_type_id": "n1-highmem-4",
  "spark_conf": {
    "spark.databricks.acl.dfAclsEnabled":true,
    "spark.databricks.repl.allowedLanguages": "python,sql"
  },
  "num_workers": 1,
  "custom_tags":{
      "costcenter":"Tags",
      "applicationname":"Tags1"
  }
}'

Workspace examples

Here are some examples of how to use the Workspace API to list, get information about, create, delete, export, and import workspace objects.

List a notebook or a folder

The following cURL command lists a path in the workspace. This example uses Databricks REST API version 2.0.

curl -n -X GET -H 'Content-Type: application/json' -d \
'{
  "path": "/Users/user@example.com/"
}' https://<databricks-instance>/api/2.0/workspace/list

The response should contain a list of statuses:

{
  "objects": [
    {
     "object_type": "DIRECTORY",
     "path": "/Users/user@example.com/folder"
    },
    {
     "object_type": "NOTEBOOK",
     "language": "PYTHON",
     "path": "/Users/user@example.com/notebook1"
    },
    {
     "object_type": "NOTEBOOK",
     "language": "SCALA",
     "path": "/Users/user@example.com/notebook2"
    }
  ]
}

If the path is a notebook, the response contains an array containing the status of the input notebook.
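
To walk an entire folder tree rather than a single level, a short Python sketch (assuming the requests library) can call workspace/list recursively and print every object path:

import requests

DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'
HEADERS = {'Authorization': 'Bearer %s' % TOKEN}

def list_workspace(path):
  # List one level of the workspace, then recurse into sub-directories.
  response = requests.get(
    'https://%s/api/2.0/workspace/list' % DOMAIN,
    headers=HEADERS,
    params={'path': path}
  )
  response.raise_for_status()
  for obj in response.json().get('objects', []):
    print(obj['object_type'], obj['path'])
    if obj['object_type'] == 'DIRECTORY':
      list_workspace(obj['path'])

list_workspace('/Users/user@example.com/')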

Get information about a notebook or a folder

The following cURL command gets the status of a path in the workspace. This example uses Databricks REST API version 2.0.

curl -n -X GET -H 'Content-Type: application/json' -d \
'{
  "path": "/Users/user@example.com/"
}' https://<databricks-instance>/api/2.0/workspace/get-status

The response should contain the status of the input path:

{
  "object_type": "DIRECTORY",
  "path": "/Users/user@example.com"
}

Create a folder

The following cURL command creates a folder. It creates the folder recursively like mkdir -p. If the folder already exists, it will do nothing and succeed. This example uses Databricks REST API version 2.0.

curl -n -X POST -H 'Content-Type: application/json' -d \
'{
  "path": "/Users/user@example.com/new/folder"
}' https://<databricks-instance>/api/2.0/workspace/mkdirs

If the request succeeds, an empty JSON string is returned.

Delete a notebook or folder

The following cURL command deletes a notebook or folder. You can enable recursive to recursively delete a non-empty folder. This example uses Databricks REST API version 2.0.

curl -n -X POST -H 'Content-Type: application/json' -d \
'{
  "path": "/Users/user@example.com/new/folder",
  "recursive": "false"
}' https://<databricks-instance>/api/2.0/workspace/delete

If the request succeeds, an empty JSON string is returned.

Export a notebook or folder

The following cURL command exports a notebook. Notebooks can be exported in the following formats: SOURCE, HTML, JUPYTER, DBC. A folder can be exported only as DBC. This example uses Databricks REST API version 2.0.

curl -n -X GET \
-d '{ "path": "/Users/user@example.com/notebook", "format": "SOURCE" }' \
https://<databricks-instance>/api/2.0/workspace/export

The response contains base64 encoded notebook content.

{
  "content": "Ly8gRGF0YWJyaWNrcyBub3RlYm9vayBzb3VyY2UKcHJpbnQoImhlbGxvLCB3b3JsZCIpCgovLyBDT01NQU5EIC0tLS0tLS0tLS0KCg=="
}

Alternatively, you can download the exported notebook directly.

curl -n -X GET "https://<databricks-instance>/api/2.0/workspace/export?format=SOURCE&direct_download=true&path=/Users/user@example.com/notebook"

The response will be the exported notebook content.
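
From Python, the JSON form of the export call returns the notebook source base64 encoded, so it must be decoded before use. A minimal sketch (assuming the requests library):

import base64
import requests

DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'

response = requests.get(
  'https://%s/api/2.0/workspace/export' % DOMAIN,
  headers={'Authorization': 'Bearer %s' % TOKEN},
  params={'path': '/Users/user@example.com/notebook', 'format': 'SOURCE'}
)
response.raise_for_status()

# Decode the base64 content field and write it to a local file; the
# appropriate file extension depends on the notebook language.
source = base64.b64decode(response.json()['content'])
with open('exported-notebook.txt', 'wb') as f:
  f.write(source)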

Import a notebook or directory

The following cURL command imports a notebook into the workspace. Multiple formats (SOURCE, HTML, JUPYTER, DBC) are supported. If the format is SOURCE, you must specify language. The content parameter contains base64 encoded notebook content. You can enable overwrite to overwrite the existing notebook. This example uses Databricks REST API version 2.0.

curl -n -X POST -H 'Content-Type: application/json' -d \
'{
  "path": "/Users/user@example.com/new-notebook",
  "format": "SOURCE",
  "language": "SCALA",
  "content": "Ly8gRGF0YWJyaWNrcyBub3RlYm9vayBzb3VyY2UKcHJpbnQoImhlbGxvLCB3b3JsZCIpCgovLyBDT01NQU5EIC0tLS0tLS0tLS0KCg==",
  "overwrite": "false"
}' https://<databricks-instance>/api/2.0/workspace/import

If the request succeeds, an empty JSON string is returned.

Alternatively, you can import a notebook via multipart form post.

curl -n -X POST https://<databricks-instance>/api/2.0/workspace/import \
       -F path="/Users/user@example.com/new-notebook" -F format=SOURCE -F language=SCALA -F overwrite=true -F content=@notebook.scala
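
The same import can be issued from Python by base64 encoding a local source file first. A minimal sketch (assuming the requests library and a local notebook.scala file):

import base64
import requests

DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'

# Base64 encode the local notebook source, as the content field requires.
with open('notebook.scala', 'rb') as f:
  content = base64.b64encode(f.read()).decode('ascii')

response = requests.post(
  'https://%s/api/2.0/workspace/import' % DOMAIN,
  headers={'Authorization': 'Bearer %s' % TOKEN},
  json={
    'path': '/Users/user@example.com/new-notebook',
    'format': 'SOURCE',
    'language': 'SCALA',
    'content': content,
    'overwrite': 'false'
  }
)
response.raise_for_status()
print(response.json())  # {} on success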