Jobs API 2.0

Important

This article documents the 2.0 version of the Jobs API. However, Databricks recommends using Jobs API 2.1 for new and existing clients and scripts. For details on the changes from the 2.0 to 2.1 versions, see Updating from Jobs API 2.0 to 2.1.

The Jobs API allows you to create, edit, and delete jobs. The maximum allowed size of a request to the Jobs API is 10 MB.

For details about updates to the Jobs API that support orchestration of multiple tasks with Databricks jobs, see Updating from Jobs API 2.0 to 2.1.

Warning

You should never hard code secrets or store them in plain text. Use the Secrets API to manage secrets in the Databricks CLI. Use the Secrets utility (dbutils.secrets) to reference secrets in notebooks and jobs.

Note

If you receive a 500-level error when making Jobs API requests, Databricks recommends retrying requests for up to 10 minutes, with a minimum interval of 30 seconds between retries.
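The retry guidance above can be sketched in Python as follows. This is a minimal illustration, not an official client: `send` is a hypothetical zero-argument callable that performs one Jobs API request and returns a `(status_code, body)` pair, and the `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.

```python
import time


def call_with_retries(send, max_wait_seconds=600, interval_seconds=30,
                      sleep=time.sleep, clock=time.monotonic):
    """Call send() and retry while it reports a 500-level status.

    send is a hypothetical callable (an assumption of this sketch) that
    returns (status_code, body) for a single Jobs API request.
    """
    deadline = clock() + max_wait_seconds
    while True:
        status, body = send()
        # Anything other than a 500-level status is returned unchanged.
        if not 500 <= status < 600:
            return status, body
        # Stop once another wait would exceed the 10-minute budget.
        if clock() + interval_seconds > deadline:
            return status, body
        sleep(interval_seconds)
```

Responses outside the 500 range, including 4xx errors, are returned to the caller on the first attempt.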

Important

To access Databricks REST APIs, you must authenticate.

Create

Endpoint           HTTP Method
2.0/jobs/create    POST

Create a new job.

Example

This example creates a job that runs a JAR task at 10:15 PM each night.

Request

curl --netrc --request POST \
https://<databricks-instance>/api/2.0/jobs/create \
--data @create-job.json \
| jq .

create-job.json:

{
  "name": "Nightly model training",
  "new_cluster": {
    "spark_version": "7.5.x-scala2.12",
    "node_type_id": "n1-highmem-4",
    "num_workers": 10
  },
  "libraries": [
    {
      "jar": "dbfs:/my-jar.jar"
    },
    {
      "maven": {
        "coordinates": "org.jsoup:jsoup:1.7.2"
      }
    }
  ],
  "timeout_seconds": 3600,
  "max_retries": 1,
  "schedule": {
    "quartz_cron_expression": "0 15 22 * * ?",
    "timezone_id": "America/Los_Angeles"
  },
  "spark_jar_task": {
    "main_class_name": "com.databricks.ComputeModels"
  }
}

Replace:

  • <databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.

  • The contents of create-job.json with fields that are appropriate for your solution.

This example uses a .netrc file and jq.

Response

{
  "job_id": 1
}
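The same request can be built in Python with only the standard library. This is a sketch under stated assumptions: it authenticates with a bearer token in the Authorization header rather than the .netrc file used by the curl example, and the function name, token value, and instance name are placeholders. The request is constructed but not sent.

```python
import json
import urllib.request


def build_create_job_request(instance, token, settings):
    """Build (but do not send) a POST request for 2.0/jobs/create."""
    return urllib.request.Request(
        url=f"https://{instance}/api/2.0/jobs/create",
        data=json.dumps(settings).encode("utf-8"),
        headers={
            # Bearer-token authentication is an assumption of this sketch.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The same fields as create-job.json.
settings = {
    "name": "Nightly model training",
    "new_cluster": {
        "spark_version": "7.5.x-scala2.12",
        "node_type_id": "n1-highmem-4",
        "num_workers": 10,
    },
    "libraries": [
        {"jar": "dbfs:/my-jar.jar"},
        {"maven": {"coordinates": "org.jsoup:jsoup:1.7.2"}},
    ],
    "timeout_seconds": 3600,
    "max_retries": 1,
    "schedule": {
        "quartz_cron_expression": "0 15 22 * * ?",
        "timezone_id": "America/Los_Angeles",
    },
    "spark_jar_task": {"main_class_name": "com.databricks.ComputeModels"},
}

request = build_create_job_request(
    "1234567890123456.7.gcp.databricks.com", "example-token", settings
)

# Sending the request would look like this (not executed here):
# with urllib.request.urlopen(request) as response:
#     job_id = json.load(response)["job_id"]
```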

Request structure

Important

  • When you run a job on a new jobs cluster, the job is treated as a Jobs Compute (automated) workload subject to Jobs Compute pricing.

  • When you run a job on an existing all-purpose cluster, it is treated as an All-Purpose Compute (interactive) workload subject to All-Purpose Compute pricing.

Field Name

Type

Description

existing_cluster_id OR new_cluster

STRING OR NewCluster

If existing_cluster_id, the ID of an existing cluster that will be used for all runs of this job. When running jobs on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs on new clusters for greater reliability.

If new_cluster, a description of a cluster that will be created for each run.

If specifying a PipelineTask, this field can be empty.

notebook_task OR spark_jar_task OR spark_python_task OR spark_submit_task OR pipeline_task OR run_job_task

NotebookTask OR SparkJarTask OR SparkPythonTask OR SparkSubmitTask OR PipelineTask OR RunJobTask

If notebook_task, indicates that this job should run a notebook. This field may not be specified in conjunction with spark_jar_task.

If spark_jar_task, indicates that this job should run a JAR.

If spark_python_task, indicates that this job should run a Python file.

If spark_submit_task, indicates that this job should be launched by the spark-submit script.

If pipeline_task, indicates that this job should run a Delta Live Tables pipeline.

If run_job_task, indicates that this job should run another job.

name

STRING

An optional name for the job. The default value is Untitled.

libraries

An array of Library

An optional list of libraries to be installed on the cluster that will execute the job. The default value is an empty list.

email_notifications

JobEmailNotifications

An optional set of email addresses notified when runs of this job begin and complete and when this job is deleted. The default behavior is to not send any emails.

webhook_notifications

WebhookNotifications

An optional set of system destinations to notify when runs of this job begin, complete, or fail.

notification_settings

JobNotificationSettings

Optional notification settings that are used when sending notifications to each of the email_notifications and webhook_notifications for this job.

timeout_seconds

INT32

An optional timeout applied to each run of this job. The default behavior is to have no timeout.

max_retries

INT32

An optional maximum number of times to retry an unsuccessful run. A run is considered to be unsuccessful if it completes with the FAILED result_state or INTERNAL_ERROR life_cycle_state. The value -1 means to retry indefinitely and the value 0 means to never retry. The default behavior is to never retry.

min_retry_interval_millis

INT32

An optional minimum interval in milliseconds between the start of the failed run and the subsequent retry run. The default behavior is that unsuccessful runs are immediately retried.

retry_on_timeout

BOOL

An optional policy to specify whether to retry a job when it times out. The default behavior is to not retry on timeout.

schedule

CronSchedule

An optional periodic schedule for this job. The default behavior is that the job runs when triggered by clicking Run Now in the Jobs UI or sending an API request to runNow.

max_concurrent_runs

INT32

An optional maximum allowed number of concurrent runs of the job.

Set this value if you want to be able to execute multiple runs of the same job concurrently. This is useful, for example, if you trigger your job on a frequent schedule and want to allow consecutive runs to overlap with each other, or if you want to trigger multiple runs that differ by their input parameters.

This setting affects only new runs. For example, suppose the job’s concurrency is 4 and there are 4 concurrent active runs. Then setting the concurrency to 3 won’t kill any of the active runs. However, from then on, new runs are skipped unless there are fewer than 3 active runs.

This value cannot exceed 1000. Setting this value to 0 causes all new runs to be skipped. The default behavior is to allow only 1 concurrent run.
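The skip rule described for max_concurrent_runs can be modeled locally as a small predicate. This is only an illustration of the documented behavior, not the service's implementation, and the function name is hypothetical.

```python
def new_run_is_skipped(active_runs, max_concurrent_runs=1):
    """Return True if a new run would be skipped under the documented rule.

    A new run starts only when fewer than max_concurrent_runs runs are
    active. Active runs are never killed by lowering the setting.
    """
    if not 0 <= max_concurrent_runs <= 1000:
        raise ValueError("max_concurrent_runs must be between 0 and 1000")
    return active_runs >= max_concurrent_runs
```

With 4 active runs and the concurrency lowered to 3, `new_run_is_skipped(4, 3)` is true, and it stays true until fewer than 3 runs are active. A setting of 0 skips every new run.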

Response structure

Field Name    Type     Description
job_id        INT64    The canonical identifier for the newly created job.

List

Endpoint         HTTP Method
2.0/jobs/list    GET

List all jobs.

Example

Request

curl --netrc --request GET \
https://<databricks-instance>/api/2.0/jobs/list \
| jq .

Replace <databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.

This example uses a .netrc file and jq.

Response

{
  "jobs": [
    {
      "job_id": 1,
      "settings": {
        "name": "Nightly model training",
        "new_cluster": {
          "spark_version": "7.5.x-scala2.12",
          "node_type_id": "n1-highmem-4",
          "num_workers": 10
        },
        "libraries": [
          {
            "jar": "dbfs:/my-jar.jar"
          },
          {
            "maven": {
              "coordinates": "org.jsoup:jsoup:1.7.2"
            }
          }
        ],
        "timeout_seconds": 100000000,
        "max_retries": 1,
        "schedule": {
          "quartz_cron_expression": "0 15 22 * * ?",
          "timezone_id": "America/Los_Angeles",
          "pause_status": "UNPAUSED"
        },
        "spark_jar_task": {
          "main_class_name": "com.databricks.ComputeModels"
        }
      },
      "created_time": 1457570074236
    }
  ]
}

Response structure

Field Name    Type               Description
jobs          An array of Job    The list of jobs.

Delete

Endpoint           HTTP Method
2.0/jobs/delete    POST

Delete a job and send an email to the addresses specified in JobSettings.email_notifications. No action occurs if the job has already been removed. After the job is removed, neither its details nor its run history is visible in the Jobs UI or API. The job is guaranteed to be removed upon completion of this request. However, runs that were active before the receipt of this request may still be active. They will be terminated asynchronously.

Example

curl --netrc --request POST \
https://<databricks-instance>/api/2.0/jobs/delete \
--data '{ "job_id": <job-id> }'

Replace:

  • <databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.

  • <job-id> with the ID of the job, for example 123.

This example uses a .netrc file.

Request structure

Field Name    Type     Description
job_id        INT64    The canonical identifier of the job to delete. This field is required.

Get

Endpoint        HTTP Method
2.0/jobs/get    GET

Retrieve information about a single job.

Example

Request

curl --netrc --request GET \
'https://<databricks-instance>/api/2.0/jobs/get?job_id=<job-id>' \
| jq .

Or:

curl --netrc --get \
https://<databricks-instance>/api/2.0/jobs/get \
--data job_id=<job-id> \
| jq .

Replace:

  • <databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.

  • <job-id> with the ID of the job, for example 123.

This example uses a .netrc file and jq.

Response

{
  "job_id": 1,
  "settings": {
    "name": "Nightly model training",
    "new_cluster": {
      "spark_version": "7.5.x-scala2.12",
      "node_type_id": "n1-highmem-4",
      "aws_attributes": {
        "availability": "ON_DEMAND"
      },
      "num_workers": 10
    },
    "libraries": [
      {
        "jar": "dbfs:/my-jar.jar"
      },
      {
        "maven": {
          "coordinates": "org.jsoup:jsoup:1.7.2"
        }
      }
    ],
    "email_notifications": {
      "on_start": [],
      "on_success": [],
      "on_failure": []
    },
    "webhook_notifications": {
      "on_start": [
        {
          "id": "bf2fbd0a-4a05-4300-98a5-303fc8132233"
        }
      ],
      "on_success": [
        {
          "id": "bf2fbd0a-4a05-4300-98a5-303fc8132233"
        }
      ],
      "on_failure": []
    },
    "timeout_seconds": 100000000,
    "max_retries": 1,
    "schedule": {
      "quartz_cron_expression": "0 15 22 * * ?",
      "timezone_id": "America/Los_Angeles",
      "pause_status": "UNPAUSED"
    },
    "spark_jar_task": {
      "main_class_name": "com.databricks.ComputeModels"
    }
  },
  "created_time": 1457570074236
}

Request structure

Field Name    Type     Description
job_id        INT64    The canonical identifier of the job to retrieve information about. This field is required.

Response structure

Field Name

Type

Description

job_id

INT64

The canonical identifier for this job.

creator_user_name

STRING

The creator user name. This field won’t be included in the response if the user has been deleted.

settings

JobSettings

Settings for this job and all of its runs. These settings can be updated using the Reset or Update endpoints.

created_time

INT64

The time at which this job was created in epoch milliseconds (milliseconds since 1/1/1970 UTC).
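Timestamps such as created_time are integers in epoch milliseconds. A short sketch of converting one to a timezone-aware UTC datetime, using the created_time value from the example response above (the helper name is illustrative):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis_to_utc(millis):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    # Integer timedelta arithmetic avoids floating-point rounding.
    return EPOCH + timedelta(milliseconds=millis)


created = epoch_millis_to_utc(1457570074236)
print(created.isoformat())  # 2016-03-10T00:34:34.236000+00:00
```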

Reset

Endpoint          HTTP Method
2.0/jobs/reset    POST

Overwrite all settings for a specific job. Use the Update endpoint to update job settings partially.

Example

This example request makes job 2 identical to job 1 in the create example.

curl --netrc --request POST \
https://<databricks-instance>/api/2.0/jobs/reset \
--data @reset-job.json \
| jq .

reset-job.json:

{
  "job_id": 2,
  "new_settings": {
    "name": "Nightly model training",
    "new_cluster": {
      "spark_version": "7.5.x-scala2.12",
      "node_type_id": "n1-highmem-4",
      "aws_attributes": {
        "availability": "ON_DEMAND"
      },
      "num_workers": 10
    },
    "libraries": [
      {
        "jar": "dbfs:/my-jar.jar"
      },
      {
        "maven": {
          "coordinates": "org.jsoup:jsoup:1.7.2"
        }
      }
    ],
    "email_notifications": {
      "on_start": [],
      "on_success": [],
      "on_failure": []
    },
    "webhook_notifications": {
      "on_start": [
        {
          "id": "bf2fbd0a-4a05-4300-98a5-303fc8132233"
        }
      ],
      "on_success": [
        {
          "id": "bf2fbd0a-4a05-4300-98a5-303fc8132233"
        }
      ],
      "on_failure": []
    },
    "timeout_seconds": 100000000,
    "max_retries": 1,
    "schedule": {
      "quartz_cron_expression": "0 15 22 * * ?",
      "timezone_id": "America/Los_Angeles",
      "pause_status": "UNPAUSED"
    },
    "spark_jar_task": {
      "main_class_name": "com.databricks.ComputeModels"
    }
  }
}

Replace:

  • <databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.

  • The contents of reset-job.json with fields that are appropriate for your solution.

This example uses a .netrc file and jq.

Request structure

Field Name

Type

Description

job_id

INT64

The canonical identifier of the job to reset. This field is required.

new_settings

JobSettings

The new settings of the job. These settings completely replace the old settings.

Changes to the field JobSettings.timeout_seconds are applied to active runs. Changes to other fields are applied to future runs only.

Update

Endpoint           HTTP Method
2.0/jobs/update    POST

Add, change, or remove specific settings of an existing job. Use the Reset endpoint to overwrite all job settings.

Example

This example request removes libraries and adds email notification settings to job 1 defined in the create example.

curl --netrc --request POST \
https://<databricks-instance>/api/2.0/jobs/update \
--data @update-job.json \
| jq .

update-job.json:

{
  "job_id": 1,
  "new_settings": {
    "existing_cluster_id": "1201-my-cluster",
    "email_notifications": {
      "on_start": [ "someone@example.com" ],
      "on_success": [],
      "on_failure": []
    }
  },
  "fields_to_remove": ["libraries"]
}

Replace:

  • <databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.

  • The contents of update-job.json with fields that are appropriate for your solution.

This example uses a .netrc file and jq.

Request structure

Field Name

Type

Description

job_id

INT64

The canonical identifier of the job to update. This field is required.

new_settings

JobSettings

The new settings for the job.

Top-level fields specified in new_settings, except for arrays, are completely replaced. Arrays are merged based on the respective key fields, such as task_key or job_cluster_key, and array entries with the same key are completely replaced. Except for array merging, partially updating nested fields is not supported.

Changes to the field JobSettings.timeout_seconds are applied to active runs. Changes to other fields are applied to future runs only.

fields_to_remove

An array of STRING

Remove top-level fields in the job settings. Removing nested fields is not supported, except for entries from the tasks and job_clusters arrays. For example, the following is a valid argument for this field: ["libraries", "schedule", "tasks/task_1", "job_clusters/Default"]

This field is optional.
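The difference between Reset and Update can be illustrated with a local model of the top-level semantics described above. This is a simplified sketch, not the service's implementation: it does not model array merging by key (such as task_key), nested removals like "tasks/task_1", or any server-side validation, and it rejects a field that appears in both new_settings and fields_to_remove because the source does not say how that case is handled.

```python
import copy


def reset_settings(current, new_settings):
    """Reset: the new settings completely replace the old ones."""
    return copy.deepcopy(new_settings)


def update_settings(current, new_settings, fields_to_remove=()):
    """Update: replace the given top-level fields, then drop removed ones."""
    overlap = set(new_settings) & set(fields_to_remove)
    if overlap:
        # Undefined in the source, so this sketch refuses to guess.
        raise ValueError(f"fields both set and removed: {sorted(overlap)}")
    result = copy.deepcopy(current)
    for name, value in new_settings.items():
        result[name] = copy.deepcopy(value)  # whole field replaced, no deep merge
    for name in fields_to_remove:
        result.pop(name, None)
    return result


current = {
    "name": "Nightly model training",
    "libraries": [{"jar": "dbfs:/my-jar.jar"}],
    "timeout_seconds": 3600,
}
new_settings = {
    "existing_cluster_id": "1201-my-cluster",
    "email_notifications": {
        "on_start": ["someone@example.com"],
        "on_success": [],
        "on_failure": [],
    },
}
updated = update_settings(current, new_settings, ["libraries"])
```

In this model, `updated` keeps name and timeout_seconds, gains the two new fields, and no longer has libraries. Passing the same new_settings to `reset_settings` would instead drop every field not listed.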

Run now

Important

  • A workspace is limited to 1000 concurrent task runs. A 429 Too Many Requests response is returned when you request a run that cannot start immediately.

  • The number of jobs a workspace can create in an hour is limited to 10000 (includes “runs submit”). This limit also affects jobs created by the REST API and notebook workflows.

  • A workspace can contain up to 12000 saved jobs.

  • A job can contain up to 100 tasks.

Endpoint            HTTP Method
2.0/jobs/run-now    POST

Run a job now and return the run_id of the triggered run.

Tip

If you invoke Create together with Run now, you can use the Runs submit endpoint instead, which allows you to submit your workload directly without having to create a job.

Example

curl --netrc --request POST \
https://<databricks-instance>/api/2.0/jobs/run-now \
--data @run-job.json \
| jq .

run-job.json:

An example request for a notebook job:

{
  "job_id": 1,
  "notebook_params": {
    "name": "john doe",
    "age": "35"
  }
}

An example request for a JAR job:

{
  "job_id": 2,
  "jar_params": [ "john doe", "35" ]
}

Replace:

  • <databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.

  • The contents of run-job.json with fields that are appropriate for your solution.

This example uses a .netrc file and jq.

Request structure

Field Name

Type

Description

job_id

INT64

The canonical identifier of the job to run.

jar_params

An array of STRING

A list of parameters for jobs with JAR tasks, e.g. "jar_params": ["john doe", "35"]. The parameters will be used to invoke the main function of the main class specified in the Spark JAR task. If not specified upon run-now, it will default to an empty list. jar_params cannot be specified in conjunction with notebook_params. The JSON representation of this field (i.e. {"jar_params":["john doe","35"]}) cannot exceed 10,000 bytes.

notebook_params

A map of ParamPair

A map from keys to values for jobs with notebook tasks, e.g. "notebook_params": {"name": "john doe", "age": "35"}. The map is passed to the notebook and is accessible through the dbutils.widgets.get function.

If not specified upon run-now, the triggered run uses the job’s base parameters.

You cannot specify notebook_params in conjunction with jar_params.

The JSON representation of this field (i.e. {"notebook_params":{"name":"john doe","age":"35"}}) cannot exceed 10,000 bytes.

python_params

An array of STRING

A list of parameters for jobs with Python tasks, e.g. "python_params": ["john doe", "35"]. The parameters are passed to the Python file as command-line parameters. If specified upon run-now, they overwrite the parameters specified in the job settings. The JSON representation of this field (i.e. {"python_params":["john doe","35"]}) cannot exceed 10,000 bytes.

spark_submit_params

An array of STRING

A list of parameters for jobs with spark-submit tasks, e.g. "spark_submit_params": ["--class", "org.apache.spark.examples.SparkPi"]. The parameters are passed to the spark-submit script as command-line parameters. If specified upon run-now, they overwrite the parameters specified in the job settings. The JSON representation of this field cannot exceed 10,000 bytes.

idempotency_token

STRING

An optional token to guarantee the idempotency of job run requests. If a run with the provided token already exists, the request does not create a new run but returns the ID of the existing run instead. If a run with the provided token is deleted, an error is returned.

If you specify the idempotency token, upon failure you can retry until the request succeeds. Databricks guarantees that exactly one run is launched with that idempotency token.

This token must have at most 64 characters.

For example, "idempotency_token": "123".
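The constraints in the table above (jar_params and notebook_params are mutually exclusive, each field's JSON representation is limited to 10,000 bytes, and the idempotency token is at most 64 characters) can be checked on the client before sending. The following is a sketch with hypothetical helper names; measuring the compact JSON encoding is an assumption about how the size limit is counted.

```python
import json
import uuid

MAX_PARAMS_BYTES = 10_000
MAX_TOKEN_CHARS = 64


def _check_size(field, value):
    # Assumption: the limit applies to the compact JSON form {"field": value}.
    encoded = json.dumps({field: value}, separators=(",", ":")).encode("utf-8")
    if len(encoded) > MAX_PARAMS_BYTES:
        raise ValueError(f"{field} exceeds {MAX_PARAMS_BYTES} bytes")


def new_idempotency_token():
    """Return a random 32-character token, well under the 64-character limit."""
    return uuid.uuid4().hex


def build_run_now_payload(job_id, notebook_params=None, jar_params=None,
                          idempotency_token=None):
    """Build the JSON body for 2.0/jobs/run-now after local validation."""
    if notebook_params is not None and jar_params is not None:
        raise ValueError("notebook_params and jar_params cannot be combined")
    payload = {"job_id": job_id}
    if notebook_params is not None:
        _check_size("notebook_params", notebook_params)
        payload["notebook_params"] = notebook_params
    if jar_params is not None:
        _check_size("jar_params", jar_params)
        payload["jar_params"] = jar_params
    if idempotency_token is not None:
        if len(idempotency_token) > MAX_TOKEN_CHARS:
            raise ValueError("idempotency_token exceeds 64 characters")
        payload["idempotency_token"] = idempotency_token
    return payload
```

Reusing the same token when retrying a failed request is what lets the service return the existing run instead of launching a second one.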

Response structure

Field Name

Type

Description

run_id

INT64

The globally unique ID of the newly triggered run.

number_in_job

INT64

The sequence number of this run among all runs of the job.

Runs submit

Important

  • A workspace is limited to 1000 concurrent task runs. A 429 Too Many Requests response is returned when you request a run that cannot start immediately.

  • The number of jobs a workspace can create in an hour is limited to 10000 (includes “runs submit”). This limit also affects jobs created by the REST API and notebook workflows.

  • A workspace can contain up to 12000 saved jobs.

  • A job can contain up to 100 tasks.

Endpoint                HTTP Method
2.0/jobs/runs/submit    POST

Submit a one-time run. This endpoint allows you to submit a workload directly without creating a job. Use the jobs/runs/get API to check the run state after the job is submitted.

Example

Request

curl --netrc --request POST \
https://<databricks-instance>/api/2.0/jobs/runs/submit \
--data @submit-job.json \
| jq .

submit-job.json:

{
  "run_name": "my spark task",
  "new_cluster": {
    "spark_version": "7.5.x-scala2.12",
    "node_type_id": "n1-highmem-4",
    "aws_attributes": {
      "availability": "ON_DEMAND"
    },
    "num_workers": 10
  },
  "libraries": [
    {
      "jar": "dbfs:/my-jar.jar"
    },
    {
      "maven": {
        "coordinates": "org.jsoup:jsoup:1.7.2"
      }
    }
  ],
  "spark_jar_task": {
    "main_class_name": "com.databricks.ComputeModels"
  }
}

Replace:

  • <databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.

  • The contents of submit-job.json with fields that are appropriate for your solution.

This example uses a .netrc file and jq.

Response

{
  "run_id": 123
}
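After submitting, the returned run_id can be polled with jobs/runs/get until the run is no longer active. The sketch below treats PENDING, RUNNING, and TERMINATING as the active life cycle states, as listed in the Runs list request structure. `get_run` is a hypothetical callable that returns the parsed JSON response of jobs/runs/get for a run ID, and the polling interval and attempt limit are arbitrary choices.

```python
import time

# Active states as listed for active_only in the Runs list section.
ACTIVE_STATES = {"PENDING", "RUNNING", "TERMINATING"}


def wait_for_run(get_run, run_id, poll_seconds=30, max_polls=120,
                 sleep=time.sleep):
    """Poll get_run(run_id) until the run leaves the active states.

    get_run is a hypothetical callable (an assumption of this sketch) that
    returns the parsed jobs/runs/get response as a dict.
    """
    for attempt in range(max_polls):
        run = get_run(run_id)
        if run["state"]["life_cycle_state"] not in ACTIVE_STATES:
            return run
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} still active after {max_polls} polls")
```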

Request structure

Important

  • When you run a job on a new jobs cluster, the job is treated as a Jobs Compute (automated) workload subject to Jobs Compute pricing.

  • When you run a job on an existing all-purpose cluster, it is treated as an All-Purpose Compute (interactive) workload subject to All-Purpose Compute pricing.

Field Name

Type

Description

existing_cluster_id OR new_cluster

STRING OR NewCluster

If existing_cluster_id, the ID of an existing cluster that will be used for all runs of this job. When running jobs on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs on new clusters for greater reliability.

If new_cluster, a description of a cluster that will be created for each run.

If specifying a PipelineTask, then this field can be empty.

notebook_task OR spark_jar_task OR spark_python_task OR spark_submit_task OR pipeline_task OR run_job_task

NotebookTask OR SparkJarTask OR SparkPythonTask OR SparkSubmitTask OR PipelineTask OR RunJobTask

If notebook_task, indicates that this job should run a notebook. This field may not be specified in conjunction with spark_jar_task.

If spark_jar_task, indicates that this job should run a JAR.

If spark_python_task, indicates that this job should run a Python file.

If spark_submit_task, indicates that this job should be launched by the spark-submit script.

If pipeline_task, indicates that this job should run a Delta Live Tables pipeline.

If run_job_task, indicates that this job should run another job.

run_name

STRING

An optional name for the run. The default value is Untitled.

webhook_notifications

WebhookNotifications

An optional set of system destinations to notify when runs of this job begin, complete, or fail.

notification_settings

JobNotificationSettings

Optional notification settings that are used when sending notifications to each of the webhook_notifications for this run.

libraries

An array of Library

An optional list of libraries to be installed on the cluster that will execute the job. The default value is an empty list.

timeout_seconds

INT32

An optional timeout applied to each run of this job. The default behavior is to have no timeout.

idempotency_token

STRING

An optional token to guarantee the idempotency of job run requests. If a run with the provided token already exists, the request does not create a new run but returns the ID of the existing run instead. If a run with the provided token is deleted, an error is returned.

If you specify the idempotency token, upon failure you can retry until the request succeeds. Databricks guarantees that exactly one run is launched with that idempotency token.

This token must have at most 64 characters.

For example, "idempotency_token": "123".

Response structure

Field Name    Type     Description
run_id        INT64    The canonical identifier for the newly submitted run.

Runs list

Endpoint              HTTP Method
2.0/jobs/runs/list    GET

List runs in descending order by start time.

Note

Runs are automatically removed after 60 days. If you want to reference them beyond 60 days, you should save old run results before they expire. To export using the UI, see Export job run results. To export using the Jobs API, see Runs export.

Example

Request

curl --netrc --request GET \
'https://<databricks-instance>/api/2.0/jobs/runs/list?job_id=<job-id>&active_only=<true-false>&offset=<offset>&limit=<limit>&run_type=<run-type>' \
| jq .

Or:

curl --netrc --get \
https://<databricks-instance>/api/2.0/jobs/runs/list \
--data 'job_id=<job-id>&active_only=<true-false>&offset=<offset>&limit=<limit>&run_type=<run-type>' \
| jq .

Replace:

  • <databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.

  • <job-id> with the ID of the job, for example 123.

  • <true-false> with true or false.

  • <offset> with the offset value.

  • <limit> with the limit value.

  • <run-type> with the run_type value.

This example uses a .netrc file and jq.

Response

{
  "runs": [
    {
      "job_id": 1,
      "run_id": 452,
      "number_in_job": 5,
      "state": {
        "life_cycle_state": "RUNNING",
        "state_message": "Performing action"
      },
      "task": {
        "notebook_task": {
          "notebook_path": "/Users/donald@duck.com/my-notebook"
        }
      },
      "cluster_spec": {
        "existing_cluster_id": "1201-my-cluster"
      },
      "cluster_instance": {
        "cluster_id": "1201-my-cluster",
        "spark_context_id": "1102398-spark-context-id"
      },
      "overriding_parameters": {
        "jar_params": ["param1", "param2"]
      },
      "start_time": 1457570074236,
      "end_time": 1457570075149,
      "setup_duration": 259754,
      "execution_duration": 3589020,
      "cleanup_duration": 31038,
      "run_duration": 3879812,
      "trigger": "PERIODIC"
    }
  ],
  "has_more": true
}

Request structure

Field Name

Type

Description

active_only OR completed_only

BOOL OR BOOL

If active_only is true, only active runs are included in the results; otherwise, lists both active and completed runs. An active run is a run in the PENDING, RUNNING, or TERMINATING RunLifecycleState. This field cannot be true when completed_only is true.

If completed_only is true, only completed runs are included in the results; otherwise, lists both active and completed runs. This field cannot be true when active_only is true.

job_id

INT64

The job for which to list runs. If omitted, the Jobs service will list runs from all jobs.

offset

INT32

The offset of the first run to return, relative to the most recent run.

limit

INT32

The number of runs to return. This value should be greater than 0 and less than 1000. The default value is 20. If a request specifies a limit of 0, the service will instead use the maximum limit.

run_type

STRING

The type of runs to return. For a description of run types, see Run.

Response structure

Field Name

Type

Description

runs

An array of Run

A list of runs, from most recently started to least.

has_more

BOOL

If true, additional runs matching the provided filter are available for listing.
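The offset, limit, and has_more fields support simple pagination. A minimal sketch, assuming a hypothetical `fetch_page(offset, limit)` callable that performs one jobs/runs/list request and returns the parsed JSON response:

```python
def iter_runs(fetch_page, limit=20):
    """Yield runs page by page until has_more is false.

    fetch_page is a hypothetical callable (an assumption of this sketch)
    that returns the parsed jobs/runs/list response for one page.
    """
    offset = 0
    while True:
        page = fetch_page(offset=offset, limit=limit)
        runs = page.get("runs", [])
        yield from runs
        # Stop when the service reports no more runs, or on an empty page
        # to avoid looping forever.
        if not page.get("has_more") or not runs:
            return
        offset += len(runs)
```

Because runs are listed from most recently started to least, new runs that start while paging can shift offsets, so a caller that needs an exact snapshot may want to de-duplicate by run_id.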

Runs get

Endpoint             HTTP Method
2.0/jobs/runs/get    GET

Retrieve the metadata of a run.

Note

Runs are automatically removed after 60 days. If you want to reference them beyond 60 days, you should save old run results before they expire. To export using the UI, see Export job run results. To export using the Jobs API, see Runs export.

Example

Request

curl --netrc --request GET \
'https://<databricks-instance>/api/2.0/jobs/runs/get?run_id=<run-id>' \
| jq .

Or:

curl --netrc --get \
https://<databricks-instance>/api/2.0/jobs/runs/get \
--data run_id=<run-id> \
| jq .

Replace:

  • <databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.

  • <run-id> with the ID of the run, for example 123.

This example uses a .netrc file and jq.

Response

{
  "job_id": 1,
  "run_id": 452,
  "number_in_job": 5,
  "state": {
    "life_cycle_state": "RUNNING",
    "state_message": "Performing action"
  },
  "task": {
    "notebook_task": {
      "notebook_path": "/Users/someone@example.com/my-notebook"
    }
  },
  "cluster_spec": {
    "existing_cluster_id": "1201-my-cluster"
  },
  "cluster_instance": {
    "cluster_id": "1201-my-cluster",
    "spark_context_id": "1102398-spark-context-id"
  },
  "overriding_parameters": {
    "jar_params": ["param1", "param2"]
  },
  "start_time": 1457570074236,
  "end_time": 1457570075149,
  "setup_duration": 259754,
  "execution_duration": 3589020,
  "cleanup_duration": 31038,
  "run_duration": 3879812,
  "trigger": "PERIODIC"
}

Request structure

Field Name    Type     Description
run_id        INT64    The canonical identifier of the run for which to retrieve the metadata. This field is required.

Response structure

Field Name

Type

Description

job_id

INT64

The canonical identifier of the job that contains this run.

run_id

INT64

The canonical identifier of the run. This ID is unique across all runs of all jobs.

number_in_job

INT64

The sequence number of this run among all runs of the job. This value starts at 1.

original_attempt_run_id

INT64

If this run is a retry of a prior run attempt, this field contains the run_id of the original attempt; otherwise, it is the same as the run_id.

state

RunState

The result and lifecycle states of the run.

schedule

CronSchedule

The cron schedule that triggered this run if it was triggered by the periodic scheduler.

task

JobTask

The task performed by the run, if any.

cluster_spec

ClusterSpec

A snapshot of the job’s cluster specification when this run was created.

cluster_instance

ClusterInstance

The cluster used for this run. If the run is specified to use a new cluster, this field will be set once the Jobs service has requested a cluster for the run.

overriding_parameters

RunParameters

The parameters used for this run.

start_time

INT64

The time at which this run was started in epoch milliseconds (milliseconds since 1/1/1970 UTC). This may not be the time when the job task starts executing. For example, if the job is scheduled to run on a new cluster, this is the time the cluster creation call is issued.

end_time

INT64

The time at which this run ended in epoch milliseconds (milliseconds since 1/1/1970 UTC). This field will be set to 0 if the job is still running.

setup_duration

INT64

The time in milliseconds it took to set up the cluster. For runs on new clusters, this is the cluster creation time; for runs on existing clusters, this time should be very short. The total duration of the run is the sum of the setup_duration, execution_duration, and the cleanup_duration. The setup_duration field is set to 0 for multitask job runs. The total duration of a multitask job run is the value of the run_duration field.

execution_duration

INT64

The time in milliseconds it took to execute the commands in the JAR or notebook until they completed, failed, timed out, were cancelled, or encountered an unexpected error. The total duration of the run is the sum of the setup_duration, execution_duration, and the cleanup_duration. The execution_duration field is set to 0 for multitask job runs. The total duration of a multitask job run is the value of the run_duration field.

cleanup_duration

INT64

The time in milliseconds it took to terminate the cluster and clean up any associated artifacts. The total duration of the run is the sum of the setup_duration, execution_duration, and the cleanup_duration. The cleanup_duration field is set to 0 for multitask job runs. The total duration of a multitask job run is the value of the run_duration field.

run_duration

INT64

The time in milliseconds it took the job run and all of its repairs to finish. This field is only set for multitask job runs and not task runs. The duration of a task run is the sum of the setup_duration, execution_duration, and the cleanup_duration.

trigger

TriggerType

The type of trigger that fired this run.

creator_user_name

STRING

The creator user name. This field won’t be included in the response if the user has been deleted.

run_page_url

STRING

The URL to the detail page of the run.
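The duration fields above combine as follows: for a task run, the total is the sum of setup_duration, execution_duration, and cleanup_duration; for a multitask job run, those three are 0 and run_duration holds the total. A small sketch of that rule, with a hypothetical helper name and the values from the example response:

```python
def total_duration_ms(run):
    """Return the total run duration in milliseconds."""
    phases = (
        run.get("setup_duration", 0)
        + run.get("execution_duration", 0)
        + run.get("cleanup_duration", 0)
    )
    if phases > 0:
        return phases
    # The three phase fields are 0 for multitask job runs.
    return run.get("run_duration", 0)


example_run = {
    "setup_duration": 259754,
    "execution_duration": 3589020,
    "cleanup_duration": 31038,
    "run_duration": 3879812,
}
print(total_duration_ms(example_run))  # 3879812
```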

Runs export

Endpoint

HTTP Method

2.0/jobs/runs/export

GET

Export and retrieve the job run task.

Note

Only notebook runs can be exported in HTML format. Exporting runs of other types will fail.

Example

Request

curl --netrc --request GET \
'https://<databricks-instance>/api/2.0/jobs/runs/export?run_id=<run-id>' \
| jq .

Or:

curl --netrc --get \
https://<databricks-instance>/api/2.0/jobs/runs/export \
--data run_id=<run-id> \
| jq .

Replace:

  • <databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.

  • <run-id> with the ID of the run, for example 123.

This example uses a .netrc file and jq.

Response

{
  "views": [ {
    "content": "<!DOCTYPE html><html><head>Head</head><body>Body</body></html>",
    "name": "my-notebook",
    "type": "NOTEBOOK"
  } ]
}

To extract the HTML notebook from the JSON response, download and run this Python script.

Note

The notebook body in the __DATABRICKS_NOTEBOOK_MODEL object is encoded.
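As a rough illustration of the response shape, the following Python sketch maps each exported view to its HTML content. It is not the script referred to above, it does not decode the __DATABRICKS_NOTEBOOK_MODEL object, and the function name extract_views is hypothetical.

```python
import json


def extract_views(response_text: str) -> dict:
    """Map each exported view's name to its HTML content."""
    response = json.loads(response_text)
    return {view["name"]: view["content"] for view in response.get("views", [])}


# Sample payload with the same shape as the response shown above.
sample = json.dumps({
    "views": [{
        "content": "<!DOCTYPE html><html><head>Head</head><body>Body</body></html>",
        "name": "my-notebook",
        "type": "NOTEBOOK",
    }]
})

for name, html in extract_views(sample).items():
    # Saving to disk is left to the caller, for example as f"{name}.html".
    print(name, len(html))
```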

Request structure

Field Name

Type

Description

run_id

INT64

The canonical identifier for the run. This field is required.

views_to_export

ViewsToExport

Which views to export (CODE, DASHBOARDS, or ALL). Defaults to CODE.

Response structure

Field Name

Type

Description

views

An array of ViewItem

The exported content in HTML format (one for every view item).

Runs cancel

Endpoint

HTTP Method

2.0/jobs/runs/cancel

POST

Cancel a job run. Because the run is canceled asynchronously, the run may still be running when this request completes. The run will be terminated shortly. If the run is already in a terminal life_cycle_state, this method is a no-op.

This endpoint validates that the run_id parameter is valid and for invalid parameters returns HTTP status code 400.

Example

curl --netrc --request POST \
https://<databricks-instance>/api/2.0/jobs/runs/cancel \
--data '{ "run_id": <run-id> }'

Replace:

  • <databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.

  • <run-id> with the ID of the run, for example 123.

This example uses a .netrc file.

Request structure

Field Name

Type

Description

run_id

INT64

The canonical identifier of the run to cancel. This field is required.
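Because cancellation is asynchronous, a client that needs to know when the run has actually stopped has to poll. The sketch below shows one way to do that in Python. The callable get_life_cycle_state is a hypothetical helper that the caller supplies (for example, a function that requests the run and returns state.life_cycle_state), and the set of terminal states is an assumption read off the RunLifeCycleState transitions in this article.

```python
import time
from typing import Callable

# Assumed terminal states: those with no outgoing transition in RunLifeCycleState.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_until_terminal(
    get_life_cycle_state: Callable[[], str],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
) -> str:
    """Poll until the run reaches a terminal state or max_polls is exhausted.

    Returns the last observed life cycle state.
    """
    state = get_life_cycle_state()
    for _ in range(max_polls):
        if state in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
        state = get_life_cycle_state()
    return state
```

The caller should check the returned state, since the loop also ends when max_polls is reached without the run terminating.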

Runs cancel all

Endpoint

HTTP Method

2.0/jobs/runs/cancel-all

POST

Cancel all active runs of a job. Because the run is canceled asynchronously, it doesn’t prevent new runs from being started.

This endpoint validates that the job_id parameter is valid and for invalid parameters returns HTTP status code 400.

Example

curl --netrc --request POST \
https://<databricks-instance>/api/2.0/jobs/runs/cancel-all \
--data '{ "job_id": <job-id> }'

Replace:

  • <databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.

  • <job-id> with the ID of the job, for example 123.

This example uses a .netrc file.

Request structure

Field Name

Type

Description

job_id

INT64

The canonical identifier of the job to cancel all runs of. This field is required.

Runs get output

Endpoint

HTTP Method

2.0/jobs/runs/get-output

GET

Retrieve the output and metadata of a single task run. When a notebook task returns a value through the dbutils.notebook.exit() call, you can use this endpoint to retrieve that value. Databricks restricts this API to return the first 5 MB of the output. For returning a larger result, you can store job results in a cloud storage service.

This endpoint validates that the run_id parameter is valid and for invalid parameters returns HTTP status code 400.

Runs are automatically removed after 60 days. If you want to reference them beyond 60 days, you should save old run results before they expire. To export using the UI, see Export job run results. To export using the Jobs API, see Runs export.

Example

Request

curl --netrc --request GET \
'https://<databricks-instance>/api/2.0/jobs/runs/get-output?run_id=<run-id>' \
| jq .

Or:

curl --netrc --get \
https://<databricks-instance>/api/2.0/jobs/runs/get-output \
--data run_id=<run-id> \
| jq .

Replace:

  • <databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.

  • <run-id> with the ID of the run, for example 123.

This example uses a .netrc file and jq.

Response

{
  "metadata": {
    "job_id": 1,
    "run_id": 452,
    "number_in_job": 5,
    "state": {
      "life_cycle_state": "TERMINATED",
      "result_state": "SUCCESS",
      "state_message": ""
    },
    "task": {
      "notebook_task": {
        "notebook_path": "/Users/someone@example.com/my-notebook"
      }
    },
    "cluster_spec": {
      "existing_cluster_id": "1201-my-cluster"
    },
    "cluster_instance": {
      "cluster_id": "1201-my-cluster",
      "spark_context_id": "1102398-spark-context-id"
    },
    "overriding_parameters": {
      "jar_params": ["param1", "param2"]
    },
    "start_time": 1457570074236,
    "setup_duration": 259754,
    "execution_duration": 3589020,
    "cleanup_duration": 31038,
    "run_duration": 3879812,
    "trigger": "PERIODIC"
  },
  "notebook_output": {
    "result": "the maybe truncated string passed to dbutils.notebook.exit()"
  }
}

Request structure

Field Name

Type

Description

run_id

INT64

The canonical identifier for the run. For a job with multiple tasks, this is the run_id of a task run. See Runs get output. This field is required.

Response structure

Field Name

Type

Description

notebook_output OR error

NotebookOutput OR STRING

If notebook_output, the output of a notebook task, if available. A notebook task that terminates (either successfully or with a failure) without calling dbutils.notebook.exit() is considered to have an empty output. This field will be set but its result value will be empty.

If error, an error message indicating why output is not available. The message is unstructured, and its exact format is subject to change.

metadata

Run

All details of the run except for its output.
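A client typically has to handle both shapes of the response: notebook_output or error. The following Python sketch, with a hypothetical function name notebook_result, shows one possible way to do that on an already parsed response.

```python
def notebook_result(response: dict) -> tuple:
    """Return (result, truncated) from a Runs get output response.

    Raises RuntimeError if the response carries `error` instead of
    `notebook_output`.
    """
    if "error" in response:
        raise RuntimeError(response["error"])
    output = response.get("notebook_output", {})
    # `result` is empty or absent when dbutils.notebook.exit() was never called.
    return output.get("result", ""), output.get("truncated", False)
```

Because the error message is unstructured, the sketch passes it through unchanged rather than trying to parse it.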

Runs delete

Endpoint

HTTP Method

2.0/jobs/runs/delete

POST

Delete a non-active run. Returns an error if the run is active.

Example

curl --netrc --request POST \
https://<databricks-instance>/api/2.0/jobs/runs/delete \
--data '{ "run_id": <run-id> }'

Replace:

  • <databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.

  • <run-id> with the ID of the run, for example 123.

This example uses a .netrc file.

Request structure

Field Name

Type

Description

run_id

INT64

The canonical identifier of the run to delete.

Data structures

AutoScale

Range defining the min and max number of cluster workers.

Field Name

Type

Description

min_workers

INT32

The minimum number of workers to which the cluster can scale down when underutilized. It is also the initial number of workers the cluster will have after creation.

max_workers

INT32

The maximum number of workers to which the cluster can scale up when overloaded. max_workers must be strictly greater than min_workers.
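The constraint between the two fields can be checked on the client before a request is sent. This is a minimal Python sketch; validate_autoscale is a hypothetical helper, not an API call.

```python
def validate_autoscale(autoscale: dict) -> None:
    """Raise ValueError unless max_workers is strictly greater than min_workers."""
    min_workers = autoscale["min_workers"]
    max_workers = autoscale["max_workers"]
    if max_workers <= min_workers:
        raise ValueError(
            f"max_workers ({max_workers}) must be strictly greater than "
            f"min_workers ({min_workers})"
        )


validate_autoscale({"min_workers": 2, "max_workers": 8})  # passes silently
```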

ClusterInstance

Identifiers for the cluster and Spark context used by a run. These two values together identify an execution context across all time.

Field Name

Type

Description

cluster_id

STRING

The canonical identifier for the cluster used by a run. This field is always available for runs on existing clusters. For runs on new clusters, it becomes available once the cluster is created. This value can be used to view logs by browsing to /#setting/sparkui/$cluster_id/driver-logs. The logs will continue to be available after the run completes.

The response won’t include this field if the identifier is not available yet.

spark_context_id

STRING

The canonical identifier for the Spark context used by a run. This field will be filled in once the run begins execution. This value can be used to view the Spark UI by browsing to /#setting/sparkui/$cluster_id/$spark_context_id. The Spark UI will continue to be available after the run has completed.

The response won’t include this field if the identifier is not available yet.
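The two URL patterns above can be assembled from a ClusterInstance as in the following Python sketch. The function name cluster_instance_urls and the workspace_url argument are hypothetical; the sketch only emits a URL once the identifier it depends on is present in the response.

```python
def cluster_instance_urls(workspace_url: str, cluster_instance: dict) -> dict:
    """Build the driver log and Spark UI URLs that are available so far."""
    base = workspace_url.rstrip("/")
    urls = {}
    cluster_id = cluster_instance.get("cluster_id")
    if cluster_id:
        urls["driver_logs"] = f"{base}/#setting/sparkui/{cluster_id}/driver-logs"
        spark_context_id = cluster_instance.get("spark_context_id")
        if spark_context_id:
            urls["spark_ui"] = f"{base}/#setting/sparkui/{cluster_id}/{spark_context_id}"
    return urls
```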

ClusterLogConf

Path to cluster log.

Field Name

Type

Description

dbfs

DbfsStorageInfo

DBFS location of cluster log. Destination must be provided. For example, { "dbfs" : { "destination" : "dbfs:/home/cluster_log" } }

ClusterSpec

Important

  • When you run a job on a new jobs cluster, the job is treated as a Jobs Compute (automated) workload subject to Jobs Compute pricing.

  • When you run a job on an existing all-purpose cluster, it is treated as an All-Purpose Compute (interactive) workload subject to All-Purpose Compute pricing.

Field Name

Type

Description

existing_cluster_id OR new_cluster

STRING OR NewCluster

If existing_cluster_id, the ID of an existing cluster that will be used for all runs of this job. When running jobs on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs on new clusters for greater reliability.

If new_cluster, a description of a cluster that will be created for each run.

If specifying a PipelineTask, then this field can be empty.

libraries

An array of Library

An optional list of libraries to be installed on the cluster that will execute the job. The default value is an empty list.

ClusterTag

Cluster tag definition.

CronSchedule

Field Name

Type

Description

quartz_cron_expression

STRING

A Cron expression using Quartz syntax that describes the schedule for a job. See Cron Trigger for details. This field is required.

timezone_id

STRING

A Java timezone ID. The schedule for a job will be resolved with respect to this timezone. See Java TimeZone for details. This field is required.

pause_status

STRING

Indicate whether this schedule is paused or not. Either “PAUSED” or “UNPAUSED”.
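As an illustration of how the three fields fit together, the following Python sketch builds a CronSchedule for a job that runs once a day. The helper name daily_schedule is hypothetical, and the sketch covers only the simple daily case.

```python
def daily_schedule(hour: int, minute: int, timezone_id: str, paused: bool = False) -> dict:
    """Build a CronSchedule that fires once a day at hour:minute."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    return {
        # Quartz field order: seconds minutes hours day-of-month month day-of-week
        "quartz_cron_expression": f"0 {minute} {hour} * * ?",
        "timezone_id": timezone_id,
        "pause_status": "PAUSED" if paused else "UNPAUSED",
    }


# 10:15 PM each night, matching the Create example in this article.
print(daily_schedule(22, 15, "America/Los_Angeles"))
```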

DbfsStorageInfo

DBFS storage information.

Field Name

Type

Description

destination

STRING

DBFS destination. Example: dbfs:/my/path

GCSStorageInfo

Google Cloud Storage (GCS) storage information.

Field Name

Type

Description

destination

STRING

File destination. Example: gs://...

Google Cloud attributes

Attributes set during cluster creation related to Google Cloud.

Field Name

Type

Description

use_preemptible_executors

BOOL

Use preemptible executors.

google_service_account

STRING

Google service account email address that the cluster uses to authenticate with Google Identity. This field is used for authentication with the GCS and BigQuery data sources.

boot_disk_size

INT32

Size, in GB, of the disk allocated to each instance. This value must be between 100 and 4096.

Important

For use with GCS and BigQuery, the Google service account that you use to access the data source must be in the same project as the service account (SA) that you specified when setting up your Databricks account.

InitScriptInfo

Path to an init script.

Field Name

Type

Description

workspace OR dbfs (deprecated) OR gcs

WorkspaceStorageInfo OR DbfsStorageInfo (deprecated) OR GCSStorageInfo

Workspace location of init script. Destination must be provided. For example, { "workspace" : { "destination" : "/Users/someone@domain.com/init_script.sh" } }

(Deprecated) DBFS location of init script. Destination must be provided. For example, { "dbfs" : { "destination" : "dbfs:/home/init_script" } }

Google Cloud Storage (GCS) location of init script. Destination must be provided. For example, { "gcs": { "destination" : "gs://..." } }

Job

Field Name

Type

Description

job_id

INT64

The canonical identifier for this job.

creator_user_name

STRING

The creator user name. This field won’t be included in the response if the user has already been deleted.

run_as

STRING

The user name that the job will run as. run_as is based on the current job settings, and is set to the creator of the job if job access control is disabled, or the is_owner permission if job access control is enabled.

settings

JobSettings

Settings for this job and all of its runs. These settings can be updated using the resetJob method.

created_time

INT64

The time at which this job was created in epoch milliseconds (milliseconds since 1/1/1970 UTC).

JobEmailNotifications

Important

The on_start, on_success, and on_failure fields accept only Latin characters (ASCII character set). Using non-ASCII characters will return an error. Examples of invalid, non-ASCII characters are Chinese characters, Japanese kanji, and emoji.

Field Name

Type

Description

on_start

An array of STRING

A list of email addresses to be notified when a run begins. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent.

on_success

An array of STRING

A list of email addresses to be notified when a run successfully completes. A run is considered to have completed successfully if it ends with a TERMINATED life_cycle_state and a SUCCESS result_state. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent.

on_failure

An array of STRING

A list of email addresses to be notified when a run unsuccessfully completes. A run is considered to have completed unsuccessfully if it ends with an INTERNAL_ERROR life_cycle_state or a SKIPPED, FAILED, or TIMED_OUT result_state. If this is not specified on job creation, reset, or update, the list is empty, and notifications are not sent.

on_duration_warning_threshold_exceeded

An array of STRING

A list of email addresses to be notified when the duration of a run exceeds the threshold specified for the RUN_DURATION_SECONDS metric in the health field. If no rule for the RUN_DURATION_SECONDS metric is specified in the health field for the job, notifications are not sent.

no_alert_for_skipped_runs

BOOL

If true, do not send email to recipients specified in on_failure if the run is skipped.

Field Name

Type

Description

on_start

An array of Webhook

An optional list of system destinations to be notified when a run begins. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent. A maximum of 3 destinations can be specified for the on_start property.

on_success

An array of Webhook

An optional list of system destinations to be notified when a run completes successfully. A run is considered to have completed successfully if it ends with a TERMINATED life_cycle_state and a SUCCESS result_state. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent. A maximum of 3 destinations can be specified for the on_success property.

on_failure

An array of Webhook

An optional list of system destinations to be notified when a run completes unsuccessfully. A run is considered to have completed unsuccessfully if it ends with an INTERNAL_ERROR life_cycle_state or a SKIPPED, FAILED, or TIMED_OUT result_state. If this is not specified on job creation, reset, or update, the list is empty, and notifications are not sent. A maximum of 3 destinations can be specified for the on_failure property.

on_duration_warning_threshold_exceeded

An array of Webhook

An optional list of system destinations to be notified when the duration of a run exceeds the threshold specified for the RUN_DURATION_SECONDS metric in the health field. A maximum of 3 destinations can be specified for the on_duration_warning_threshold_exceeded property.

JobNotificationSettings

Field Name

Type

Description

no_alert_for_skipped_runs

BOOL

If true, do not send notifications to recipients specified in on_failure if the run is skipped.

no_alert_for_canceled_runs

BOOL

If true, do not send notifications to recipients specified in on_failure if the run is canceled.

alert_on_last_attempt

BOOL

If true, do not send notifications to recipients specified in on_start for the retried runs and do not send notifications to recipients specified in on_failure until the last retry of the run.

JobSettings

Important

  • When you run a job on a new jobs cluster, the job is treated as a Jobs Compute (automated) workload subject to Jobs Compute pricing.

  • When you run a job on an existing all-purpose cluster, it is treated as an All-Purpose Compute (interactive) workload subject to All-Purpose Compute pricing.

Settings for a job. These settings can be updated using the resetJob method.

Field Name

Type

Description

existing_cluster_id OR new_cluster

STRING OR NewCluster

If existing_cluster_id, the ID of an existing cluster that will be used for all runs of this job. When running jobs on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs on new clusters for greater reliability.

If new_cluster, a description of a cluster that will be created for each run.

If specifying a PipelineTask, then this field can be empty.

notebook_task OR spark_jar_task OR spark_python_task OR spark_submit_task OR pipeline_task OR run_job_task

NotebookTask OR SparkJarTask OR SparkPythonTask OR SparkSubmitTask OR PipelineTask OR RunJobTask

If notebook_task, indicates that this job should run a notebook. This field may not be specified in conjunction with spark_jar_task.

If spark_jar_task, indicates that this job should run a JAR.

If spark_python_task, indicates that this job should run a Python file.

If spark_submit_task, indicates that this job should be launched by the spark submit script.

If pipeline_task, indicates that this job should run a Delta Live Tables pipeline.

If run_job_task, indicates that this job should run another job.

name

STRING

An optional name for the job. The default value is Untitled.

libraries

An array of Library

An optional list of libraries to be installed on the cluster that will execute the job. The default value is an empty list.

email_notifications

JobEmailNotifications

An optional set of email addresses that will be notified when runs of this job begin or complete as well as when this job is deleted. The default behavior is to not send any emails.

webhook_notifications

WebhookNotifications

An optional set of system destinations to notify when runs of this job begin, complete, or fail.

notification_settings

JobNotificationSettings

Optional notification settings that are used when sending notifications to each of the email_notifications and webhook_notifications for this job.

timeout_seconds

INT32

An optional timeout applied to each run of this job. The default behavior is to have no timeout.

max_retries

INT32

An optional maximum number of times to retry an unsuccessful run. A run is considered to be unsuccessful if it completes with the FAILED result_state or INTERNAL_ERROR life_cycle_state. The value -1 means to retry indefinitely and the value 0 means to never retry. The default behavior is to never retry.

min_retry_interval_millis

INT32

An optional minimal interval in milliseconds between attempts. The default behavior is that unsuccessful runs are immediately retried.

retry_on_timeout

BOOL

An optional policy to specify whether to retry a job when it times out. The default behavior is to not retry on timeout.

schedule

CronSchedule

An optional periodic schedule for this job. The default behavior is that the job will only run when triggered by clicking “Run Now” in the Jobs UI or sending an API request to runNow.

max_concurrent_runs

INT32

An optional maximum allowed number of concurrent runs of the job.

Set this value if you want to be able to execute multiple runs of the same job concurrently. This is useful for example if you trigger your job on a frequent schedule and want to allow consecutive runs to overlap with each other, or if you want to trigger multiple runs which differ by their input parameters.

This setting affects only new runs. For example, suppose the job’s concurrency is 4 and there are 4 concurrent active runs. Then setting the concurrency to 3 won’t kill any of the active runs. However, from then on, new runs will be skipped unless there are fewer than 3 active runs.

This value cannot exceed 1000. Setting this value to 0 causes all new runs to be skipped. The default behavior is to allow only 1 concurrent run.

health

JobsHealthRules

An optional set of health rules defined for the job.
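The effect of max_concurrent_runs on newly triggered runs, as described above, can be expressed as a one-line rule. The Python sketch below is only a model of that description; new_run_is_skipped is a hypothetical name, and the real decision is made by the Jobs service.

```python
def new_run_is_skipped(active_runs: int, max_concurrent_runs: int = 1) -> bool:
    """Return True if a newly triggered run would be skipped."""
    if not 0 <= max_concurrent_runs <= 1000:
        raise ValueError("max_concurrent_runs must be between 0 and 1000")
    # A new run starts only when fewer than max_concurrent_runs runs are active.
    return active_runs >= max_concurrent_runs
```

With 4 active runs and the limit lowered to 3, the function returns True until enough active runs finish; a limit of 0 always returns True.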

JobTask

Field Name

Type

Description

notebook_task OR spark_jar_task OR spark_python_task OR spark_submit_task OR pipeline_task OR run_job_task

NotebookTask OR SparkJarTask OR SparkPythonTask OR SparkSubmitTask OR PipelineTask OR RunJobTask

If notebook_task, indicates that this job should run a notebook. This field may not be specified in conjunction with spark_jar_task.

If spark_jar_task, indicates that this job should run a JAR.

If spark_python_task, indicates that this job should run a Python file.

If spark_submit_task, indicates that this job should be launched by the spark submit script.

If pipeline_task, indicates that this job should run a Delta Live Tables pipeline.

If run_job_task, indicates that this job should run another job.

JobsHealthRule

Field Name

Type

Description

metric

STRING

Specifies the health metric that is being evaluated for a particular health rule. Valid values are RUN_DURATION_SECONDS.

operator

STRING

Specifies the operator used to compare the health metric value with the specified threshold. Valid values are GREATER_THAN.

value

INT32

Specifies the threshold value that the health metric should meet to comply with the health rule.

JobsHealthRules

Field Name

Type

Description

rules

An array of JobsHealthRule

An optional set of health rules that can be defined for a job.

Library

Field Name

Type

Description

jar OR egg OR whl OR pypi OR maven OR cran

STRING OR STRING OR STRING OR PythonPyPiLibrary OR MavenLibrary OR RCranLibrary

If jar, URI of the JAR to be installed. DBFS and GCS (gs) URIs are supported. For example: { "jar": "dbfs:/mnt/databricks/library.jar"} or { "jar": "gs://my-bucket/library.jar" }. If GCS is used, make sure the cluster has read access on the library.

If egg, URI of the egg to be installed. DBFS and GCS URIs are supported. For example: { "egg": "dbfs:/my/egg" } or { "egg": "gs://my-bucket/egg" }.

If whl, URI of the wheel or zipped wheels to be installed. DBFS and GCS URIs are supported. For example: { "whl": "dbfs:/my/whl" } or { "whl": "gs://my-bucket/whl" }. If GCS is used, make sure the cluster has read access on the library. Also the wheel file name needs to use the correct convention. If zipped wheels are to be installed, the file name suffix should be .wheelhouse.zip.

If pypi, specification of a PyPI library to be installed. Specifying the repo field is optional and if not specified, the default pip index is used. For example: { "package": "simplejson", "repo": "https://my-repo.com" }

If maven, specification of a Maven library to be installed. For example: { "coordinates": "org.jsoup:jsoup:1.7.2" }

If cran, specification of a CRAN library to be installed.
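The following Python sketch shows what a libraries array might look like and checks that each entry names one library kind. Two things here are assumptions rather than documented behavior: that each entry sets exactly one of the listed fields, and that the pypi specification is nested under a pypi key in the same way the maven specification is nested in the Create example. The helper validate_libraries is hypothetical.

```python
LIBRARY_KINDS = {"jar", "egg", "whl", "pypi", "maven", "cran"}


def validate_libraries(libraries: list) -> None:
    """Raise ValueError unless every entry sets exactly one library kind."""
    for index, library in enumerate(libraries):
        kinds = LIBRARY_KINDS.intersection(library)
        if len(kinds) != 1:
            raise ValueError(
                f"libraries[{index}] must set exactly one of {sorted(LIBRARY_KINDS)}"
            )


libraries = [
    {"jar": "dbfs:/mnt/databricks/library.jar"},
    {"whl": "gs://my-bucket/whl"},
    {"pypi": {"package": "simplejson", "repo": "https://my-repo.com"}},
    {"maven": {"coordinates": "org.jsoup:jsoup:1.7.2"}},
]
validate_libraries(libraries)  # passes silently
```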

MavenLibrary

Field Name

Type

Description

coordinates

STRING

Gradle-style Maven coordinates. For example: org.jsoup:jsoup:1.7.2. This field is required.

repo

STRING

Maven repo to install the Maven package from. If omitted, both Maven Central Repository and Spark Packages are searched.

exclusions

An array of STRING

List of dependencies to exclude. For example: ["slf4j:slf4j", "*:hadoop-client"].

Maven dependency exclusions: https://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html.

NewCluster

Field Name

Type

Description

num_workers OR autoscale

INT32 OR AutoScale

If num_workers, number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

When reading the properties of a cluster, this field reflects the desired number of workers rather than the actual current number of workers. For example, if a cluster is resized from 5 to 10 workers, this field will immediately be updated to reflect the target size of 10 workers, whereas the workers listed in spark_info will gradually increase from 5 to 10 as the new nodes are provisioned.

If autoscale, the required parameters to automatically scale clusters up and down based on load.

spark_version

STRING

The Spark version of the cluster. A list of available Spark versions can be retrieved by using the GET 2.0/clusters/spark-versions call. This field is required.

spark_conf

SparkConfPair

An object containing a set of optional, user-specified Spark configuration key-value pairs. You can also pass in a string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively.

Example Spark confs: {"spark.speculation": true, "spark.streaming.ui.retainedBatches": 5} or {"spark.driver.extraJavaOptions": "-verbose:gc -XX:+PrintGCDetails"}

gcp_attributes

Google Cloud attributes

Attributes related to clusters running on Google Cloud. If not specified at cluster creation, a set of default values will be used.

node_type_id

STRING

This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory- or compute-intensive workloads. A list of available node types can be retrieved by using the GET 2.0/clusters/list-node-types call. This field, the instance_pool_id field, or a cluster policy that specifies a node type ID or instance pool ID, is required.

driver_node_type_id

STRING

The node type of the Spark driver. This field is optional; if unset, the driver node type will be set as the same value as node_type_id defined above.

ssh_public_keys

An array of STRING

Set to empty array. Cluster SSH is not supported.

custom_tags

ClusterTag

Always set to empty array.

cluster_log_conf

ClusterLogConf

The configuration for delivering Spark logs to a long-term storage destination. Only one destination can be specified for one cluster. If the conf is given, the logs will be delivered to the destination every 5 mins. The destination of driver logs is <destination>/<cluster-id>/driver, while the destination of executor logs is <destination>/<cluster-id>/executor.

init_scripts

An array of InitScriptInfo

The configuration for storing init scripts. Any number of scripts can be specified. The scripts are executed sequentially in the order provided. If cluster_log_conf is specified, init script logs are sent to <destination>/<cluster-id>/init_scripts.

spark_env_vars

SparkEnvPair

An object containing a set of optional, user-specified environment variable key-value pairs. Key-value pairs of the form (X,Y) are exported as is (i.e., export X='Y') while launching the driver and workers.

To specify an additional set of SPARK_DAEMON_JAVA_OPTS, we recommend appending them to $SPARK_DAEMON_JAVA_OPTS as shown in the following example. This ensures that all default Databricks-managed environment variables are included as well.

Example Spark environment variables: {"SPARK_WORKER_MEMORY": "28000m", "SPARK_LOCAL_DIRS": "/local_disk0"} or {"SPARK_DAEMON_JAVA_OPTS": "$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true"}

enable_elastic_disk

BOOL

Always set to false.

instance_pool_id

STRING

The optional ID of the instance pool to use for cluster nodes. Refer to the Instance Pools API for details.
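The log delivery locations described for cluster_log_conf and init_scripts follow a fixed pattern, which the Python sketch below spells out. The function name cluster_log_paths is hypothetical, and the destination and cluster ID in the usage line are sample values.

```python
def cluster_log_paths(destination: str, cluster_id: str) -> dict:
    """Return where driver, executor, and init script logs are delivered."""
    base = f"{destination.rstrip('/')}/{cluster_id}"
    return {
        "driver": f"{base}/driver",
        "executor": f"{base}/executor",
        "init_scripts": f"{base}/init_scripts",
    }


print(cluster_log_paths("dbfs:/home/cluster_log", "1201-my-cluster"))
```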

NotebookOutput

Field Name

Type

Description

result

STRING

The value passed to dbutils.notebook.exit(). Databricks restricts this API to return the first 1 MB of the value. For a larger result, your job can store the results in a cloud storage service. This field will be absent if dbutils.notebook.exit() was never called.

truncated

BOOLEAN

Whether or not the result was truncated.

NotebookTask

All output cells are subject to a size limit of 8 MB. If the output of a cell exceeds this limit, the rest of the run is cancelled and the run is marked as failed. In that case, some of the content output from other cells may also be missing.

Field Name

Type

Description

notebook_path

STRING

The absolute path of the notebook to be run in the Databricks workspace. This path must begin with a slash. This field is required.

revision_timestamp

LONG

The timestamp of the revision of the notebook.

base_parameters

A map of ParamPair

Base parameters to be used for each run of this job. If the run is initiated by a call to run-now with parameters specified, the two parameters maps will be merged. If the same key is specified in base_parameters and in run-now, the value from run-now will be used.

Use Pass context about job runs into job tasks to set parameters containing information about job runs.

If the notebook takes a parameter that is not specified in the job’s base_parameters or the run-now override parameters, the default value from the notebook will be used.

Retrieve these parameters in a notebook using dbutils.widgets.get.
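The merge rule for base_parameters and run-now parameters can be modeled as a dictionary merge in which the run-now value wins. This Python sketch is an illustration of that rule only; effective_parameters is a hypothetical name, and notebook defaults for missing keys are applied by the notebook itself, not by this function.

```python
def effective_parameters(base_parameters: dict, run_now_parameters: dict) -> dict:
    """Merge the two maps; on a key clash the run-now value wins."""
    return {**base_parameters, **run_now_parameters}


base = {"env": "staging", "date": "2024-01-01"}
override = {"env": "prod"}
print(effective_parameters(base, override))
```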

ParamPair

Name-based parameters for jobs running notebook tasks.

Important

The fields in this data structure accept only Latin characters (ASCII character set). Using non-ASCII characters will return an error. Examples of invalid, non-ASCII characters are Chinese characters, Japanese kanji, and emoji.

Type

Description

STRING

Parameter name. Pass to dbutils.widgets.get to retrieve the value.

STRING

Parameter value.
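Since non-ASCII characters are rejected, it can help to check parameters on the client before sending a request. The following Python sketch does so with str.isascii (Python 3.7 or later); the helper name non_ascii_params is hypothetical.

```python
def non_ascii_params(params: dict) -> list:
    """Return the parameter names whose name or value contains non-ASCII characters."""
    return [
        name
        for name, value in params.items()
        if not (name.isascii() and str(value).isascii())
    ]


print(non_ascii_params({"date": "2024-01-01", "city": "東京"}))
```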

PipelineTask

Field Name

Type

Description

pipeline_id

STRING

The full name of the Delta Live Tables pipeline task to execute.

PythonPyPiLibrary

Field Name

Type

Description

package

STRING

The name of the PyPI package to install. An optional exact version specification is also supported. Examples: simplejson and simplejson==3.8.0. This field is required.

repo

STRING

The repository where the package can be found. If not specified, the default pip index is used.

RCranLibrary

Field Name

Type

Description

package

STRING

The name of the CRAN package to install. This field is required.

repo

STRING

The repository where the package can be found. If not specified, the default CRAN repo is used.

Run

All the information about a run except for its output. The output can be retrieved separately with the getRunOutput method.

Field Name

Type

Description

job_id

INT64

The canonical identifier of the job that contains this run.

run_id

INT64

The canonical identifier of the run. This ID is unique across all runs of all jobs.

creator_user_name

STRING

The creator user name. This field won’t be included in the response if the user has already been deleted.

number_in_job

INT64

The sequence number of this run among all runs of the job. This value starts at 1.

original_attempt_run_id

INT64

If this run is a retry of a prior run attempt, this field contains the run_id of the original attempt; otherwise, it is the same as the run_id.

state

RunState

The result and lifecycle states of the run.

schedule

CronSchedule

The cron schedule that triggered this run if it was triggered by the periodic scheduler.

task

JobTask

The task performed by the run, if any.

cluster_spec

ClusterSpec

A snapshot of the job’s cluster specification when this run was created.

cluster_instance

ClusterInstance

The cluster used for this run. If the run is specified to use a new cluster, this field will be set once the Jobs service has requested a cluster for the run.

overriding_parameters

RunParameters

The parameters used for this run.

start_time

INT64

The time at which this run was started in epoch milliseconds (milliseconds since 1/1/1970 UTC). This may not be the time when the job task starts executing. For example, if the job is scheduled to run on a new cluster, this is the time the cluster creation call is issued.

setup_duration

INT64

The time it took to set up the cluster in milliseconds. For runs on new clusters, this is the cluster creation time; for runs on existing clusters, this time should be very short.

execution_duration

INT64

The time in milliseconds it took to execute the commands in the JAR or notebook until they completed, failed, timed out, were cancelled, or encountered an unexpected error.

cleanup_duration

INT64

The time in milliseconds it took to terminate the cluster and clean up any associated artifacts. The total duration of the run is the sum of the setup_duration, the execution_duration, and the cleanup_duration.

end_time

INT64

The time at which this run ended in epoch milliseconds (milliseconds since 1/1/1970 UTC). This field will be set to 0 if the job is still running.

trigger

TriggerType

The type of trigger that fired this run.

run_name

STRING

An optional name for the run. The default value is Untitled. The maximum allowed length is 4096 bytes in UTF-8 encoding.

run_page_url

STRING

The URL to the detail page of the run.

run_type

STRING

The type of the run.

  • JOB_RUN - Normal job run. A run created with Run now.

  • WORKFLOW_RUN - Workflow run. A run created with dbutils.notebook.run.

  • SUBMIT_RUN - Submit run. A run created with Runs submit.

attempt_number

INT32

The sequence number of this run attempt for a triggered job run. The initial attempt of a run has an attempt_number of 0. If the initial run attempt fails, and the job has a retry policy (max_retries > 0), subsequent runs are created with an original_attempt_run_id of the original attempt’s ID and an incrementing attempt_number. Runs are retried only until they succeed, and the maximum attempt_number is the same as the max_retries value for the job.
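The timing fields above can be combined on the client side. The following is a minimal sketch, assuming the Run object has already been parsed from JSON into a Python dict; the helper name summarize_run_timing and the sample values are illustrative and not part of the API.

```python
from datetime import datetime, timezone

def summarize_run_timing(run: dict) -> dict:
    """Derive timing facts from a Run object parsed from JSON (hypothetical helper)."""
    # Per the field descriptions, the total duration is the sum of the three phases.
    total_ms = (
        run.get("setup_duration", 0)
        + run.get("execution_duration", 0)
        + run.get("cleanup_duration", 0)
    )
    # start_time and end_time are epoch milliseconds; end_time is 0 while running.
    started = datetime.fromtimestamp(run["start_time"] / 1000, tz=timezone.utc)
    end_ms = run.get("end_time", 0)
    ended = datetime.fromtimestamp(end_ms / 1000, tz=timezone.utc) if end_ms else None
    # A retry carries the run_id of the first attempt in original_attempt_run_id.
    is_retry = run.get("original_attempt_run_id", run["run_id"]) != run["run_id"]
    return {
        "total_duration_ms": total_ms,
        "started_at": started,
        "ended_at": ended,
        "is_retry": is_retry,
    }

# Illustrative values only; they are not taken from a real workspace.
sample_run = {
    "run_id": 453,
    "original_attempt_run_id": 452,
    "start_time": 1457570074236,
    "setup_duration": 259754,
    "execution_duration": 3589020,
    "cleanup_duration": 31038,
    "end_time": 0,
}
summary = summarize_run_timing(sample_run)
print(summary["total_duration_ms"], summary["is_retry"], summary["ended_at"])
```

Treating an end_time of 0 as "no end time yet" keeps callers from accidentally reporting 1970 as the finish date of a run that is still in progress.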

RunJobTask

Field Name

Type

Description

job_id

INT32

Unique identifier of the job to run. This field is required.

RunLifeCycleState

The life cycle state of a run. Allowed state transitions are:

  • QUEUED -> PENDING

  • PENDING -> RUNNING -> TERMINATING -> TERMINATED

  • PENDING -> SKIPPED

  • PENDING -> INTERNAL_ERROR

  • RUNNING -> INTERNAL_ERROR

  • TERMINATING -> INTERNAL_ERROR

State

Description

QUEUED

The run has been triggered but is queued because it reached one of the following limits:

  • The maximum concurrent active runs in the workspace.

  • The maximum concurrent Run Job task runs in the workspace.

  • The maximum concurrent runs of the job.

The job or the run must have queuing enabled before it can reach this state.

PENDING

The run has been triggered. If the configured maximum concurrent runs of the job is already reached, the run immediately transitions into the SKIPPED state without preparing any resources. Otherwise, cluster preparation and execution are in progress.

RUNNING

The task of this run is being executed.

TERMINATING

The task of this run has completed, and the cluster and execution context are being cleaned up.

TERMINATED

The task of this run has completed, and the cluster and execution context have been cleaned up. This state is terminal.

SKIPPED

This run was aborted because a previous run of the same job was already active. This state is terminal.

INTERNAL_ERROR

An exceptional state that indicates a failure in the Jobs service, such as network failure over a long period. If a run on a new cluster ends in the INTERNAL_ERROR state, the Jobs service terminates the cluster as soon as possible. This state is terminal.
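The allowed transitions and terminal states listed above can be encoded as a lookup table, which is convenient when polling a run. This is a sketch only: the names ALLOWED_TRANSITIONS, is_terminal, and is_valid_path are hypothetical client-side helpers, not part of the Jobs API.

```python
# Transition table transcribed from the list of allowed state transitions above.
ALLOWED_TRANSITIONS = {
    "QUEUED": {"PENDING"},
    "PENDING": {"RUNNING", "SKIPPED", "INTERNAL_ERROR"},
    "RUNNING": {"TERMINATING", "INTERNAL_ERROR"},
    "TERMINATING": {"TERMINATED", "INTERNAL_ERROR"},
    "TERMINATED": set(),
    "SKIPPED": set(),
    "INTERNAL_ERROR": set(),
}

# A state is terminal when no further transition is allowed out of it.
TERMINAL_STATES = {state for state, nxt in ALLOWED_TRANSITIONS.items() if not nxt}

def is_terminal(life_cycle_state: str) -> bool:
    """Return True when polling can stop for a run in this state."""
    return life_cycle_state in TERMINAL_STATES

def is_valid_path(states: list) -> bool:
    """Check that each consecutive pair of observed states is an allowed transition."""
    return all(b in ALLOWED_TRANSITIONS[a] for a, b in zip(states, states[1:]))

print(sorted(TERMINAL_STATES))
print(is_valid_path(["QUEUED", "PENDING", "RUNNING", "TERMINATING", "TERMINATED"]))
```

A polling loop would typically stop as soon as is_terminal returns True for the life_cycle_state of the run.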

RunParameters

Parameters for this run. Only one of jar_params, python_params, or notebook_params should be specified in the run-now request, depending on the type of job task. Jobs with a Spark JAR task or Python task take a list of position-based parameters, and jobs with notebook tasks take a key-value map.

Field Name

Type

Description

jar_params

An array of STRING

A list of parameters for jobs with Spark JAR tasks, e.g. "jar_params": ["john doe", "35"]. The parameters will be used to invoke the main function of the main class specified in the Spark JAR task. If not specified upon run-now, it will default to an empty list. jar_params cannot be specified in conjunction with notebook_params. The JSON representation of this field (i.e. {"jar_params":["john doe","35"]}) cannot exceed 10,000 bytes.

Use Pass context about job runs into job tasks to set parameters containing information about job runs.

notebook_params

A map of ParamPair

A map from keys to values for jobs with notebook task, e.g. "notebook_params": {"name": "john doe", "age": "35"}. The map is passed to the notebook and is accessible through the dbutils.widgets.get function.

If not specified upon run-now, the triggered run uses the job’s base parameters.

notebook_params cannot be specified in conjunction with jar_params.

Use Pass context about job runs into job tasks to set parameters containing information about job runs.

The JSON representation of this field (i.e. {"notebook_params":{"name":"john doe","age":"35"}}) cannot exceed 10,000 bytes.

python_params

An array of STRING

A list of parameters for jobs with Python tasks, e.g. "python_params": ["john doe", "35"]. The parameters are passed to the Python file as command-line parameters. If specified upon run-now, it overwrites the parameters specified in the job settings. The JSON representation of this field (i.e. {"python_params":["john doe","35"]}) cannot exceed 10,000 bytes.

Use Pass context about job runs into job tasks to set parameters containing information about job runs.

Important

These parameters accept only Latin characters (ASCII character set). Using non-ASCII characters will return an error. Examples of invalid, non-ASCII characters are Chinese characters, Japanese kanji, and emojis.

spark_submit_params

An array of STRING

A list of parameters for jobs with a Spark submit task, e.g. "spark_submit_params": ["--class", "org.apache.spark.examples.SparkPi"]. The parameters are passed to the spark-submit script as command-line parameters. If specified upon run-now, it overwrites the parameters specified in the job settings. The JSON representation of this field (i.e. {"spark_submit_params":["--class","org.apache.spark.examples.SparkPi"]}) cannot exceed 10,000 bytes.

Use Pass context about job runs into job tasks to set parameters containing information about job runs.

Important

These parameters accept only Latin characters (ASCII character set). Using non-ASCII characters will return an error. Examples of invalid, non-ASCII characters are Chinese characters, Japanese kanji, and emojis.
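The size and character-set limits described for RunParameters can be checked before a run-now request is sent. The sketch below is a hypothetical client-side check, not part of the API. It assumes the 10,000-byte limit is measured on compact, UTF-8 encoded JSON, which the descriptions above do not state explicitly, and it applies the ASCII rule only to the two fields that carry that notice.

```python
import json

MAX_PARAM_JSON_BYTES = 10_000
# Fields whose descriptions above restrict values to the ASCII character set.
ASCII_ONLY_FIELDS = {"python_params", "spark_submit_params"}

def check_run_params(field, value):
    """Return a list of problems found in one RunParameters field (hypothetical helper)."""
    problems = []
    # Assumption: the limit applies to the compact JSON form, e.g.
    # {"python_params":["john doe","35"]}, encoded as UTF-8.
    encoded = json.dumps(
        {field: value}, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    if len(encoded) > MAX_PARAM_JSON_BYTES:
        problems.append(
            f"{field}: JSON is {len(encoded)} bytes, limit is {MAX_PARAM_JSON_BYTES}"
        )
    if field in ASCII_ONLY_FIELDS and not all(item.isascii() for item in value):
        problems.append(f"{field}: contains non-ASCII characters")
    return problems

print(check_run_params("python_params", ["john doe", "35"]))
print(check_run_params("python_params", ["héllo"]))
```

An empty list means no problem was detected by this check; the service remains the authority on what it accepts.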

RunResultState

The result state of the run.

  • If life_cycle_state = TERMINATED: if the run had a task, the result is guaranteed to be available, and it indicates the result of the task.

  • If life_cycle_state = PENDING, RUNNING, or SKIPPED: the result state is not available.

  • If life_cycle_state = TERMINATING or life_cycle_state = INTERNAL_ERROR: the result state is available if the run had a task and managed to start it.

Once available, the result state never changes.

State

Description

SUCCESS

The task completed successfully.

FAILED

The task completed with an error.

TIMEDOUT

The run was stopped after reaching the timeout.

CANCELED

The run was canceled at user request.

RunState

Field Name

Type

Description

life_cycle_state

RunLifeCycleState

A description of a run’s current location in the run lifecycle. This field is always available in the response.

result_state

RunResultState

The result state of a run. If it is not available, the response won’t include this field. See RunResultState for details about the availability of result_state.

user_cancelled_or_timedout

BOOLEAN

Whether a run was canceled manually by a user or by the scheduler because the run timed out.

state_message

STRING

A descriptive message for the current state. This field is unstructured, and its exact format is subject to change.
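Because result_state is omitted from the response until it is available, client code should treat it as optional. The following sketch shows one way to read a RunState object that has been parsed into a dict; the helper names is_finished and run_outcome are illustrative assumptions, not part of the API.

```python
from typing import Optional

# Life cycle states that the RunLifeCycleState descriptions mark as terminal.
TERMINAL_LIFE_CYCLE_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}

def is_finished(state: dict) -> bool:
    """life_cycle_state is always present, so it is safe to index directly."""
    return state["life_cycle_state"] in TERMINAL_LIFE_CYCLE_STATES

def run_outcome(state: dict) -> Optional[bool]:
    """Return True or False once a result is known, or None while it is unavailable."""
    result = state.get("result_state")  # absent from the response when unavailable
    if result is None:
        return None
    return result == "SUCCESS"

running = {"life_cycle_state": "RUNNING", "state_message": "In run"}
done = {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS", "state_message": ""}
print(is_finished(running), run_outcome(running))
print(is_finished(done), run_outcome(done))
```

Note that a run can be finished without an outcome, for example when it ends in SKIPPED, so callers usually need both checks.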

SparkConfPair

Spark configuration key-value pairs.

Type

Description

STRING

A configuration property name.

STRING

The configuration property value.

SparkEnvPair

Spark environment variable key-value pairs.

Important

When specifying environment variables in a job cluster, the fields in this data structure accept only Latin characters (ASCII character set). Using non-ASCII characters will return an error. Examples of invalid, non-ASCII characters are Chinese characters, Japanese kanji, and emojis.

Type

Description

STRING

An environment variable name.

STRING

The environment variable value.
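The ASCII restriction on environment variables can be checked locally before the cluster specification is submitted. This is a small sketch with a hypothetical helper name and made-up sample values; it assumes the variables are held in a dict of strings.

```python
def non_ascii_env_vars(spark_env_vars: dict) -> list:
    """Return the names of entries whose name or value contains non-ASCII characters."""
    return sorted(
        name
        for name, value in spark_env_vars.items()
        if not (name.isascii() and value.isascii())
    )

# Sample values for illustration only.
env = {
    "PYSPARK_PYTHON": "/databricks/python3/bin/python3",
    "GREETING": "こんにちは",
}
print(non_ascii_env_vars(env))
```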

SparkJarTask

Field Name

Type

Description

jar_uri

STRING

Deprecated since 04/2016. Provide a JAR through the libraries field instead. For an example, see Create.

main_class_name

STRING

The full name of the class containing the main method to be executed. This class must be contained in a JAR provided as a library.

The code should use SparkContext.getOrCreate to obtain a Spark context; otherwise, runs of the job will fail.

parameters

An array of STRING

Parameters passed to the main method.

Use Pass context about job runs into job tasks to set parameters containing information about job runs.

SparkPythonTask

Field Name

Type

Description

python_file

STRING

The URI of the Python file to be executed. DBFS paths are supported. This field is required.

parameters

An array of STRING

Command line parameters passed to the Python file.

Use Pass context about job runs into job tasks to set parameters containing information about job runs.

SparkSubmitTask

Important

  • You can invoke Spark submit tasks only on new clusters.

  • In the new_cluster specification, libraries and spark_conf are not supported. Instead, use --jars and --py-files to add Java and Python libraries and --conf to set the Spark configuration.

  • master, deploy-mode, and executor-cores are automatically configured by Databricks; you cannot specify them in parameters.

  • By default, the Spark submit job uses all available memory (excluding reserved memory for Databricks services). You can set --driver-memory and --executor-memory to a smaller value to leave some room for off-heap usage.

  • The --jars, --py-files, and --files arguments support DBFS paths.

For example, assuming the JAR is uploaded to DBFS, you can run SparkPi by setting the following parameters.

{
  "parameters": [
    "--class",
    "org.apache.spark.examples.SparkPi",
    "dbfs:/path/to/examples.jar",
    "10"
  ]
}

Field Name

Type

Description

parameters

An array of STRING

Command-line parameters passed to spark submit.

Use Pass context about job runs into job tasks to set parameters containing information about job runs.
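The rules above for Spark submit parameters can be applied when assembling the parameters array in code. The sketch below is illustrative: build_spark_submit_parameters and find_reserved_flags are hypothetical helpers, and the reserved-flag check is deliberately coarse because it scans every element, including application arguments.

```python
# Options that Databricks configures automatically, per the notes above.
RESERVED_FLAGS = {"--master", "--deploy-mode", "--executor-cores"}

def build_spark_submit_parameters(main_class, app_jar, app_args=(), conf=None, jars=()):
    """Assemble a parameters list in the order spark-submit expects."""
    params = ["--class", main_class]
    # Spark configuration goes through --conf because spark_conf is not supported here.
    for key, value in (conf or {}).items():
        params += ["--conf", f"{key}={value}"]
    # Extra libraries go through --jars as a comma-separated list.
    if jars:
        params += ["--jars", ",".join(jars)]
    params.append(app_jar)
    params += [str(arg) for arg in app_args]
    return params

def find_reserved_flags(params):
    """Return any elements that name an option you cannot specify yourself."""
    return [p for p in params if p.split("=", 1)[0] in RESERVED_FLAGS]

spark_pi = build_spark_submit_parameters(
    "org.apache.spark.examples.SparkPi", "dbfs:/path/to/examples.jar", ["10"]
)
print(spark_pi)
print(find_reserved_flags(["--master", "local[4]", "--class", "X"]))
```

With the SparkPi arguments, the builder produces the same list as the JSON example shown earlier in this section.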

TriggerType

These are the types of triggers that can fire a run.

Type

Description

PERIODIC

Schedules that periodically trigger runs, such as a cron scheduler.

ONE_TIME

One-time triggers that fire a single run. This occurs when you trigger a single run on demand through the UI or the API.

RETRY

Indicates a run that is triggered as a retry of a previously failed run. This occurs when you request to re-run the job in case of failures.

ViewItem

The exported content is in HTML format. For example, if the view to export is dashboards, one HTML string is returned for every dashboard.

Field Name

Type

Description

content

STRING

Content of the view.

name

STRING

Name of the view item. In the case of code view, the notebook’s name. In the case of dashboard view, the dashboard’s name.

type

ViewType

Type of the view item.

ViewType

Type

Description

NOTEBOOK

Notebook view item.

DASHBOARD

Dashboard view item.

ViewsToExport

View to export: either code, all dashboards, or all.

Type

Description

CODE

Code view of the notebook.

DASHBOARDS

All dashboard views of the notebook.

ALL

All views of the notebook.

Webhook

Field Name

Type

Description

id

STRING

Identifier referencing a system notification destination. This field is required.

WebhookNotifications

Field Name

Type

Description

on_start

An array of Webhook

An optional list of system destinations to be notified when a run begins. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent. A maximum of 3 destinations can be specified for the on_start property.

on_success

An array of Webhook

An optional list of system destinations to be notified when a run completes successfully. A run is considered to have completed successfully if it ends with a TERMINATED life_cycle_state and a SUCCESS result_state. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent. A maximum of 3 destinations can be specified for the on_success property.

on_failure

An array of Webhook

An optional list of system destinations to be notified when a run completes unsuccessfully. A run is considered to have completed unsuccessfully if it ends with an INTERNAL_ERROR or SKIPPED life_cycle_state, or a FAILED or TIMEDOUT result_state. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent. A maximum of 3 destinations can be specified for the on_failure property.

on_duration_warning_threshold_exceeded

An array of Webhook

An optional list of system destinations to be notified when the duration of a run exceeds the threshold specified for the RUN_DURATION_SECONDS metric in the health field. A maximum of 3 destinations can be specified for the on_duration_warning_threshold_exceeded property.
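The limit of 3 destinations per property can be enforced when the WebhookNotifications object is built. This is a minimal sketch; build_webhook_notifications is a hypothetical helper and the destination IDs in the usage line are placeholders, not real notification destinations.

```python
MAX_DESTINATIONS = 3
WEBHOOK_EVENTS = (
    "on_start",
    "on_success",
    "on_failure",
    "on_duration_warning_threshold_exceeded",
)

def build_webhook_notifications(**destination_ids_by_event):
    """Build a WebhookNotifications dict, enforcing the per-property limit."""
    notifications = {}
    for event, ids in destination_ids_by_event.items():
        if event not in WEBHOOK_EVENTS:
            raise ValueError(f"unknown webhook event: {event}")
        if len(ids) > MAX_DESTINATIONS:
            raise ValueError(f"{event}: at most {MAX_DESTINATIONS} destinations allowed")
        # Each Webhook entry carries only the required id field.
        notifications[event] = [{"id": destination_id} for destination_id in ids]
    return notifications

# Placeholder IDs for illustration only.
print(build_webhook_notifications(on_start=["dest-1"], on_failure=["dest-1", "dest-2"]))
```

Events that are left out are simply absent from the result, which matches the documented default of an empty list with no notifications sent.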

WorkspaceStorageInfo

Workspace storage information.

Field Name

Type

Description

destination

STRING

File destination. Example: /Users/someone@domain.com/init_script.sh