Jobs API 2.0
Important
This article documents the 2.0 version of the Jobs API. However, Databricks recommends using Jobs API 2.1 for new and existing clients and scripts. For details on the changes from the 2.0 to 2.1 versions, see Updating from Jobs API 2.0 to 2.1.
The Jobs API allows you to create, edit, and delete jobs. The maximum allowed size of a request to the Jobs API is 10 MB.
For details about updates to the Jobs API that support orchestration of multiple tasks with Databricks jobs, see Updating from Jobs API 2.0 to 2.1.
Warning
You should never hard code secrets or store them in plain text. Use the Secrets API to manage secrets in the Databricks CLI. Use the Secrets utility (dbutils.secrets) to reference secrets in notebooks and jobs.
Note
If you receive a 500-level error when making Jobs API requests, Databricks recommends retrying requests for up to 10 minutes, with a minimum interval of 30 seconds between retries.
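The retry guidance in this note can be sketched as a small helper. This is only an illustration, not an official client: `call_with_retry` and its `send` callable are hypothetical names, and `send` is assumed to perform one request and return a `(status_code, body)` pair.

```python
import time


def call_with_retry(send, max_total_seconds=600, interval_seconds=30,
                    sleep=time.sleep, clock=time.monotonic):
    """Retry 500-level responses for up to 10 minutes, 30 seconds apart.

    `send` is a hypothetical zero-argument callable that performs one
    Jobs API request and returns a (status_code, body) tuple.
    """
    deadline = clock() + max_total_seconds
    while True:
        status, body = send()
        # Anything below 500 (success or client error) is returned as-is.
        if status < 500:
            return status, body
        # Give up once another wait would pass the overall deadline.
        if clock() + interval_seconds > deadline:
            return status, body
        sleep(interval_seconds)
```

The `sleep` and `clock` parameters exist only so the helper can be exercised without real waiting.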
Important
To access Databricks REST APIs, you must authenticate.
Create
| Endpoint | HTTP Method |
|---|---|
| 2.0/jobs/create | POST |
Create a new job.
Example
This example creates a job that runs a JAR task at 10:15pm each night.
Request
curl --netrc --request POST \
https://<databricks-instance>/api/2.0/jobs/create \
--data @create-job.json \
| jq .
create-job.json:
{
  "name": "Nightly model training",
  "new_cluster": {
    "spark_version": "7.5.x-scala2.12",
    "node_type_id": "n1-highmem-4",
    "num_workers": 10
  },
  "libraries": [
    {
      "jar": "dbfs:/my-jar.jar"
    },
    {
      "maven": {
        "coordinates": "org.jsoup:jsoup:1.7.2"
      }
    }
  ],
  "timeout_seconds": 3600,
  "max_retries": 1,
  "schedule": {
    "quartz_cron_expression": "0 15 22 * * ?",
    "timezone_id": "America/Los_Angeles"
  },
  "spark_jar_task": {
    "main_class_name": "com.databricks.ComputeModels"
  }
}
Replace:
<databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.
The contents of create-job.json with fields that are appropriate for your solution.
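For clients that are not shell scripts, the same create request can be assembled in Python. The sketch below is a minimal illustration that builds the request without sending it. The helper name `build_create_request`, the bearer-token header (the curl examples here use a .netrc file instead), and the reading of "10 MB" as 10 × 1024 × 1024 bytes are all assumptions.

```python
import json
import urllib.request

MAX_REQUEST_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB


def build_create_request(instance, token, settings):
    """Build (but do not send) a POST request for jobs/create.

    `token` is assumed to be a personal access token sent as a Bearer
    header; this is an illustrative choice, not the only way to authenticate.
    """
    body = json.dumps(settings).encode("utf-8")
    if len(body) > MAX_REQUEST_BYTES:
        raise ValueError("request body exceeds the 10 MB Jobs API limit")
    return urllib.request.Request(
        url=f"https://{instance}/api/2.0/jobs/create",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


settings = {
    "name": "Nightly model training",
    "schedule": {
        "quartz_cron_expression": "0 15 22 * * ?",
        "timezone_id": "America/Los_Angeles",
    },
    "spark_jar_task": {"main_class_name": "com.databricks.ComputeModels"},
}
request = build_create_request(
    "1234567890123456.7.gcp.databricks.com", "<token>", settings
)
# Sending is left out; urllib.request.urlopen(request) would perform the call.
print(request.get_method(), request.full_url)
```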
Request structure
Important
When you run a job on a new jobs cluster, the job is treated as a Jobs Compute (automated) workload subject to Jobs Compute pricing.
When you run a job on an existing all-purpose cluster, it is treated as an All-Purpose Compute (interactive) workload subject to All-Purpose Compute pricing.
| Field Name | Type | Description |
|---|---|---|
| existing_cluster_id OR new_cluster | | If existing_cluster_id, the ID of an existing cluster that will be used for all runs of this job. When running jobs on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs on new clusters for greater reliability. If new_cluster, a description of a cluster that will be created for each run. If specifying a PipelineTask, this field can be empty. |
| notebook_task OR spark_jar_task OR spark_python_task OR spark_submit_task OR pipeline_task OR run_job_task | NotebookTask OR SparkJarTask OR SparkPythonTask OR SparkSubmitTask OR PipelineTask OR RunJobTask | If notebook_task, indicates that this job should run a notebook. This field may not be specified in conjunction with spark_jar_task. If spark_jar_task, indicates that this job should run a JAR. If spark_python_task, indicates that this job should run a Python file. If spark_submit_task, indicates that this job should be launched by the spark submit script. If pipeline_task, indicates that this job should run a Delta Live Tables pipeline. If run_job_task, indicates that this job should run another job. |
| name | | An optional name for the job. The default value is |
| libraries | An array of Library | An optional list of libraries to be installed on the cluster that will execute the job. The default value is an empty list. |
| email_notifications | | An optional set of email addresses notified when runs of this job begin and complete and when this job is deleted. The default behavior is to not send any emails. |
| webhook_notifications | | An optional set of system destinations to notify when runs of this job begin, complete, or fail. |
| | | Optional notification settings that are used when sending notifications to each of the |
| timeout_seconds | | An optional timeout applied to each run of this job. The default behavior is to have no timeout. |
| max_retries | | An optional maximum number of times to retry an unsuccessful run. A run is considered to be unsuccessful if it completes with the |
| | | An optional minimal interval in milliseconds between the start of the failed run and the subsequent retry run. The default behavior is that unsuccessful runs are immediately retried. |
| | | An optional policy to specify whether to retry a job when it times out. The default behavior is to not retry on timeout. |
| schedule | | An optional periodic schedule for this job. The default behavior is that the job runs when triggered by clicking Run now in the Jobs UI or sending an API request to |
| | | An optional maximum allowed number of concurrent runs of the job. Set this value if you want to be able to execute multiple runs of the same job concurrently. This is useful, for example, if you trigger your job on a frequent schedule and want to allow consecutive runs to overlap with each other, or if you want to trigger multiple runs that differ by their input parameters. This setting affects only new runs. For example, suppose the job’s concurrency is 4 and there are 4 concurrent active runs. Then setting the concurrency to 3 won’t kill any of the active runs. However, from then on, new runs are skipped unless there are fewer than 3 active runs. This value cannot exceed 1000. Setting this value to 0 causes all new runs to be skipped. The default behavior is to allow only 1 concurrent run. |
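The skip rule for concurrent runs described in the last row can be written as a one-line check. The sketch below is only a local model of that rule; `should_skip_new_run` and `concurrency_limit` are hypothetical names, not API identifiers.

```python
def should_skip_new_run(active_runs: int, concurrency_limit: int = 1) -> bool:
    """Return True when a newly triggered run would be skipped."""
    if not 0 <= concurrency_limit <= 1000:
        raise ValueError("the concurrency limit must be between 0 and 1000")
    # A new run starts only while fewer runs than the limit are active.
    return active_runs >= concurrency_limit


# Limit lowered from 4 to 3 while 4 runs are active: nothing is killed,
# but new runs are skipped until fewer than 3 runs remain active.
print(should_skip_new_run(active_runs=4, concurrency_limit=3))
print(should_skip_new_run(active_runs=2, concurrency_limit=3))
# A limit of 0 skips every new run.
print(should_skip_new_run(active_runs=0, concurrency_limit=0))
```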
List
| Endpoint | HTTP Method |
|---|---|
| 2.0/jobs/list | GET |
List all jobs.
Example
Request
curl --netrc --request GET \
https://<databricks-instance>/api/2.0/jobs/list \
| jq .
Replace <databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.
Response
{
"jobs": [
{
"job_id": 1,
"settings": {
"name": "Nightly model training",
"new_cluster": {
"spark_version": "7.5.x-scala2.12",
"node_type_id": "n1-highmem-4",
"num_workers": 10
},
"libraries": [
{
"jar": "dbfs:/my-jar.jar"
},
{
"maven": {
"coordinates": "org.jsoup:jsoup:1.7.2"
}
}
],
"timeout_seconds": 100000000,
"max_retries": 1,
"schedule": {
"quartz_cron_expression": "0 15 22 * * ?",
"timezone_id": "America/Los_Angeles",
"pause_status": "UNPAUSED"
},
"spark_jar_task": {
"main_class_name": "com.databricks.ComputeModels"
}
},
"created_time": 1457570074236
}
]
}
Response structure
| Field Name | Type | Description |
|---|---|---|
| jobs | An array of Job | The list of jobs. |
Delete
| Endpoint | HTTP Method |
|---|---|
| 2.0/jobs/delete | POST |
Delete a job and send an email to the addresses specified in JobSettings.email_notifications. No action occurs if the job has already been removed. After the job is removed, neither its details nor its run history is visible in the Jobs UI or API. The job is guaranteed to be removed upon completion of this request. However, runs that were active before the receipt of this request may still be active. They will be terminated asynchronously.
Example
curl --netrc --request POST \
https://<databricks-instance>/api/2.0/jobs/delete \
--data '{ "job_id": <job-id> }'
Replace:
<databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.
<job-id> with the ID of the job, for example 123.
This example uses a .netrc file.
Get
| Endpoint | HTTP Method |
|---|---|
| 2.0/jobs/get | GET |
Retrieve information about a single job.
Example
Request
curl --netrc --request GET \
'https://<databricks-instance>/api/2.0/jobs/get?job_id=<job-id>' \
| jq .
Or:
curl --netrc --get \
https://<databricks-instance>/api/2.0/jobs/get \
--data job_id=<job-id> \
| jq .
Replace:
<databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.
<job-id> with the ID of the job, for example 123.
Response
{
"job_id": 1,
"settings": {
"name": "Nightly model training",
"new_cluster": {
"spark_version": "7.5.x-scala2.12",
"node_type_id": "n1-highmem-4",
"aws_attributes": {
"availability": "ON_DEMAND"
},
"num_workers": 10
},
"libraries": [
{
"jar": "dbfs:/my-jar.jar"
},
{
"maven": {
"coordinates": "org.jsoup:jsoup:1.7.2"
}
}
],
"email_notifications": {
"on_start": [],
"on_success": [],
"on_failure": []
},
"webhook_notifications": {
"on_start": [
{
"id": "bf2fbd0a-4a05-4300-98a5-303fc8132233"
}
],
"on_success": [
{
"id": "bf2fbd0a-4a05-4300-98a5-303fc8132233"
}
],
"on_failure": []
},
"timeout_seconds": 100000000,
"max_retries": 1,
"schedule": {
"quartz_cron_expression": "0 15 22 * * ?",
"timezone_id": "America/Los_Angeles",
"pause_status": "UNPAUSED"
},
"spark_jar_task": {
"main_class_name": "com.databricks.ComputeModels"
}
},
"created_time": 1457570074236
}
Request structure
| Field Name | Type | Description |
|---|---|---|
| job_id | | The canonical identifier of the job to retrieve information about. This field is required. |
Response structure
| Field Name | Type | Description |
|---|---|---|
| job_id | | The canonical identifier for this job. |
| | | The creator user name. This field won’t be included in the response if the user has been deleted. |
| settings | | Settings for this job and all of its runs. These settings can be updated using the Reset or Update endpoints. |
| created_time | | The time at which this job was created in epoch milliseconds (milliseconds since 1/1/1970 UTC). |
Reset
| Endpoint | HTTP Method |
|---|---|
| 2.0/jobs/reset | POST |
Overwrite all settings for a specific job. Use the Update endpoint to update job settings partially.
Example
This example request makes job 2 identical to job 1 in the create example.
curl --netrc --request POST \
https://<databricks-instance>/api/2.0/jobs/reset \
--data @reset-job.json \
| jq .
reset-job.json:
{
"job_id": 2,
"new_settings": {
"name": "Nightly model training",
"new_cluster": {
"spark_version": "7.5.x-scala2.12",
"node_type_id": "n1-highmem-4",
"aws_attributes": {
"availability": "ON_DEMAND"
},
"num_workers": 10
},
"libraries": [
{
"jar": "dbfs:/my-jar.jar"
},
{
"maven": {
"coordinates": "org.jsoup:jsoup:1.7.2"
}
}
],
"email_notifications": {
"on_start": [],
"on_success": [],
"on_failure": []
},
"webhook_notifications": {
"on_start": [
{
"id": "bf2fbd0a-4a05-4300-98a5-303fc8132233"
}
],
"on_success": [
{
"id": "bf2fbd0a-4a05-4300-98a5-303fc8132233"
}
],
"on_failure": []
},
"timeout_seconds": 100000000,
"max_retries": 1,
"schedule": {
"quartz_cron_expression": "0 15 22 * * ?",
"timezone_id": "America/Los_Angeles",
"pause_status": "UNPAUSED"
},
"spark_jar_task": {
"main_class_name": "com.databricks.ComputeModels"
}
}
}
Replace:
<databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.
The contents of reset-job.json with fields that are appropriate for your solution.
Request structure
| Field Name | Type | Description |
|---|---|---|
| job_id | | The canonical identifier of the job to reset. This field is required. |
| new_settings | | The new settings of the job. These settings completely replace the old settings. Changes to the field |
Update
| Endpoint | HTTP Method |
|---|---|
| 2.0/jobs/update | POST |
Add, change, or remove specific settings of an existing job. Use the Reset endpoint to overwrite all job settings.
Example
This example request removes libraries and adds email notification settings to job 1 defined in the create example.
curl --netrc --request POST \
https://<databricks-instance>/api/2.0/jobs/update \
--data @update-job.json \
| jq .
update-job.json:
{
  "job_id": 1,
  "new_settings": {
    "existing_cluster_id": "1201-my-cluster",
    "email_notifications": {
      "on_start": [ "someone@example.com" ],
      "on_success": [],
      "on_failure": []
    }
  },
  "fields_to_remove": ["libraries"]
}
Replace:
<databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.
The contents of update-job.json with fields that are appropriate for your solution.
Request structure
| Field Name | Type | Description |
|---|---|---|
| job_id | | The canonical identifier of the job to update. This field is required. |
| new_settings | | The new settings for the job. Top-level fields specified in Changes to the field |
| fields_to_remove | An array of | Remove top-level fields in the job settings. Removing nested fields is not supported, except for entries from the This field is optional. |
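The difference between Update and Reset can be modeled locally on plain dictionaries. The sketch below is an illustration under a stated assumption, not the service's implementation: it assumes that each top-level key in new_settings replaces the old value wholesale (nested values are not merged). `apply_update` and `apply_reset` are hypothetical helper names.

```python
import copy


def apply_update(settings, new_settings=None, fields_to_remove=()):
    """Model a partial update of job settings on a plain dict.

    Assumption: each top-level key in `new_settings` replaces the old
    value as a whole; nested values are not merged.
    """
    updated = copy.deepcopy(settings)
    updated.update(copy.deepcopy(new_settings or {}))
    for field in fields_to_remove:
        # Only top-level fields are removed in this model.
        updated.pop(field, None)
    return updated


def apply_reset(settings, new_settings):
    """Model a reset: the new settings completely replace the old ones."""
    return copy.deepcopy(new_settings)


job_1 = {
    "name": "Nightly model training",
    "libraries": [{"jar": "dbfs:/my-jar.jar"}],
    "max_retries": 1,
}
patch = {"email_notifications": {"on_start": ["someone@example.com"]}}

# Update keeps "name" and "max_retries", adds notifications, drops libraries.
print(apply_update(job_1, patch, fields_to_remove=["libraries"]))
# Reset keeps nothing from the old settings.
print(apply_reset(job_1, patch))
```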
Run now
Important
A workspace is limited to 1000 concurrent task runs. A 429 Too Many Requests response is returned when you request a run that cannot start immediately.
The number of jobs a workspace can create in an hour is limited to 10000 (includes “runs submit”). This limit also affects jobs created by the REST API and notebook workflows.
A workspace can contain up to 12000 saved jobs.
A job can contain up to 100 tasks.
| Endpoint | HTTP Method |
|---|---|
| 2.0/jobs/run-now | POST |
Run a job now and return the run_id of the triggered run.
Tip
If you invoke Create together with Run now, you can use the Runs submit endpoint instead, which allows you to submit your workload directly without having to create a job.
Example
curl --netrc --request POST \
https://<databricks-instance>/api/2.0/jobs/run-now \
--data @run-job.json \
| jq .
run-job.json:
An example request for a notebook job:
{
  "job_id": 1,
  "notebook_params": {
    "name": "john doe",
    "age": "35"
  }
}
An example request for a JAR job:
{
  "job_id": 2,
  "jar_params": [ "john doe", "35" ]
}
Replace:
<databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.
The contents of run-job.json with fields that are appropriate for your solution.
Request structure
| Field Name | Type | Description |
|---|---|---|
| job_id | | |
| jar_params | An array of | A list of parameters for jobs with JAR tasks, e.g. |
| notebook_params | A map of ParamPair | A map from keys to values for jobs with notebook task, e.g. If not specified upon You cannot specify notebook_params in conjunction with jar_params. The JSON representation of this field (i.e. |
| | An array of | A list of parameters for jobs with Python tasks, e.g. |
| | An array of | A list of parameters for jobs with spark submit task, e.g. |
| | | An optional token to guarantee the idempotency of job run requests. If a run with the provided token already exists, the request does not create a new run but returns the ID of the existing run instead. If a run with the provided token is deleted, an error is returned. If you specify the idempotency token, upon failure you can retry until the request succeeds. Databricks guarantees that exactly one run is launched with that idempotency token. This token must have at most 64 characters. For example, |
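One way to use the idempotency token is to generate it once per logical run and reuse the same payload on every retry. The sketch below is a minimal illustration; `build_run_now_payload` is a hypothetical helper, and the JSON key `idempotency_token` is an assumption, because the field name is not shown in the table above.

```python
import uuid


def build_run_now_payload(job_id, notebook_params=None, jar_params=None,
                          token=None):
    """Build a run-now request body with an idempotency token."""
    if notebook_params is not None and jar_params is not None:
        raise ValueError("notebook_params cannot be combined with jar_params")
    # uuid4().hex is 32 hexadecimal characters, well under the 64 limit.
    token = token or uuid.uuid4().hex
    if len(token) > 64:
        raise ValueError("the idempotency token must have at most 64 characters")
    # "idempotency_token" is an assumed key name (see the lead-in).
    payload = {"job_id": job_id, "idempotency_token": token}
    if notebook_params is not None:
        payload["notebook_params"] = notebook_params
    if jar_params is not None:
        payload["jar_params"] = jar_params
    return payload


payload = build_run_now_payload(1, notebook_params={"name": "john doe", "age": "35"})
# Reuse this exact payload when retrying so the token stays the same.
print(payload)
```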
Runs submit
Important
A workspace is limited to 1000 concurrent task runs. A 429 Too Many Requests response is returned when you request a run that cannot start immediately.
The number of jobs a workspace can create in an hour is limited to 10000 (includes “runs submit”). This limit also affects jobs created by the REST API and notebook workflows.
A workspace can contain up to 12000 saved jobs.
A job can contain up to 100 tasks.
| Endpoint | HTTP Method |
|---|---|
| 2.0/jobs/runs/submit | POST |
Submit a one-time run. This endpoint allows you to submit a workload directly without creating a job. Use the jobs/runs/get API to check the run state after the job is submitted.
Example
Request
curl --netrc --request POST \
https://<databricks-instance>/api/2.0/jobs/runs/submit \
--data @submit-job.json \
| jq .
submit-job.json:
{
  "run_name": "my spark task",
  "new_cluster": {
    "spark_version": "7.5.x-scala2.12",
    "node_type_id": "n1-highmem-4",
    "aws_attributes": {
      "availability": "ON_DEMAND"
    },
    "num_workers": 10
  },
  "libraries": [
    {
      "jar": "dbfs:/my-jar.jar"
    },
    {
      "maven": {
        "coordinates": "org.jsoup:jsoup:1.7.2"
      }
    }
  ],
  "spark_jar_task": {
    "main_class_name": "com.databricks.ComputeModels"
  }
}
Replace:
<databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.
The contents of submit-job.json with fields that are appropriate for your solution.
Request structure
Important
When you run a job on a new jobs cluster, the job is treated as a Jobs Compute (automated) workload subject to Jobs Compute pricing.
When you run a job on an existing all-purpose cluster, it is treated as an All-Purpose Compute (interactive) workload subject to All-Purpose Compute pricing.
| Field Name | Type | Description |
|---|---|---|
| existing_cluster_id OR new_cluster | | If existing_cluster_id, the ID of an existing cluster that will be used for all runs of this job. When running jobs on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs on new clusters for greater reliability. If new_cluster, a description of a cluster that will be created for each run. If specifying a PipelineTask, then this field can be empty. |
| notebook_task OR spark_jar_task OR spark_python_task OR spark_submit_task OR pipeline_task OR run_job_task | NotebookTask OR SparkJarTask OR SparkPythonTask OR SparkSubmitTask OR PipelineTask OR RunJobTask | If notebook_task, indicates that this job should run a notebook. This field may not be specified in conjunction with spark_jar_task. If spark_jar_task, indicates that this job should run a JAR. If spark_python_task, indicates that this job should run a Python file. If spark_submit_task, indicates that this job should be launched by the spark submit script. If pipeline_task, indicates that this job should run a Delta Live Tables pipeline. If run_job_task, indicates that this job should run another job. |
| run_name | | An optional name for the run. The default value is |
| webhook_notifications | | An optional set of system destinations to notify when runs of this job begin, complete, or fail. |
| | | Optional notification settings that are used when sending notifications to each of the |
| libraries | An array of Library | An optional list of libraries to be installed on the cluster that will execute the job. The default value is an empty list. |
| timeout_seconds | | An optional timeout applied to each run of this job. The default behavior is to have no timeout. |
| | | An optional token to guarantee the idempotency of job run requests. If a run with the provided token already exists, the request does not create a new run but returns the ID of the existing run instead. If a run with the provided token is deleted, an error is returned. If you specify the idempotency token, upon failure you can retry until the request succeeds. Databricks guarantees that exactly one run is launched with that idempotency token. This token must have at most 64 characters. For example, |
Runs list
| Endpoint | HTTP Method |
|---|---|
| 2.0/jobs/runs/list | GET |
List runs in descending order by start time.
Note
Runs are automatically removed after 60 days. If you want to reference them beyond 60 days, you should save old run results before they expire. To export using the UI, see Export job run results. To export using the Jobs API, see Runs export.
Example
Request
curl --netrc --request GET \
'https://<databricks-instance>/api/2.0/jobs/runs/list?job_id=<job-id>&active_only=<true-false>&offset=<offset>&limit=<limit>&run_type=<run-type>' \
| jq .
Or:
curl --netrc --get \
https://<databricks-instance>/api/2.0/jobs/runs/list \
--data 'job_id=<job-id>&active_only=<true-false>&offset=<offset>&limit=<limit>&run_type=<run-type>' \
| jq .
Replace:
<databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.
<job-id> with the ID of the job, for example 123.
<true-false> with true or false.
<offset> with the offset value.
<limit> with the limit value.
<run-type> with the run_type value.
Response
{
"runs": [
{
"job_id": 1,
"run_id": 452,
"number_in_job": 5,
"state": {
"life_cycle_state": "RUNNING",
"state_message": "Performing action"
},
"task": {
"notebook_task": {
"notebook_path": "/Users/donald@duck.com/my-notebook"
}
},
"cluster_spec": {
"existing_cluster_id": "1201-my-cluster"
},
"cluster_instance": {
"cluster_id": "1201-my-cluster",
"spark_context_id": "1102398-spark-context-id"
},
"overriding_parameters": {
"jar_params": ["param1", "param2"]
},
"start_time": 1457570074236,
"end_time": 1457570075149,
"setup_duration": 259754,
"execution_duration": 3589020,
"cleanup_duration": 31038,
"run_duration": 3879812,
"trigger": "PERIODIC"
}
],
"has_more": true
}
Request structure
| Field Name | Type | Description |
|---|---|---|
| active_only OR completed_only | | If active_only is If completed_only is |
| job_id | | The job for which to list runs. If omitted, the Jobs service will list runs from all jobs. |
| offset | | The offset of the first run to return, relative to the most recent run. |
| limit | | The number of runs to return. This value should be greater than 0 and less than 1000. The default value is 20. If a request specifies a limit of 0, the service will instead use the maximum limit. |
| run_type | | The type of runs to return. For a description of run types, see Run. |
Response structure
| Field Name | Type | Description |
|---|---|---|
| runs | An array of Run | A list of runs, from most recently started to least. |
| has_more | | If true, additional runs matching the provided filter are available for listing. |
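The offset, limit, and has_more fields together support simple pagination. The sketch below is a minimal illustration of such a loop; `iter_runs` and the injected `fetch_page` callable are hypothetical, and `fetch_page` is assumed to perform one runs/list request and return the parsed JSON response as a dict.

```python
def iter_runs(fetch_page, job_id=None, limit=20):
    """Yield every run by following has_more with a growing offset.

    `fetch_page` is a hypothetical callable taking job_id, offset, and
    limit keyword arguments and returning one runs/list response dict.
    """
    offset = 0
    while True:
        page = fetch_page(job_id=job_id, offset=offset, limit=limit)
        runs = page.get("runs", [])
        yield from runs
        # Stop when the service reports no further pages (or sends none).
        if not page.get("has_more") or not runs:
            return
        offset += len(runs)


# Stand-in for the real request, serving five fake runs in pages.
_fake_runs = [{"run_id": i} for i in range(5)]


def fake_fetch_page(job_id, offset, limit):
    chunk = _fake_runs[offset:offset + limit]
    return {"runs": chunk, "has_more": offset + limit < len(_fake_runs)}


print([run["run_id"] for run in iter_runs(fake_fetch_page, job_id=1, limit=2)])
```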
Runs get
| Endpoint | HTTP Method |
|---|---|
| 2.0/jobs/runs/get | GET |
Retrieve the metadata of a run.
Note
Runs are automatically removed after 60 days. If you want to reference them beyond 60 days, you should save old run results before they expire. To export using the UI, see Export job run results. To export using the Jobs API, see Runs export.
Example
Request
curl --netrc --request GET \
'https://<databricks-instance>/api/2.0/jobs/runs/get?run_id=<run-id>' \
| jq .
Or:
curl --netrc --get \
https://<databricks-instance>/api/2.0/jobs/runs/get \
--data run_id=<run-id> \
| jq .
Replace:
<databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.
<run-id> with the ID of the run, for example 123.
Response
{
"job_id": 1,
"run_id": 452,
"number_in_job": 5,
"state": {
"life_cycle_state": "RUNNING",
"state_message": "Performing action"
},
"task": {
"notebook_task": {
"notebook_path": "/Users/someone@example.com/my-notebook"
}
},
"cluster_spec": {
"existing_cluster_id": "1201-my-cluster"
},
"cluster_instance": {
"cluster_id": "1201-my-cluster",
"spark_context_id": "1102398-spark-context-id"
},
"overriding_parameters": {
"jar_params": ["param1", "param2"]
},
"start_time": 1457570074236,
"end_time": 1457570075149,
"setup_duration": 259754,
"execution_duration": 3589020,
"cleanup_duration": 31038,
"run_duration": 3879812,
"trigger": "PERIODIC"
}
Request structure
| Field Name | Type | Description |
|---|---|---|
| run_id | | The canonical identifier of the run for which to retrieve the metadata. This field is required. |
Response structure
| Field Name | Type | Description |
|---|---|---|
| job_id | | The canonical identifier of the job that contains this run. |
| run_id | | The canonical identifier of the run. This ID is unique across all runs of all jobs. |
| number_in_job | | The sequence number of this run among all runs of the job. This value starts at 1. |
| | | If this run is a retry of a prior run attempt, this field contains the run_id of the original attempt; otherwise, it is the same as the run_id. |
| state | | The result and lifecycle states of the run. |
| | | The cron schedule that triggered this run if it was triggered by the periodic scheduler. |
| task | | The task performed by the run, if any. |
| cluster_spec | | A snapshot of the job’s cluster specification when this run was created. |
| cluster_instance | | The cluster used for this run. If the run is specified to use a new cluster, this field will be set once the Jobs service has requested a cluster for the run. |
| overriding_parameters | | The parameters used for this run. |
| start_time | | The time at which this run was started in epoch milliseconds (milliseconds since 1/1/1970 UTC). This may not be the time when the job task starts executing. For example, if the job is scheduled to run on a new cluster, this is the time the cluster creation call is issued. |
| end_time | | The time at which this run ended in epoch milliseconds (milliseconds since 1/1/1970 UTC). This field will be set to 0 if the job is still running. |
| setup_duration | | The time in milliseconds it took to set up the cluster. For runs that run on new clusters this is the cluster creation time; for runs that run on existing clusters this time should be very short. The total duration of the run is the sum of the |
| execution_duration | | The time in milliseconds it took to execute the commands in the JAR or notebook until they completed, failed, timed out, were cancelled, or encountered an unexpected error. The total duration of the run is the sum of the |
| cleanup_duration | | The time in milliseconds it took to terminate the cluster and clean up any associated artifacts. The total duration of the run is the sum of the |
| run_duration | | The time in milliseconds it took the job run and all of its repairs to finish. This field is only set for multitask job runs and not task runs. The duration of a task run is the sum of the |
| trigger | | The type of trigger that fired this run. |
| | | The creator user name. This field won’t be included in the response if the user has been deleted. |
| | | The URL to the detail page of the run. |
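The timing fields can be combined on the client side. The sketch below is a small illustration using the values from the example response above, where setup_duration, execution_duration, and cleanup_duration add up to 259754 + 3589020 + 31038 = 3879812 milliseconds, the same value shown for run_duration. `summarize_run` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def summarize_run(run):
    """Summarize the timing fields of a runs/get response dict."""
    total_ms = (run["setup_duration"]
                + run["execution_duration"]
                + run["cleanup_duration"])
    # start_time is in epoch milliseconds; drop the millisecond part.
    started = datetime.fromtimestamp(run["start_time"] // 1000, tz=timezone.utc)
    return {
        "started_utc": started.isoformat(),
        "total_ms": total_ms,
        "total_minutes": round(total_ms / 60000, 1),
    }


run = {
    "start_time": 1457570074236,
    "setup_duration": 259754,
    "execution_duration": 3589020,
    "cleanup_duration": 31038,
    "run_duration": 3879812,
}
summary = summarize_run(run)
print(summary)
```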
Runs export
| Endpoint | HTTP Method |
|---|---|
| 2.0/jobs/runs/export | GET |
Export and retrieve the job run task.
Note
Only notebook runs can be exported in HTML format. Exporting runs of other types will fail.
Example
Request
curl --netrc --request GET \
'https://<databricks-instance>/api/2.0/jobs/runs/export?run_id=<run-id>' \
| jq .
Or:
curl --netrc --get \
https://<databricks-instance>/api/2.0/jobs/runs/export \
--data run_id=<run-id> \
| jq .
Replace:
<databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.
<run-id> with the ID of the run, for example 123.
Response
{
"views": [ {
"content": "<!DOCTYPE html><html><head>Head</head><body>Body</body></html>",
"name": "my-notebook",
"type": "NOTEBOOK"
} ]
}
To extract the HTML notebook from the JSON response, download and run this Python script.
Note
The notebook body in the __DATABRICKS_NOTEBOOK_MODEL object is encoded.
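As a rough illustration of handling the export response shown above, the sketch below writes each view's content to its own HTML file. It is not the script referenced in this section, and it does not decode the __DATABRICKS_NOTEBOOK_MODEL object. `save_views` is a hypothetical helper, and view names are assumed to be usable as file names.

```python
from pathlib import Path


def save_views(export_response, directory):
    """Write each exported view to <directory>/<name>.html.

    Assumes each view name is a valid file name; returns the written paths.
    """
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for view in export_response.get("views", []):
        path = out_dir / f"{view['name']}.html"
        path.write_text(view["content"], encoding="utf-8")
        paths.append(path)
    return paths


example_response = {
    "views": [{
        "content": "<!DOCTYPE html><html><head>Head</head><body>Body</body></html>",
        "name": "my-notebook",
        "type": "NOTEBOOK",
    }]
}
```

Calling `save_views(example_response, "exported")` would create `exported/my-notebook.html`.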
Request structure
| Field Name | Type | Description |
|---|---|---|
| run_id | | The canonical identifier for the run. This field is required. |
| | | Which views to export (CODE, DASHBOARDS, or ALL). Defaults to CODE. |
Response structure
| Field Name | Type | Description |
|---|---|---|
| views | An array of ViewItem | The exported content in HTML format (one for every view item). |
Runs cancel
| Endpoint | HTTP Method |
|---|---|
| 2.0/jobs/runs/cancel | POST |
Cancel a job run. Because the run is canceled asynchronously, the run may still be running when this request completes. The run will be terminated shortly. If the run is already in a terminal life_cycle_state, this method is a no-op.
This endpoint validates that the run_id parameter is valid and returns HTTP status code 400 for invalid parameters.
Example
curl --netrc --request POST \
https://<databricks-instance>/api/2.0/jobs/runs/cancel \
--data '{ "run_id": <run-id> }'
Replace:
<databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.
<run-id> with the ID of the run, for example 123.
This example uses a .netrc file.
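Because cancellation is asynchronous, a caller that needs to know when the run has actually stopped can poll Runs get until the run reaches a terminal state. The sketch below is one possible approach, not a prescribed one. `wait_until_terminal` and the injected `get_run` callable are hypothetical, and the set of terminal life_cycle_state values is an assumption that should be checked against the run state reference.

```python
import time

# Assumed terminal life cycle states; verify against the run state reference.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_until_terminal(get_run, run_id, poll_seconds=5, max_polls=60,
                        sleep=time.sleep):
    """Poll until the run's life_cycle_state is terminal, then return the run.

    `get_run` is a hypothetical callable that performs one runs/get
    request and returns the parsed response dict.
    """
    for _ in range(max_polls):
        run = get_run(run_id)
        if run["state"]["life_cycle_state"] in TERMINAL_STATES:
            return run
        sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} did not reach a terminal state")
```

The poll interval and the maximum number of polls are arbitrary defaults chosen for the sketch.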
Runs cancel all
| Endpoint | HTTP Method |
|---|---|
| 2.0/jobs/runs/cancel-all | POST |
Cancel all active runs of a job. Because the runs are canceled asynchronously, this request doesn’t prevent new runs from being started.
This endpoint validates that the job_id parameter is valid and returns HTTP status code 400 for invalid parameters.
Example
curl --netrc --request POST \
https://<databricks-instance>/api/2.0/jobs/runs/cancel-all \
--data '{ "job_id": <job-id> }'
Replace:
<databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.
<job-id> with the ID of the job, for example 123.
This example uses a .netrc file.
Runs get output
| Endpoint | HTTP Method |
|---|---|
| 2.0/jobs/runs/get-output | GET |
Retrieve the output and metadata of a single task run. When a notebook task returns a value through the dbutils.notebook.exit() call, you can use this endpoint to retrieve that value. Databricks restricts this API to return the first 5 MB of the output. To return a larger result, you can store job results in a cloud storage service.
This endpoint validates that the run_id parameter is valid and returns HTTP status code 400 for invalid parameters.
Runs are automatically removed after 60 days. If you want to reference them beyond 60 days, you should save old run results before they expire. To export using the UI, see Export job run results. To export using the Jobs API, see Runs export.
Example
Request
curl --netrc --request GET \
'https://<databricks-instance>/api/2.0/jobs/runs/get-output?run_id=<run-id>' \
| jq .
Or:
curl --netrc --get \
https://<databricks-instance>/api/2.0/jobs/runs/get-output \
--data run_id=<run-id> \
| jq .
Replace:
<databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.
<run-id> with the ID of the run, for example 123.
Response
{
"metadata": {
"job_id": 1,
"run_id": 452,
"number_in_job": 5,
"state": {
"life_cycle_state": "TERMINATED",
"result_state": "SUCCESS",
"state_message": ""
},
"task": {
"notebook_task": {
"notebook_path": "/Users/someone@example.com/my-notebook"
}
},
"cluster_spec": {
"existing_cluster_id": "1201-my-cluster"
},
"cluster_instance": {
"cluster_id": "1201-my-cluster",
"spark_context_id": "1102398-spark-context-id"
},
"overriding_parameters": {
"jar_params": ["param1", "param2"]
},
"start_time": 1457570074236,
"setup_duration": 259754,
"execution_duration": 3589020,
"cleanup_duration": 31038,
"run_duration": 3879812,
"trigger": "PERIODIC"
},
"notebook_output": {
"result": "the maybe truncated string passed to dbutils.notebook.exit()"
}
}
Request structure
| Field Name | Type | Description |
|---|---|---|
| run_id | | The canonical identifier for the run. For a job with multiple tasks, this is the |
Response structure
| Field Name | Type | Description |
|---|---|---|
| notebook_output OR error | NotebookOutput OR | If notebook_output, the output of a notebook task, if available. A notebook task that terminates (either successfully or with a failure) without calling If error, an error message indicating why output is not available. The message is unstructured, and its exact format is subject to change. |
| metadata | | All details of the run except for its output. |
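A client typically needs to branch on which of the two output fields is present. The sketch below is a minimal illustration based on the example response and the field descriptions above; `notebook_result` is a hypothetical helper name.

```python
def notebook_result(output_response):
    """Return the notebook exit value from a runs/get-output response.

    Raises RuntimeError when the response carries an error message and
    returns None when the notebook produced no output.
    """
    if "error" in output_response:
        raise RuntimeError(output_response["error"])
    notebook_output = output_response.get("notebook_output", {})
    return notebook_output.get("result")


response = {
    "metadata": {"job_id": 1, "run_id": 452},
    "notebook_output": {"result": "42"},
}
print(notebook_result(response))
```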
Runs delete
| Endpoint | HTTP Method |
|---|---|
| 2.0/jobs/runs/delete | POST |
Delete a non-active run. Returns an error if the run is active.
Example
curl --netrc --request POST \
https://<databricks-instance>/api/2.0/jobs/runs/delete \
--data '{ "run_id": <run-id> }'
Replace:
<databricks-instance> with the Databricks workspace instance name, for example 1234567890123456.7.gcp.databricks.com.
<run-id> with the ID of the run, for example 123.
This example uses a .netrc file.
Data structures
AutoScale
Range defining the min and max number of cluster workers.
| Field Name | Type | Description |
|---|---|---|
| min_workers | | The minimum number of workers to which the cluster can scale down when underutilized. It is also the initial number of workers the cluster will have after creation. |
| max_workers | | The maximum number of workers to which the cluster can scale up when overloaded. max_workers must be strictly greater than min_workers. |
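The constraint between the two fields can be checked on the client before a request is sent. The sketch below is an illustrative local model only; the Python class merely mirrors the structure's name and is not part of any Databricks library.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AutoScale:
    """Local model of the AutoScale structure, for validation only."""
    min_workers: int
    max_workers: int

    def __post_init__(self):
        # max_workers must be strictly greater than min_workers.
        if self.max_workers <= self.min_workers:
            raise ValueError(
                "max_workers must be strictly greater than min_workers"
            )


# asdict() gives a dict that can be embedded in a JSON request body.
print(asdict(AutoScale(min_workers=2, max_workers=8)))
```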
ClusterInstance
Identifiers for the cluster and Spark context used by a run. These two values together identify an execution context across all time.
| Field Name | Type | Description |
|---|---|---|
| cluster_id | | The canonical identifier for the cluster used by a run. This field is always available for runs on existing clusters. For runs on new clusters, it becomes available once the cluster is created. This value can be used to view logs by browsing to The response won’t include this field if the identifier is not available yet. |
| spark_context_id | | The canonical identifier for the Spark context used by a run. This field will be filled in once the run begins execution. This value can be used to view the Spark UI by browsing to The response won’t include this field if the identifier is not available yet. |
ClusterLogConf
Path to cluster log.
Field Name |
Type |
Description |
---|---|---|
DBFS location of cluster log. Destination must be provided. For example,
|
ClusterSpec
Important
When you run a job on a new jobs cluster, the job is treated as a Jobs Compute (automated) workload subject to Jobs Compute pricing.
When you run a job on an existing all-purpose cluster, it is treated as an All-Purpose Compute (interactive) workload subject to All-Purpose Compute pricing.
| Field Name | Type | Description |
|---|---|---|
| existing_cluster_id OR new_cluster | | If existing_cluster_id, the ID of an existing cluster that will be used for all runs of this job. When running jobs on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs on new clusters for greater reliability. If new_cluster, a description of a cluster that will be created for each run. If specifying a PipelineTask, then this field can be empty. |
| libraries | An array of Library | An optional list of libraries to be installed on the cluster that will execute the job. The default value is an empty list. |
ClusterTag
Cluster tag definition.
CronSchedule
Field Name |
Type |
Description |
---|---|---|
|
|
A Cron expression using Quartz syntax that describes the schedule for a job. See Cron Trigger for details. This field is required. |
|
|
A Java timezone ID. The schedule for a job will be resolved with respect to this timezone. See Java TimeZone for details. This field is required. |
|
|
Indicates whether this schedule is paused. Either “PAUSED” or “UNPAUSED”. |
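A schedule can be assembled and sanity-checked before it is added to a job. This is a sketch under stated assumptions: make_cron_schedule is a hypothetical helper, the key pause_status is assumed as the name of the paused/unpaused field, and the field-count check relies only on the general Quartz rule that an expression has six mandatory fields and an optional year field. It does not fully validate the expression or the timezone ID.

```python
VALID_PAUSE_STATUSES = ("PAUSED", "UNPAUSED")


def make_cron_schedule(quartz_cron_expression, timezone_id, pause_status=None):
    """Build a schedule mapping; the expression and the timezone are required."""
    # Quartz expressions have six mandatory fields plus an optional year field.
    if len(quartz_cron_expression.split()) not in (6, 7):
        raise ValueError("expected 6 or 7 space-separated Quartz fields")
    if not timezone_id:
        raise ValueError("timezone_id is required")
    schedule = {
        "quartz_cron_expression": quartz_cron_expression,
        "timezone_id": timezone_id,
    }
    if pause_status is not None:
        if pause_status not in VALID_PAUSE_STATUSES:
            raise ValueError("pause_status must be PAUSED or UNPAUSED")
        # Assumed key name for the paused/unpaused field.
        schedule["pause_status"] = pause_status
    return schedule


# 10:15pm every night, resolved in the Los Angeles timezone.
nightly = make_cron_schedule("0 15 22 * * ?", "America/Los_Angeles", "UNPAUSED")
print(nightly)
```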
DbfsStorageInfo
DBFS storage information.
Field Name |
Type |
Description |
---|---|---|
|
|
DBFS destination. Example: |
GCSStorageInfo
Google Cloud Storage (GCS) storage information.
Field Name |
Type |
Description |
---|---|---|
|
|
File destination. Example: |
Google Cloud attributes
Attributes set during cluster creation related to Google Cloud.
Field Name |
Type |
Description |
---|---|---|
|
|
Use preemptible executors. |
|
|
Google service account email address that the cluster uses to authenticate with Google Identity. This field is used for authentication with the GCS and BigQuery data sources. |
|
|
Size, in GB, of the disk allocated to each instance. This value must be between 100 and 4096. |
Important
For use with GCS and BigQuery, the Google service account that you use to access the data source must be in the same project as the service account (SA) that you specified when setting up your Databricks account.
InitScriptInfo
Path to an init script.
Field Name |
Type |
Description |
---|---|---|
OR
|
DbfsStorageInfo (deprecated) |
Workspace location of init script. Destination must be provided. For example,
(Deprecated) DBFS location of init script. Destination must be provided. For example,
Google Cloud Storage (GCS) location of init script. Destination must be provided. For
example, |
Job
Field Name |
Type |
Description |
---|---|---|
|
|
The canonical identifier for this job. |
|
|
The creator user name. This field won’t be included in the response if the user has already been deleted. |
|
|
The user name that the job will run as. |
|
Settings for this job and all of its runs. These settings can be updated using the |
|
|
|
The time at which this job was created in epoch milliseconds (milliseconds since 1/1/1970 UTC). |
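Timestamps in these structures are expressed in epoch milliseconds. As an illustrative sketch (the helper name epoch_millis_to_utc and the sample value are made up for this example), such a value can be converted to a timezone-aware UTC datetime as follows:

```python
from datetime import datetime, timedelta, timezone

# 1/1/1970 UTC, the reference point for epoch milliseconds.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis_to_utc(millis: int) -> datetime:
    """Convert milliseconds since 1/1/1970 UTC to an aware datetime."""
    # Adding a timedelta avoids floating-point rounding of the milliseconds.
    return EPOCH + timedelta(milliseconds=millis)


# Sample value: exactly one day after the epoch.
created = epoch_millis_to_utc(86_400_000)
print(created.isoformat())  # 1970-01-02T00:00:00+00:00
```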
JobEmailNotifications
Important
The on_start, on_success, and on_failure fields accept only Latin characters (ASCII character set). Using non-ASCII characters will return an error. Examples of invalid, non-ASCII characters are Chinese, Japanese kanjis, and emojis.
Field Name |
Type |
Description |
---|---|---|
|
An array of |
A list of email addresses to be notified when a run begins. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent. |
|
An array of |
A list of email addresses to be notified when a run successfully completes. A run is
considered to have completed successfully if it ends with a |
|
An array of |
A list of email addresses to be notified when a run unsuccessfully completes. A run is
considered to have completed unsuccessfully if it ends with an |
|
An array of |
A list of email addresses to be notified when the duration of a run exceeds the threshold specified for
the |
|
|
If true, do not send email to recipients specified in |
Field Name |
Type |
Description |
---|---|---|
|
An array of Webhook |
An optional list of system destinations to be notified when a run begins. If not specified on job
creation, reset, or update, the list is empty, and notifications are not sent.
A maximum of 3 destinations can be specified for the |
|
An array of Webhook |
An optional list of system destinations to be notified when a run completes successfully. A run is
considered to have completed successfully if it ends with a |
|
An array of Webhook |
An optional list of system destinations to be notified when a run completes unsuccessfully. A run is
considered to have completed unsuccessfully if it ends with an |
|
An array of Webhook |
An optional list of system destinations to be notified when the duration of a run exceeds the threshold
specified for the |
JobNotificationSettings
Field Name |
Type |
Description |
---|---|---|
|
|
If true, do not send notifications to recipients specified in |
|
|
If true, do not send notifications to recipients specified in |
|
|
If true, do not send notifications to recipients specified in |
JobSettings
Important
When you run a job on a new jobs cluster, the job is treated as a Jobs Compute (automated) workload subject to Jobs Compute pricing.
When you run a job on an existing all-purpose cluster, it is treated as an All-Purpose Compute (interactive) workload subject to All-Purpose Compute pricing.
Settings for a job. These settings can be updated using the resetJob method.
Field Name |
Type |
Description |
---|---|---|
|
|
If existing_cluster_id, the ID of an existing cluster that will be used for all runs of this job. When running jobs on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs on new clusters for greater reliability. If new_cluster, a description of a cluster that will be created for each run. If specifying a PipelineTask, then this field can be empty. |
|
NotebookTask OR SparkJarTask OR SparkPythonTask OR SparkSubmitTask OR PipelineTask OR RunJobTask |
If notebook_task, indicates that this job should run a notebook. This field may not be specified in conjunction with spark_jar_task. If spark_jar_task, indicates that this job should run a JAR. If spark_python_task, indicates that this job should run a Python file. If spark_submit_task, indicates that this job should be launched by the spark submit script. If pipeline_task, indicates that this job should run a Delta Live Tables pipeline. If run_job_task, indicates that this job should run another job. |
|
|
An optional name for the job. The default value is |
|
An array of Library |
An optional list of libraries to be installed on the cluster that will execute the job. The default value is an empty list. |
|
An optional set of email addresses that will be notified when runs of this job begin or complete as well as when this job is deleted. The default behavior is to not send any emails. |
|
|
An optional set of system destinations to notify when runs of this job begin, complete, or fail. |
|
|
Optional notification settings that are used when sending notifications
to each of the |
|
|
|
An optional timeout applied to each run of this job. The default behavior is to have no timeout. |
|
|
An optional maximum number of times to retry an unsuccessful run. A run is considered to be unsuccessful if it completes with the |
|
|
An optional minimal interval in milliseconds between attempts. The default behavior is that unsuccessful runs are immediately retried. |
|
|
An optional policy to specify whether to retry a job when it times out. The default behavior is to not retry on timeout. |
|
An optional periodic schedule for this job. The default behavior is that the job will only run when triggered by clicking “Run Now” in the Jobs UI or sending an API request to
|
|
|
|
An optional maximum allowed number of concurrent runs of the job. Set this value if you want to be able to execute multiple runs of the same job concurrently. This is useful for example if you trigger your job on a frequent schedule and want to allow consecutive runs to overlap with each other, or if you want to trigger multiple runs which differ by their input parameters. This setting affects only new runs. For example, suppose the job’s concurrency is 4 and there are 4 concurrent active runs. Then setting the concurrency to 3 won’t kill any of the active runs. However, from then on, new runs will be skipped unless there are fewer than 3 active runs. This value cannot exceed 1000. Setting this value to 0 causes all new runs to be skipped. The default behavior is to allow only 1 concurrent run. |
|
An optional set of health rules defined for the job. |
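The skipping behavior described for the maximum concurrent runs setting can be pictured with a small check. This is only a model of the rule as worded above, not the scheduler's actual implementation; the function name new_run_is_skipped and its parameter names are hypothetical.

```python
MAX_ALLOWED_CONCURRENT_RUNS = 1000  # documented upper bound


def new_run_is_skipped(active_runs: int, max_concurrent_runs: int = 1) -> bool:
    """Return True if a newly triggered run would be skipped."""
    if not 0 <= max_concurrent_runs <= MAX_ALLOWED_CONCURRENT_RUNS:
        raise ValueError("max_concurrent_runs must be between 0 and 1000")
    # A new run proceeds only while fewer than max_concurrent_runs are active.
    # With a limit of 0 this is never the case, so every new run is skipped.
    return active_runs >= max_concurrent_runs


# Concurrency lowered from 4 to 3 while 4 runs are active:
# the active runs keep going, but a new run is skipped.
print(new_run_is_skipped(active_runs=4, max_concurrent_runs=3))  # True
# Once only 2 runs remain active, a new run is allowed again.
print(new_run_is_skipped(active_runs=2, max_concurrent_runs=3))  # False
```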
JobTask
Field Name |
Type |
Description |
---|---|---|
|
NotebookTask OR SparkJarTask OR SparkPythonTask OR SparkSubmitTask OR PipelineTask OR RunJobTask |
If notebook_task, indicates that this job should run a notebook. This field may not be specified in conjunction with spark_jar_task. If spark_jar_task, indicates that this job should run a JAR. If spark_python_task, indicates that this job should run a Python file. If spark_submit_task, indicates that this job should be launched by the spark submit script. If pipeline_task, indicates that this job should run a Delta Live Tables pipeline. If run_job_task, indicates that this job should run another job. |
JobsHealthRule
Field Name |
Type |
Description |
---|---|---|
|
|
Specifies the health metric that is being evaluated for a particular
health rule. Valid values are |
|
|
Specifies the operator used to compare the health metric value with the
specified threshold. Valid values are |
|
|
Specifies the threshold value that the health metric should meet to comply with the health rule. |
JobsHealthRules
Field Name |
Type |
Description |
---|---|---|
|
An array of JobsHealthRule |
An optional set of health rules that can be defined for a job. |
Library
Field Name |
Type |
Description |
---|---|---|
|
|
If jar, URI of the JAR to be installed.
DBFS and GCS ( If egg, URI of the egg to be installed.
DBFS and GCS URIs are supported.
For example: If whl, URI of the If pypi, specification of a PyPI library to be
installed. Specifying the If maven, specification of a Maven library to be
installed. For example:
If cran, specification of a CRAN library to be installed. |
MavenLibrary
Field Name |
Type |
Description |
---|---|---|
|
|
Gradle-style Maven coordinates. For example: |
|
|
Maven repo to install the Maven package from. If omitted, both Maven Central Repository and Spark Packages are searched. |
|
An array of |
List of dependencies to exclude. For example: Maven dependency exclusions: https://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html. |
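A Maven library entry can be built programmatically. The sketch below makes several assumptions: make_maven_library is a hypothetical helper, the keys repo and exclusions are assumed names for the repository and exclusion fields, and the coordinates are required to have exactly three colon-separated parts (group, artifact, version), which is stricter than Gradle-style coordinates in general.

```python
def make_maven_library(coordinates, repo=None, exclusions=None):
    """Build a library entry of the form {"maven": {...}}."""
    parts = coordinates.split(":")
    # Simplifying assumption: exactly group:artifact:version.
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected coordinates of the form group:artifact:version")
    maven = {"coordinates": coordinates}
    if repo is not None:
        maven["repo"] = repo  # assumed key name
    if exclusions:
        maven["exclusions"] = list(exclusions)  # assumed key name
    return {"maven": maven}


# Same coordinates as in the create-job example at the top of this article.
library = make_maven_library("org.jsoup:jsoup:1.7.2")
print(library)
```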
NewCluster
Field Name |
Type |
Description |
---|---|---|
|
|
If num_workers, number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes. When reading the properties of a cluster, this field reflects the desired number
of workers rather than the actual current number of workers. For example, if a cluster
is resized from 5 to 10 workers, this field will immediately be updated to reflect
the target size of 10 workers, whereas the workers listed in If autoscale, the required parameters to automatically scale clusters up and down based on load. |
|
|
The Spark version of the cluster. A list of available Spark versions can be retrieved by using the GET 2.0/clusters/spark-versions call. This field is required. |
|
An object containing a set of optional, user-specified Spark configuration key-value pairs.
You can also pass in a string of extra JVM options to the driver and the executors via
Example Spark confs:
|
|
|
Attributes related to clusters running on Google Cloud. If not specified at cluster creation, a set of default values will be used. |
|
|
|
This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory- or compute-intensive workloads. A list of available node types can be retrieved by using the GET 2.0/clusters/list-node-types call.
This field, the |
|
|
The node type of the Spark driver.
This field is optional; if unset, the driver node type will be set as the same value
as |
|
An array of |
Set to empty array. Cluster SSH is not supported. |
|
Always set to empty array. |
|
|
The configuration for delivering Spark logs to a long-term storage destination.
Only one destination can be specified for one cluster.
If the conf is given, the logs will be delivered to the destination every |
|
|
An array of InitScriptInfo |
The configuration for storing init scripts. Any number of scripts can be specified.
The scripts are executed sequentially in the order provided.
If |
|
An object containing a set of optional, user-specified environment variable key-value pairs.
Key-value pairs of the form (X,Y) are exported as is (i.e.,
To specify an additional set of Example Spark environment variables:
|
|
|
|
Always set to false. |
|
|
The optional ID of the instance pool to use for cluster nodes. Refer to the Instance Pools API for details. |
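The choice between a fixed-size and an autoscaling cluster can be sketched as a small builder. This is an illustration, not the API's own validation: make_new_cluster is a hypothetical helper, and it assumes that exactly one of num_workers or autoscale is supplied, as the first row of the table suggests. The sample values come from the create-job example at the top of this article.

```python
def make_new_cluster(spark_version, node_type_id, num_workers=None, autoscale=None):
    """Build a new_cluster mapping with either num_workers or autoscale."""
    if not spark_version:
        raise ValueError("spark_version is required")
    # Assumption: exactly one sizing option is given.
    if (num_workers is None) == (autoscale is None):
        raise ValueError("specify exactly one of num_workers or autoscale")
    cluster = {"spark_version": spark_version, "node_type_id": node_type_id}
    if num_workers is not None:
        cluster["num_workers"] = num_workers
    else:
        cluster["autoscale"] = autoscale
    return cluster


cluster = make_new_cluster("7.5.x-scala2.12", "n1-highmem-4", num_workers=10)
# One Spark driver plus num_workers executors.
total_spark_nodes = cluster["num_workers"] + 1
print(cluster, total_spark_nodes)
```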
NotebookOutput
Field Name |
Type |
Description |
---|---|---|
|
|
The value passed to dbutils.notebook.exit(). Databricks restricts this API
to return the first 1 MB of the value. For a larger result, your job can store the results
in a cloud storage service. This field will be absent if |
|
|
Whether or not the result was truncated. |
NotebookTask
The output of every cell is subject to a size limit of 8 MB. If the output of a cell is larger, the rest of the run is cancelled and the run is marked as failed. In that case, some of the output from other cells may also be missing.
Field Name |
Type |
Description |
---|---|---|
|
|
The absolute path of the notebook to be run in the Databricks workspace. This path must begin with a slash. This field is required. |
|
|
The timestamp of the revision of the notebook. |
|
A map of ParamPair |
Base parameters to be used for each run of this job. If the run is initiated by a call
to Use What is a dynamic value reference? to set parameters containing information about job runs. If the notebook takes a parameter that is not specified in the job’s Retrieve these parameters in a notebook using dbutils.widgets.get. |
ParamPair
Name-based parameters for jobs running notebook tasks.
Important
The fields in this data structure accept only Latin characters (ASCII character set). Using non-ASCII characters will return an error. Examples of invalid, non-ASCII characters are Chinese, Japanese kanjis, and emojis.
Type |
Description |
---|---|
|
Parameter name. Pass to dbutils.widgets.get to retrieve the value. |
|
Parameter value. |
PipelineTask
Field Name |
Type |
Description |
---|---|---|
|
|
The full name of the Delta Live Tables pipeline task to execute. |
PythonPyPiLibrary
Field Name |
Type |
Description |
---|---|---|
|
|
The name of the PyPI package to install. An optional exact version specification is also
supported. Examples: |
|
|
The repository where the package can be found. If not specified, the default pip index is used. |
RCranLibrary
Field Name |
Type |
Description |
---|---|---|
|
|
The name of the CRAN package to install. This field is required. |
|
|
The repository where the package can be found. If not specified, the default CRAN repo is used. |
Run
All the information about a run except for its output. The output can be retrieved separately with the getRunOutput method.
Field Name |
Type |
Description |
---|---|---|
|
|
The canonical identifier of the job that contains this run. |
|
|
The canonical identifier of the run. This ID is unique across all runs of all jobs. |
|
|
The creator user name. This field won’t be included in the response if the user has already been deleted. |
|
|
The sequence number of this run among all runs of the job. This value starts at 1. |
|
|
If this run is a retry of a prior run attempt, this field contains the run_id of the original attempt; otherwise, it is the same as the run_id. |
|
The result and lifecycle states of the run. |
|
|
The cron schedule that triggered this run if it was triggered by the periodic scheduler. |
|
|
The task performed by the run, if any. |
|
|
A snapshot of the job’s cluster specification when this run was created. |
|
|
The cluster used for this run. If the run is specified to use a new cluster, this field will be set once the Jobs service has requested a cluster for the run. |
|
|
The parameters used for this run. |
|
|
|
The time at which this run was started in epoch milliseconds (milliseconds since 1/1/1970 UTC). This may not be the time when the job task starts executing; for example, if the job is scheduled to run on a new cluster, this is the time the cluster creation call is issued. |
|
|
The time it took to set up the cluster in milliseconds. For runs on new clusters, this is the cluster creation time; for runs on existing clusters, this time should be very short. |
|
|
The time in milliseconds it took to execute the commands in the JAR or notebook until they completed, failed, timed out, were cancelled, or encountered an unexpected error. |
|
|
The time in milliseconds it took to terminate the cluster and clean up any associated artifacts. The total duration of the run is the sum of the setup_duration, the execution_duration, and the cleanup_duration. |
|
|
The time at which this run ended in epoch milliseconds (milliseconds since 1/1/1970 UTC). This field will be set to 0 if the job is still running. |
|
The type of trigger that fired this run. |
|
|
|
An optional name for the run. The default value is |
|
|
The URL to the detail page of the run. |
|
|
The type of the run.
|
|
|
The sequence number of this run attempt for a triggered job run. The initial attempt of a run
has an attempt_number of 0. If the initial run attempt fails, and the job has a retry policy
( |
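The relationship between the three duration fields of a run can be shown with a short calculation. This is a minimal sketch: total_run_duration_ms is a hypothetical helper, the sample numbers are invented, and missing fields are treated as 0, which is an assumption rather than documented behavior.

```python
def total_run_duration_ms(run: dict) -> int:
    """Sum setup_duration, execution_duration, and cleanup_duration (ms)."""
    return (
        run.get("setup_duration", 0)
        + run.get("execution_duration", 0)
        + run.get("cleanup_duration", 0)
    )


# Invented sample: 2 min setup, 5 min execution, 15 s cleanup.
sample_run = {
    "setup_duration": 120_000,
    "execution_duration": 300_000,
    "cleanup_duration": 15_000,
}
print(total_run_duration_ms(sample_run))  # 435000
```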
RunJobTask
Field Name |
Type |
Description |
---|---|---|
|
|
Unique identifier of the job to run. This field is required. |
RunLifeCycleState
The life cycle state of a run. Allowed state transitions are:
QUEUED -> PENDING
PENDING -> RUNNING -> TERMINATING -> TERMINATED
PENDING -> SKIPPED
PENDING -> INTERNAL_ERROR
RUNNING -> INTERNAL_ERROR
TERMINATING -> INTERNAL_ERROR
State |
Description |
---|---|
|
The run has been triggered but is queued because it reached one of the following limits:
The job or the run must have queuing enabled before it can reach this state. |
|
The run has been triggered. If the configured maximum concurrent runs of the job is already reached,
the run will immediately transition into the |
|
The task of this run is being executed. |
|
The task of this run has completed, and the cluster and execution context are being cleaned up. |
|
The task of this run has completed, and the cluster and execution context have been cleaned up. This state is terminal. |
|
This run was aborted because a previous run of the same job was already active. This state is terminal. |
|
An exceptional state that indicates a failure in the Jobs service, such as
network failure over a long period. If a run on a new cluster ends in the |
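The allowed transitions listed for the life cycle state can be encoded as a lookup table, which is useful when a client polls a run and wants to detect unexpected state changes. This is a sketch derived only from the transition list above; the names ALLOWED_TRANSITIONS, is_allowed_transition, and is_terminal are hypothetical, and a state is treated as terminal here simply because the list shows no transition out of it.

```python
ALLOWED_TRANSITIONS = {
    "QUEUED": {"PENDING"},
    "PENDING": {"RUNNING", "SKIPPED", "INTERNAL_ERROR"},
    "RUNNING": {"TERMINATING", "INTERNAL_ERROR"},
    "TERMINATING": {"TERMINATED", "INTERNAL_ERROR"},
    "TERMINATED": set(),
    "SKIPPED": set(),
    "INTERNAL_ERROR": set(),
}


def is_allowed_transition(current: str, new: str) -> bool:
    """Return True if the listed transitions allow current -> new."""
    return new in ALLOWED_TRANSITIONS.get(current, set())


def is_terminal(state: str) -> bool:
    """Return True for known states with no outgoing transition."""
    return state in ALLOWED_TRANSITIONS and not ALLOWED_TRANSITIONS[state]


print(is_allowed_transition("PENDING", "RUNNING"))     # True
print(is_allowed_transition("RUNNING", "TERMINATED"))  # False
print(is_terminal("SKIPPED"))                          # True
```

A polling loop could stop as soon as is_terminal returns True for the observed state.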
RunParameters
Parameters for this run. Only one of jar_params, python_params, or notebook_params should be specified in the run-now request, depending on the type of job task. Jobs with Spark JAR task or Python task take a list of position-based parameters, and jobs with notebook tasks take a key value map.
Field Name |
Type |
Description |
---|---|---|
|
An array of |
A list of parameters for jobs with Spark JAR tasks, e.g. Use What is a dynamic value reference? to set parameters containing information about job runs. |
|
A map of ParamPair |
A map from keys to values for jobs with notebook task, e.g.
If not specified upon notebook_params cannot be specified in conjunction with jar_params. Use What is a dynamic value reference? to set parameters containing information about job runs. The JSON representation of this field (i.e.
|
|
An array of |
A list of parameters for jobs with Python tasks, e.g. Use What is a dynamic value reference? to set parameters containing information about job runs. Important These parameters accept only Latin characters (ASCII character set). Using non-ASCII characters will return an error. Examples of invalid, non-ASCII characters are Chinese, Japanese kanjis, and emojis. |
|
An array of |
A list of parameters for jobs with spark submit task, e.g.
Use What is a dynamic value reference? to set parameters containing information about job runs. Important These parameters accept only Latin characters (ASCII character set). Using non-ASCII characters will return an error. Examples of invalid, non-ASCII characters are Chinese, Japanese kanjis, and emojis. |
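The rule that only one kind of parameter should be sent per run-now request can be enforced when the request body is built. The following is a sketch under assumptions: make_run_now_payload is a hypothetical helper, the job is identified by a job_id key, and the sample job ID and parameter values are invented.

```python
def make_run_now_payload(job_id, jar_params=None, python_params=None,
                         notebook_params=None):
    """Build a run-now request body with at most one parameter kind."""
    candidates = {
        "jar_params": jar_params,        # list of positional parameters
        "python_params": python_params,  # list of positional parameters
        "notebook_params": notebook_params,  # key-value map
    }
    given = {name: value for name, value in candidates.items() if value is not None}
    if len(given) > 1:
        raise ValueError(
            "specify only one of jar_params, python_params, notebook_params; "
            f"got {sorted(given)}"
        )
    return {"job_id": job_id, **given}


# Notebook tasks take a key-value map (invented sample values).
payload = make_run_now_payload(1, notebook_params={"name": "john doe", "age": "35"})
print(payload)
```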
RunResultState
The result state of the run.
If life_cycle_state = TERMINATED: if the run had a task, the result is guaranteed to be available, and it indicates the result of the task.
If life_cycle_state = PENDING, RUNNING, or SKIPPED, the result state is not available.
If life_cycle_state = TERMINATING or life_cycle_state = INTERNAL_ERROR: the result state is available if the run had a task and managed to start it.
Once available, the result state never changes.
State |
Description |
---|---|
|
The task completed successfully. |
|
The task completed with an error. |
|
The run was stopped after reaching the timeout. |
|
The run was canceled at user request. |
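The availability rules for the result state can be summarized as a small decision function. This is a sketch of the rules as stated above, with hypothetical names; life cycle states that those rules do not mention (for example QUEUED) raise an error here rather than being guessed.

```python
NOT_AVAILABLE = {"PENDING", "RUNNING", "SKIPPED"}
AVAILABLE_IF_TASK_STARTED = {"TERMINATING", "INTERNAL_ERROR"}


def result_state_available(life_cycle_state: str, had_task: bool,
                           task_started: bool = False) -> bool:
    """Return True if a result_state can be expected in the response."""
    if life_cycle_state == "TERMINATED":
        # Guaranteed to be available whenever the run had a task.
        return had_task
    if life_cycle_state in NOT_AVAILABLE:
        return False
    if life_cycle_state in AVAILABLE_IF_TASK_STARTED:
        # Available only if the run had a task and managed to start it.
        return had_task and task_started
    raise ValueError(f"no availability rule stated for {life_cycle_state!r}")


print(result_state_available("TERMINATED", had_task=True))  # True
print(result_state_available("RUNNING", had_task=True))     # False
```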
RunState
Field Name |
Type |
Description |
---|---|---|
|
A description of a run’s current location in the run lifecycle. This field is always available in the response. |
|
|
The result state of a run. If it is not available, the response won’t include this field. See RunResultState for details about the availability of result_state. |
|
|
|
Whether a run was canceled manually by a user or by the scheduler because the run timed out. |
|
|
A descriptive message for the current state. This field is unstructured, and its exact format is subject to change. |
SparkConfPair
Spark configuration key-value pairs.
Type |
Description |
---|---|
|
A configuration property name. |
|
The configuration property value. |
SparkEnvPair
Spark environment variable key-value pairs.
Important
When specifying environment variables in a job cluster, the fields in this data structure accept only Latin characters (ASCII character set). Using non-ASCII characters will return an error. Examples of invalid, non-ASCII characters are Chinese, Japanese kanjis, and emojis.
Type |
Description |
---|---|
|
An environment variable name. |
|
The environment variable value. |
SparkJarTask
Field Name |
Type |
Description |
---|---|---|
|
|
Deprecated since 04/2016. Provide a |
|
|
The full name of the class containing the main method to be executed. This class must be contained in a JAR provided as a library. The code should use |
|
An array of |
Parameters passed to the main method. Use What is a dynamic value reference? to set parameters containing information about job runs. |
SparkPythonTask
Field Name |
Type |
Description |
---|---|---|
|
|
The URI of the Python file to be executed. DBFS paths are supported. This field is required. |
|
An array of |
Command line parameters passed to the Python file. Use What is a dynamic value reference? to set parameters containing information about job runs. |
SparkSubmitTask
Important
You can invoke Spark submit tasks only on new clusters.
In the new_cluster specification, libraries and spark_conf are not supported. Instead, use --jars and --py-files to add Java and Python libraries and --conf to set the Spark configuration.
master, deploy-mode, and executor-cores are automatically configured by Databricks; you cannot specify them in parameters.
By default, the Spark submit job uses all available memory (excluding reserved memory for Databricks services). You can set --driver-memory and --executor-memory to a smaller value to leave some room for off-heap usage.
The --jars, --py-files, and --files arguments support DBFS paths.
For example, assuming the JAR is uploaded to DBFS, you can run SparkPi by setting the following parameters.
{
"parameters": [
"--class",
"org.apache.spark.examples.SparkPi",
"dbfs:/path/to/examples.jar",
"10"
]
}
Field Name |
Type |
Description |
---|---|---|
|
An array of |
Command-line parameters passed to spark submit. Use What is a dynamic value reference? to set parameters containing information about job runs. |
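Because master, deploy-mode, and executor-cores are configured automatically, a client may want to check a parameter list for them before submitting. The check below is a rough sketch: find_reserved_flags is a hypothetical helper, it assumes the options appear as the spark-submit flags --master, --deploy-mode, and --executor-cores (optionally in --flag=value form), and it cannot distinguish spark-submit options from arguments intended for the application itself.

```python
RESERVED_FLAGS = {"--master", "--deploy-mode", "--executor-cores"}


def find_reserved_flags(parameters):
    """Return the parameters that set an automatically configured option."""
    # Compare only the flag part so that "--master=local" is also caught.
    return [p for p in parameters if p.split("=", 1)[0] in RESERVED_FLAGS]


spark_pi_parameters = [
    "--class",
    "org.apache.spark.examples.SparkPi",
    "dbfs:/path/to/examples.jar",
    "10",
]
print(find_reserved_flags(spark_pi_parameters))  # []
```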
TriggerType
These are the types of triggers that can fire a run.
Type |
Description |
---|---|
|
Schedules that periodically trigger runs, such as a cron scheduler. |
|
One-time triggers that fire a single run. This occurs when you trigger a single run on demand through the UI or the API. |
|
Indicates a run that is triggered as a retry of a previously failed run. This occurs when you request to re-run the job in case of failures. |
ViewItem
The exported content is in HTML format. For example, if the view to export is dashboards, one HTML string is returned for every dashboard.
Field Name |
Type |
Description |
---|---|---|
|
|
Content of the view. |
|
|
Name of the view item. In the case of code view, the notebook’s name. In the case of dashboard view, the dashboard’s name. |
|
Type of the view item. |
ViewType
Type |
Description |
---|---|
|
Notebook view item. |
|
Dashboard view item. |
ViewsToExport
View to export: either code, all dashboards, or all.
Type |
Description |
---|---|
|
Code view of the notebook. |
|
All dashboard views of the notebook. |
|
All views of the notebook. |
Webhook
Field Name |
Type |
Description |
---|---|---|
|
|
Identifier referencing a system notification destination. This field is required. |
WebhookNotifications
Field Name |
Type |
Description |
---|---|---|
|
An array of Webhook |
An optional list of system destinations to be notified when a run begins. If not specified on job
creation, reset, or update, the list is empty, and notifications are not sent.
A maximum of 3 destinations can be specified for the |
|
An array of Webhook |
An optional list of system destinations to be notified when a run completes successfully. A run is
considered to have completed successfully if it ends with a |
|
An array of Webhook |
An optional list of system destinations to be notified when a run completes unsuccessfully. A run is
considered to have completed unsuccessfully if it ends with an |
|
An array of Webhook |
An optional list of system destinations to be notified when the duration of a run exceeds the threshold
specified for the |
WorkspaceStorageInfo
Workspace storage information.
Field Name |
Type |
Description |
---|---|---|
|
|
File destination. Example: |