Automate job creation and management
This article shows you how to get started with developer tools to automate the creation and management of jobs. It introduces you to the Databricks CLI, the Databricks SDKs, and the REST API.
Note
This article provides examples for creating and managing jobs using the Databricks CLI, the Databricks Python SDK, and the REST API as an easy introduction to those tools. To programmatically manage jobs as part of CI/CD, use Databricks Asset Bundles (DABs) or the Databricks Terraform provider.
Compare tools
The following table compares the Databricks CLI, the Databricks SDKs, and the REST API for programmatically creating and managing jobs. To learn about all available developer tools, see Developer tools.
Tool | Description
---|---
Databricks CLI | Access Databricks functionality using the Databricks command-line interface (CLI), which wraps the REST API. Use the CLI for one-off tasks such as experimentation, shell scripting, and invoking the REST API directly.
Databricks SDKs | Develop applications and create custom Databricks workflows using a Databricks SDK, available for Python, Java, Go, or R. Instead of sending REST API calls directly using curl or Postman, you can use an SDK to interact with Databricks.
Databricks REST API | If none of the above options works for your specific use case, you can use the Databricks REST API directly, for example, to automate processes where an SDK in your preferred programming language is not currently available.
Get started with the Databricks CLI
To install and configure authentication for the Databricks CLI, see Install or update the Databricks CLI and Authentication for the Databricks CLI.
The Databricks CLI has command groups for Databricks features, including one for jobs. Each command group contains a set of related commands, which can also have subcommands. The jobs command group enables you to manage your jobs and job runs with actions such as create, delete, and get. Because the CLI wraps the Databricks REST API, most CLI commands map to a REST API request. For example, databricks jobs get maps to GET /api/2.2/jobs/get.
To output more detailed usage and syntax information for the jobs command group, an individual command, or a subcommand, use the -h flag:
databricks jobs -h
databricks jobs <command-name> -h
databricks jobs <command-name> <subcommand-name> -h
Example: Retrieve a Databricks job using the CLI
To print information about an individual job in a workspace, run the following command, replacing <job-id> with the ID of the job:
databricks jobs get <job-id>
For example:
databricks jobs get 478701692316314
This command returns JSON:
{
"created_time":1730983530082,
"creator_user_name":"someone@example.com",
"job_id":478701692316314,
"run_as_user_name":"someone@example.com",
"settings": {
"email_notifications": {
"no_alert_for_skipped_runs":false
},
"format":"MULTI_TASK",
"max_concurrent_runs":1,
"name":"job_name",
"tasks": [
{
"email_notifications": {},
"notebook_task": {
"notebook_path":"/Workspace/Users/someone@example.com/directory",
"source":"WORKSPACE"
},
"run_if":"ALL_SUCCESS",
"task_key":"success",
"timeout_seconds":0,
"webhook_notifications": {}
},
{
"depends_on": [
{
"task_key":"success"
}
],
"disable_auto_optimization":true,
"email_notifications": {},
"max_retries":3,
"min_retry_interval_millis":300000,
"notebook_task": {
"notebook_path":"/Workspace/Users/someone@example.com/directory",
"source":"WORKSPACE"
},
"retry_on_timeout":false,
"run_if":"ALL_SUCCESS",
"task_key":"fail",
"timeout_seconds":0,
"webhook_notifications": {}
}
],
"timeout_seconds":0,
"webhook_notifications": {}
}
}
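If you don't know a job's ID, you can list the jobs in the workspace first. As a minimal sketch (the exact output columns depend on your CLI version), the following command prints the ID and name of each job:
databricks jobs list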
Example: Create a Databricks job using the CLI
The following example uses the Databricks CLI to create a Databricks job. This job contains a single job task that runs the specified notebook.
Copy and paste the following JSON into a file. You can access the JSON format of any existing job by selecting the View JSON option from the job page UI.
{
  "name": "My hello notebook job",
  "tasks": [
    {
      "task_key": "my_hello_notebook_task",
      "notebook_task": {
        "notebook_path": "/Workspace/Users/someone@example.com/hello",
        "source": "WORKSPACE"
      }
    }
  ]
}
Run the following command, replacing <file-path> with the path and name of the file that you just created:
databricks jobs create --json @<file-path>
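The create command returns JSON that includes the new job's job_id. To verify the job, you can trigger a run from the CLI. As a minimal sketch, replacing <job-id> with the ID returned by the create command:
databricks jobs run-now <job-id>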
Get started with the Databricks SDK
Databricks provides SDKs that allow you to automate operations using popular programming languages such as Python, Java, and Go. This section shows you how to get started using the Python SDK to create and manage Databricks jobs.
You can use the Databricks SDK from your Databricks notebook or from your local development machine. If you are using your local development machine, ensure you first complete Get started with the Databricks SDK for Python.
Note
If you are developing from a Databricks notebook and are using a cluster that uses Databricks Runtime 12.2 LTS and below, you must install the Databricks SDK for Python first. See Install or upgrade the Databricks SDK for Python.
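Before creating jobs, you can confirm that the SDK can reach and authenticate to your workspace. The following is a minimal sketch, assuming default authentication (for example, notebook-native authentication or the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables):
from databricks.sdk import WorkspaceClient

# Confirms authentication by fetching the identity the SDK is running as
w = WorkspaceClient()
print(w.current_user.me().user_name)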
Example: Create a Databricks job using the Python SDK
The following example notebook code creates a Databricks job that runs an existing notebook. It prompts you for the existing notebook's path and related job settings.
First, make sure that the correct version of the SDK has been installed:
%pip install --upgrade databricks-sdk==0.40.0
%restart_python
Next, to create a job with a notebook task, run the following, answering the prompts:
from databricks.sdk.service.jobs import JobSettings as Job
from databricks.sdk import WorkspaceClient
job_name = input("Provide a short name for the job, for example, my-job: ")
notebook_path = input("Provide the workspace path of the notebook to run, for example, /Users/someone@example.com/my-notebook: ")
task_key = input("Provide a unique key to apply to the job's tasks, for example, my-key: ")
test_sdk = Job.from_dict(
    {
        "name": job_name,
        "tasks": [
            {
                "task_key": task_key,
                "notebook_task": {
                    "notebook_path": notebook_path,
                    "source": "WORKSPACE",
                },
            },
        ],
    }
)

w = WorkspaceClient()
j = w.jobs.create(**test_sdk.as_shallow_dict())
print(f"View the job at {w.config.host}/#job/{j.job_id}\n")
Get started with the Databricks REST API
Note
Databricks recommends using the Databricks CLI or a Databricks SDK, unless you are using a programming language that does not have a corresponding Databricks SDK.
The following example makes a request to the Databricks REST API to retrieve details for a single job. It assumes the DATABRICKS_HOST
and DATABRICKS_TOKEN
environment variables have been set as described in Perform Databricks personal access token authentication.
$ curl --request GET "https://${DATABRICKS_HOST}/api/2.2/jobs/get" \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
--data '{ "job_id": "11223344" }'
For information on using the Databricks REST API, see the Databricks REST API reference documentation.
Clean up
To delete any jobs you just created, run databricks jobs delete <job-id>
from the Databricks CLI or delete the job directly from the Databricks workspace UI.
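If you created a job with the Python SDK example above, you can also delete it from the same notebook. The following is a minimal sketch that reuses the w and j objects from that example:
# Delete the job created earlier with the SDK
w.jobs.delete(job_id=j.job_id)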
Next steps
To learn more about the Databricks CLI, see What is the Databricks CLI? and Databricks CLI commands to learn about other command groups.
To learn more about the Databricks SDK, see Use SDKs with Databricks.
To learn more about CI/CD using Databricks, see Databricks Asset Bundles, the Databricks Terraform provider, and Terraform CDKTF for Databricks.
For a comprehensive overview of all developer tooling, see Developer tools.