Automate job creation and management

This article shows you how to get started with developer tools to automate the creation and management of jobs. It introduces you to the Databricks CLI, the Databricks SDKs, and the REST API.

Note

This article provides examples for creating and managing jobs using the Databricks CLI, the Databricks Python SDK, and the REST API as an easy introduction to those tools. To programmatically manage jobs as part of CI/CD, use Databricks Asset Bundles (DABs) or the Databricks Terraform provider.

Compare tools

The following list compares the Databricks CLI, the Databricks SDKs, and the REST API for programmatically creating and managing jobs. To learn about all available developer tools, see Developer tools.

  • Databricks CLI: Access Databricks functionality using the Databricks command-line interface (CLI), which wraps the REST API. Use the CLI for one-off tasks such as experimentation, shell scripting, and invoking the REST API directly.

  • Databricks SDKs: Develop applications and create custom Databricks workflows using a Databricks SDK, available for Python, Java, Go, or R. Instead of sending REST API calls directly using curl or Postman, you can use an SDK to interact with Databricks.

  • Databricks REST API: If neither of the above options works for your specific use case, you can use the Databricks REST API directly, for example to automate processes where an SDK in your preferred programming language is not currently available.

Get started with the Databricks CLI

To install and configure authentication for the Databricks CLI, see Install or update the Databricks CLI and Authentication for the Databricks CLI.

The Databricks CLI has command groups for Databricks features, including one for jobs. Each command group contains a set of related commands, which can also have subcommands. The jobs command group enables you to manage your jobs and job runs with actions such as create, delete, and get. Because the CLI wraps the Databricks REST API, most CLI commands map to a REST API request. For example, databricks jobs get maps to GET /api/2.2/jobs/get.

To output more detailed usage and syntax information for the jobs command group, an individual command, or a subcommand, use the -h flag:

  • databricks jobs -h

  • databricks jobs <command-name> -h

  • databricks jobs <command-name> <subcommand-name> -h

Example: Retrieve a Databricks job using the CLI

To print information about an individual job in a workspace, run the following command:

$ databricks jobs get <job-id>

For example:

$ databricks jobs get 478701692316314

This command returns JSON output similar to the following:

{
  "created_time":1730983530082,
  "creator_user_name":"someone@example.com",
  "job_id":478701692316314,
  "run_as_user_name":"someone@example.com",
  "settings": {
    "email_notifications": {
      "no_alert_for_skipped_runs":false
    },
    "format":"MULTI_TASK",
    "max_concurrent_runs":1,
    "name":"job_name",
    "tasks": [
      {
        "email_notifications": {},
        "notebook_task": {
          "notebook_path":"/Workspace/Users/someone@example.com/directory",
          "source":"WORKSPACE"
        },
        "run_if":"ALL_SUCCESS",
        "task_key":"success",
        "timeout_seconds":0,
        "webhook_notifications": {}
      },
      {
        "depends_on": [
          {
            "task_key":"success"
          }
        ],
        "disable_auto_optimization":true,
        "email_notifications": {},
        "max_retries":3,
        "min_retry_interval_millis":300000,
        "notebook_task": {
          "notebook_path":"/Workspace/Users/someone@example.com/directory",
          "source":"WORKSPACE"
        },
        "retry_on_timeout":false,
        "run_if":"ALL_SUCCESS",
        "task_key":"fail",
        "timeout_seconds":0,
        "webhook_notifications": {}
      }
    ],
    "timeout_seconds":0,
    "webhook_notifications": {}
  }
}

Example: Create a Databricks job using the CLI

The following example uses the Databricks CLI to create a Databricks job. This job contains a single job task that runs the specified notebook.

  1. Copy and paste the following JSON into a file. You can access the JSON format of any existing job by selecting the View JSON option from the job page UI.

    {
      "name": "My hello notebook job",
      "tasks": [
        {
          "task_key": "my_hello_notebook_task",
          "notebook_task": {
            "notebook_path": "/Workspace/Users/someone@example.com/hello",
            "source": "WORKSPACE"
          }
        }
      ]
    }
    
  2. Run the following command, replacing <file-path> with the path and name of the file that you just created.

    databricks jobs create --json @<file-path>
    

Get started with the Databricks SDK

Databricks provides SDKs that allow you to automate operations using popular programming languages such as Python, Java, and Go. This section shows you how to get started using the Python SDK to create and manage Databricks jobs.

You can use the Databricks SDK from your Databricks notebook or from your local development machine. If you are using your local development machine, ensure you first complete Get started with the Databricks SDK for Python.
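For example, after authentication is configured, creating a WorkspaceClient and listing jobs is enough to verify your setup. The following is a minimal sketch; it assumes the SDK can resolve your credentials automatically from the environment, a configuration profile, or the notebook context:

from databricks.sdk import WorkspaceClient

# The client resolves authentication automatically (environment variables,
# a .databrickscfg profile, or the notebook context).
w = WorkspaceClient()

# List up to five jobs in the workspace to confirm the client works.
for job in w.jobs.list(limit=5):
    print(job.job_id, job.settings.name)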

Note

If you are developing from a Databricks notebook and are using a cluster that uses Databricks Runtime 12.2 LTS and below, you must install the Databricks SDK for Python first. See Install or upgrade the Databricks SDK for Python.

Example: Create a Databricks job using the Python SDK

The following example notebook code creates a Databricks job that runs an existing notebook. It prompts you for the existing notebook's path and related job settings.

First, make sure that the correct version of the SDK has been installed:

%pip install --upgrade databricks-sdk==0.40.0
%restart_python

Next, to create a job with a notebook task, run the following, answering the prompts:

from databricks.sdk.service.jobs import JobSettings as Job
from databricks.sdk import WorkspaceClient


job_name = input("Provide a short name for the job, for example, my-job: ")
notebook_path = input("Provide the workspace path of the notebook to run, for example, /Users/someone@example.com/my-notebook: ")
task_key = input("Provide a unique key to apply to the job's tasks, for example, my-key: ")

test_sdk = Job.from_dict(
    {
        "name": job_name,
        "tasks": [
            {
                "task_key": task_key,
                "notebook_task": {
                    "notebook_path": notebook_path,
                    "source": "WORKSPACE",
                },
            },
        ],
    }
)

w = WorkspaceClient()
j = w.jobs.create(**test_sdk.as_shallow_dict())

print(f"View the job at {w.config.host}/#job/{j.job_id}\n")

Get started with the Databricks REST API

Note

Databricks recommends using the Databricks CLI or a Databricks SDK, unless you are using a programming language that does not have a corresponding Databricks SDK.

The following example makes a request to the Databricks REST API to retrieve details for a single job. It assumes the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables have been set as described in Perform Databricks personal access token authentication.

$ curl --request GET "https://${DATABRICKS_HOST}/api/2.2/jobs/get" \
     --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
     --data '{ "job_id": "11223344" }'

For information on using the Databricks REST API, see the Databricks REST API reference documentation.

Clean up

To delete any jobs you just created, run databricks jobs delete <job-id> from the Databricks CLI or delete the job directly from the Databricks workspace UI.
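If you created a job with the Python SDK example, you can also delete it with the same client. A minimal sketch, assuming w and j are still in scope from that example:

# Delete the job created in the SDK example above.
w.jobs.delete(job_id=j.job_id)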

Next steps