What is the Databricks CLI?

Note

This information applies to Databricks CLI versions 0.205 and above. The Databricks CLI is in Public Preview.

Databricks CLI use is subject to the Databricks License and Databricks Privacy Notice, including any Usage Data provisions.

The Databricks command-line interface (also known as the Databricks CLI) provides a tool to automate the Databricks platform from your terminal, command prompt, or automation scripts. You can also run Databricks CLI commands from within a Databricks workspace using the web terminal. See Run shell commands in Databricks web terminal.

To install and configure authentication for the Databricks CLI, see Install or update the Databricks CLI and Authentication for the Databricks CLI.
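Automation scripts tend to fail late and confusingly when the CLI is missing or no credentials are available. The following is a minimal preflight sketch, assuming personal access token authentication through the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables; other authentication types, such as configuration profiles, would need a different check. The function names have_token_env, have_cli, and preflight are hypothetical and are not part of the CLI.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers for a script that calls the Databricks CLI.
# Assumes token authentication via DATABRICKS_HOST and DATABRICKS_TOKEN.

# Succeeds only if both environment variables are set and non-empty.
have_token_env() {
  [ -n "${DATABRICKS_HOST:-}" ] && [ -n "${DATABRICKS_TOKEN:-}" ]
}

# Succeeds only if a databricks executable is found on the PATH.
have_cli() {
  command -v databricks >/dev/null 2>&1
}

# Reports the first missing prerequisite on stderr and returns non-zero.
preflight() {
  if ! have_cli; then
    echo "databricks CLI not found on PATH" >&2
    return 1
  fi
  if ! have_token_env; then
    echo "DATABRICKS_HOST and DATABRICKS_TOKEN must be set" >&2
    return 1
  fi
}
```

Calling preflight || exit 1 near the top of a script stops it before any databricks command runs.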

Information for legacy Databricks CLI users

  • Databricks plans no support or new feature work for the legacy Databricks CLI.

  • For more information about the legacy Databricks CLI, see Databricks CLI (legacy).

  • To migrate from Databricks CLI version 0.18 or below to Databricks CLI version 0.205 or above, see Databricks CLI migration.
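Because this page covers versions 0.205 and above while the migration guide starts from versions 0.18 and below, a script may need to know which CLI it is calling. The sketch below classifies a version string against those two ranges and reports anything in between as "other". The function name cli_generation is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a Databricks CLI version string.
#   0.205 and above -> current
#   0.18 and below  -> legacy
#   anything else   -> other
cli_generation() {
  local v="${1#v}"            # accept "v0.205.0" as well as "0.205.0"
  local major="${v%%.*}"      # text before the first dot
  local rest="${v#*.}"        # text after the first dot
  local minor="${rest%%.*}"   # text between the first and second dots
  if [ "$major" -gt 0 ] || [ "$minor" -ge 205 ]; then
    echo "current"
  elif [ "$minor" -le 18 ]; then
    echo "legacy"
  else
    echo "other"
  fi
}
```

For example, cli_generation "$(databricks --version | awk '{print $NF}')" would classify the installed CLI, assuming the version number is the last word that databricks --version prints.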

How does the Databricks CLI work?

The CLI wraps the Databricks REST API, which provides endpoints for modifying or requesting information about Databricks account and workspace objects. See the Databricks REST API reference.

For example, to print information about an individual cluster in a workspace, you run the CLI as follows:

databricks clusters get 1234-567890-a12bcde3

With curl, the equivalent operation is as follows:

curl --get "https://${DATABRICKS_HOST}/api/2.0/clusters/get" \
     --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
     --data-urlencode "cluster_id=1234-567890-a12bcde3"
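To make the wrapping concrete, the sketch below rebuilds the request behind databricks clusters get by hand. The helper names cluster_get_url and cluster_get are hypothetical. The sketch assumes token authentication through DATABRICKS_TOKEN, accepts DATABRICKS_HOST with or without the https:// prefix, and passes cluster_id as a query parameter without URL encoding, assuming cluster IDs contain only letters, digits, and hyphens.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Databricks CLI): build the REST URL
# that corresponds to `databricks clusters get CLUSTER_ID`.
cluster_get_url() {
  local host="${DATABRICKS_HOST#https://}"   # tolerate an https:// prefix
  host="${host%/}"                           # drop a trailing slash, if any
  # No URL encoding: assumes the cluster ID is letters, digits, and hyphens.
  printf 'https://%s/api/2.0/clusters/get?cluster_id=%s\n' "$host" "$1"
}

# Hypothetical helper: send the authenticated GET request, roughly the work
# the CLI performs on your behalf for this one command.
cluster_get() {
  curl --silent --show-error --fail \
       --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
       "$(cluster_get_url "$1")"
}
```

The CLI handles this for every endpoint it wraps, and it also supports authentication methods other than a bearer token.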

Example: create a Databricks job

The following example uses the CLI to create a Databricks job with a single task. The task runs the specified Databricks notebook, which depends on a specific version of the PyPI package named wheel. Each time the job runs, it creates a temporary job cluster that exports an environment variable named PYSPARK_PYTHON, and the cluster is terminated when the run finishes.

databricks jobs create --json '{
  "name": "My hello notebook job",
  "tasks": [
    {
      "task_key": "my_hello_notebook_task",
      "notebook_task": {
        "notebook_path": "/Workspace/Users/someone@example.com/hello",
        "source": "WORKSPACE"
      },
      "libraries": [
        {
          "pypi": {
            "package": "wheel==0.41.2"
          }
        }
      ],
      "new_cluster": {
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "n2-highmem-4",
        "num_workers": 1,
        "spark_env_vars": {
          "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
        }
      }
    }
  ]
}'
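The command above only creates the job; it does not start a run. The sketch below assumes the same job settings are saved in a file, passes that file to --json with the CLI's @file syntax, reads the new job ID from the JSON output, and starts a run with databricks jobs run-now. The names extract_job_id and create_and_run are hypothetical, and the sed pattern assumes the "job_id" key and its value appear on the same line of the output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the numeric job ID from JSON such as
# {"job_id": 123}. Uses sed rather than jq to avoid an extra dependency;
# assumes the key and value share a line.
extract_job_id() {
  sed -n 's/.*"job_id"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: create a job from a JSON settings file ($1), then
# start one run of it.
create_and_run() {
  local job_id
  job_id="$(databricks jobs create --json "@$1" --output json | extract_job_id)"
  if [ -z "$job_id" ]; then
    echo "could not read job_id from the create output" >&2
    return 1
  fi
  databricks jobs run-now "$job_id"
}
```

For example, create_and_run job.json, where job.json is a hypothetical file containing the JSON shown in the example.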

Next steps