Databricks CLI

Note

The CLI feature is unavailable on Databricks on Google Cloud as of this release.

The Databricks command-line interface (CLI) provides an easy-to-use interface to the Databricks platform. The open source project is hosted on GitHub. The CLI is built on top of the Databricks REST API 2.0 and is organized into command groups based on the Cluster Policies APIs 2.0, Clusters API 2.0, DBFS API 2.0, Groups API 2.0, Instance Pools API 2.0, Jobs API 2.1, Libraries API 2.0, Repos API 2.0, Secrets API 2.0, Token API 2.0, and Workspace API 2.0 through the cluster-policies, clusters, fs, groups, instance-pools, jobs and runs, libraries, repos, secrets, tokens, and workspace command groups, respectively.

Experimental

This CLI is under active development and is released as an Experimental client. This means that interfaces are still subject to change.

Set up the CLI

This section lists CLI requirements and describes how to install and configure your environment to run the CLI.

Requirements

  • Python 3 - 3.6 and above

  • Python 2 - 2.7.9 and above

    Important

    On macOS, the default Python 2 installation does not implement the TLSv1_2 protocol and running the CLI with this Python installation results in the error: AttributeError: 'module' object has no attribute 'PROTOCOL_TLSv1_2'. Use Homebrew to install a version of Python that has ssl.PROTOCOL_TLSv1_2.

Install the CLI

Run pip install databricks-cli using the appropriate version of pip for your Python installation.

Update the CLI

Run pip install databricks-cli --upgrade using the appropriate version of pip for your Python installation.

To list the version of the CLI that is currently installed, run databricks --version (or databricks -v).

Set up authentication

Before you can run CLI commands, you must set up authentication. To authenticate to the CLI, use a Databricks personal access token. (A Databricks username and password are also supported but not recommended.)

To configure the CLI to use a personal access token, run databricks configure --token. The command begins by issuing the prompt:

Databricks Host (should begin with https://):

Enter your workspace URL, with the format https://<instance-name>.gcp.databricks.com. To get your workspace URL, see Workspace instance names, URLs, and IDs.

The command continues by issuing the prompt to enter your personal access token:

Token:

After you complete the prompts, your access credentials are stored in the file ~/.databrickscfg on Unix, Linux, or macOS, or %USERPROFILE%\.databrickscfg on Windows. The file contains a default profile entry:

[DEFAULT]
host = <workspace-URL>
token = <personal-access-token>

For CLI 0.8.1 and above, you can change the path of this file by setting the environment variable DATABRICKS_CONFIG_FILE.

On Unix, Linux, or macOS:

export DATABRICKS_CONFIG_FILE=<path-to-file>

On Windows:

setx DATABRICKS_CONFIG_FILE "<path-to-file>" /M
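As a rough illustration of the lookup described above, the following Python sketch resolves the credentials file path: it uses DATABRICKS_CONFIG_FILE when that variable is set and otherwise falls back to .databrickscfg in the user's home directory. The function name databricks_config_path is hypothetical and is not part of the CLI; the sketch only mirrors the documented behavior and does not reproduce the CLI's internal logic.

```python
import os
from pathlib import Path


def databricks_config_path(environ=None):
    """Return the credentials file path described in this section.

    `environ` is a hypothetical parameter that lets callers pass a custom
    mapping instead of the real process environment.
    """
    env = os.environ if environ is None else environ
    override = env.get("DATABRICKS_CONFIG_FILE")
    if override:
        # An explicit override wins (documented for CLI 0.8.1 and above).
        return Path(override)
    # Default: ~/.databrickscfg, or %USERPROFILE%\.databrickscfg on Windows.
    return Path.home() / ".databrickscfg"


if __name__ == "__main__":
    print(databricks_config_path())
```

Passing the environment as a parameter keeps the function easy to check in scripts without changing the real environment.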

Important

Because the CLI is built on top of the REST API, your authentication configuration in your .netrc file takes precedence over your configuration in .databrickscfg.

Connection profiles

The Databricks CLI configuration supports multiple connection profiles. The same installation of Databricks CLI can be used to make API calls on multiple Databricks workspaces.

To add a connection profile, specify a unique name for the profile:

databricks configure --token --profile <profile-name>

The .databrickscfg file contains a corresponding profile entry:

[<profile-name>]
host = <workspace-URL>
token = <token>

To use the connection profile:

databricks <group> <command> --profile <profile-name>

If --profile <profile-name> is not specified, the default profile is used. If a default profile is not found, you are prompted to configure the CLI with a default profile.
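Because .databrickscfg uses an INI-style layout, scripts can read a profile from it with Python's standard configparser module. The following is a minimal sketch under that assumption: load_profile and the sample host names are hypothetical, and the CLI's own parsing may differ. Note that configparser lets named sections inherit keys from [DEFAULT], which is a property of that module and not a documented behavior of the CLI.

```python
import configparser

# Hypothetical file contents, shaped like the profile entries shown above.
SAMPLE_CONFIG = """\
[DEFAULT]
host = https://example-default.gcp.databricks.com
token = <personal-access-token>

[staging]
host = https://example-staging.gcp.databricks.com
token = <token>
"""


def load_profile(config_text, profile=None):
    """Return the host and token for a profile, or for DEFAULT if none is named."""
    # interpolation=None keeps any '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    name = profile or "DEFAULT"
    if name != "DEFAULT" and not parser.has_section(name):
        raise KeyError(f"profile {name!r} not found")
    section = parser[name]
    if "host" not in section:
        raise KeyError(f"profile {name!r} has no host entry")
    return {"host": section["host"], "token": section.get("token")}


if __name__ == "__main__":
    print(load_profile(SAMPLE_CONFIG, "staging")["host"])
```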

Alias command groups

Sometimes it can be inconvenient to prefix each CLI invocation with the name of a command group, for example databricks workspace ls. To make the CLI easier to use, you can alias command groups to shorter commands. For example, to shorten databricks workspace ls to dw ls in the Bourne again shell, you can add alias dw="databricks workspace" to the appropriate bash profile. Typically, this file is located at ~/.bash_profile.

Tip

Databricks already aliases databricks fs to dbfs; databricks fs ls and dbfs ls are equivalent.

Use the CLI

This section shows you how to get CLI help, parse CLI output, and invoke commands in each command group.

Display CLI command group help

To list the subcommands for any command group, run databricks <group> --help (or databricks <group> -h). For example, to list the DBFS CLI subcommands, run databricks fs -h.

Display CLI subcommand help

To display the help for a subcommand, run databricks <group> <subcommand> --help (or databricks <group> <subcommand> -h). For example, to display the help for the DBFS copy files subcommand, run databricks fs cp -h.

Use jq to parse CLI output

Some Databricks CLI commands output the JSON response from the API endpoint. Sometimes it can be useful to parse out parts of the JSON to pipe into other commands. For example, to copy a job definition, you must take the settings field of a databricks jobs get command and use that as an argument to the databricks jobs create command. In these cases, we recommend that you use the jq utility.

For example, the following command prints the settings of the job with the ID of 233.

databricks jobs list --output JSON | jq '.jobs[] | select(.job_id == 233) | .settings'
{
  "name": "Quickstart",
  "new_cluster": {
    "spark_version": "7.5.x-scala2.12",
    "spark_env_vars": {
      "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
    },
    "num_workers": 8,
    ...
  },
  "email_notifications": {},
  "timeout_seconds": 0,
  "notebook_task": {
    "notebook_path": "/Quickstart"
  },
  "max_concurrent_runs": 1
}
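If jq is not available, the same selection can be approximated in Python with the standard json module. This sketch assumes the input is the text produced by databricks jobs list --output JSON, with a top-level jobs array whose entries carry job_id and settings fields, as the jq filter above implies. The function name job_settings and the trimmed sample payload are hypothetical. Unlike the jq filter, which emits every match, this version returns only the first matching job.

```python
import json


def job_settings(jobs_list_json, job_id):
    """Approximate jq '.jobs[] | select(.job_id == ID) | .settings'."""
    payload = json.loads(jobs_list_json)
    for job in payload.get("jobs", []):
        if job.get("job_id") == job_id:
            return job["settings"]
    return None  # no job with that ID in the listing


# Hypothetical, trimmed input used only to exercise the function.
SAMPLE_JOBS = json.dumps({
    "jobs": [
        {"job_id": 233, "settings": {"name": "Quickstart", "max_concurrent_runs": 1}},
        {"job_id": 7, "settings": {"name": "Other"}},
    ]
})

if __name__ == "__main__":
    print(json.dumps(job_settings(SAMPLE_JOBS, 233), indent=2))
```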

As another example, the following command prints the names and IDs of all available clusters in the workspace:

databricks clusters list --output JSON | jq '[ .clusters[] | { name: .cluster_name, id: .cluster_id } ]'
[
  {
    "name": "My Cluster 1",
    "id": "1234-567890-grip123"
  },
  {
    "name": "My Cluster 2",
    "id": "2345-678901-patch234"
  }
]
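The reshaping done by this jq filter can also be sketched in Python. The example assumes the input is the text produced by databricks clusters list --output JSON, with a top-level clusters array whose entries contain cluster_name and cluster_id, as the filter above implies. The function name cluster_names_and_ids is hypothetical, and the sample input simply reuses the illustrative names and IDs shown above.

```python
import json


def cluster_names_and_ids(clusters_list_json):
    """Approximate jq '[ .clusters[] | { name: .cluster_name, id: .cluster_id } ]'."""
    payload = json.loads(clusters_list_json)
    return [
        {"name": cluster["cluster_name"], "id": cluster["cluster_id"]}
        for cluster in payload.get("clusters", [])
    ]


# Hypothetical input built from the illustrative values in this section.
SAMPLE_CLUSTERS = json.dumps({
    "clusters": [
        {"cluster_name": "My Cluster 1", "cluster_id": "1234-567890-grip123", "state": "RUNNING"},
        {"cluster_name": "My Cluster 2", "cluster_id": "2345-678901-patch234", "state": "TERMINATED"},
    ]
})

if __name__ == "__main__":
    print(json.dumps(cluster_names_and_ids(SAMPLE_CLUSTERS), indent=2))
```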

You can install jq, for example, on macOS using Homebrew with brew install jq, or on Windows using Chocolatey with choco install jq. For more information on jq, see the jq Manual.

JSON string parameters

String parameters are handled differently depending on your operating system:

On Unix, Linux, or macOS, you must enclose JSON string parameters in single quotes. For example:

databricks jobs run-now --job-id 9 --jar-params '["20180505", "alantest"]'

On Windows, you must enclose JSON string parameters in double quotes, and the quote characters inside the string must be preceded by \. For example:

databricks jobs run-now --job-id 9 --jar-params "[\"20180505\", \"alantest\"]"
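One way to avoid platform-specific quoting when calling the CLI from a script is to build the argument list in Python and let json.dumps produce the JSON string. The sketch below only constructs the command; the function name run_now_command is hypothetical, and the flags are the ones used in the examples above. Passing the list to subprocess.run without a shell means the JSON argument is not reinterpreted by shell quoting rules, although how the installed databricks executable is resolved on a given system is outside what this sketch covers.

```python
import json


def run_now_command(job_id, jar_params):
    """Build the argument list for 'databricks jobs run-now' with JSON jar params."""
    return [
        "databricks", "jobs", "run-now",
        "--job-id", str(job_id),
        # json.dumps yields the same text you would otherwise quote by hand.
        "--jar-params", json.dumps(jar_params),
    ]


if __name__ == "__main__":
    command = run_now_command(9, ["20180505", "alantest"])
    print(command)
    # To actually trigger the run (requires an installed, configured CLI):
    # import subprocess
    # subprocess.run(command, check=True)
```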

Troubleshooting

The following sections provide tips for troubleshooting common issues with the Databricks CLI.

Using EOF with databricks configure does not work

For Databricks CLI 0.12.0 and above, using the end of file (EOF) sequence in a script to pass parameters to the databricks configure command does not work. For example, the following script causes Databricks CLI to ignore the parameters, and no error message is thrown:

# Do not do this.
databricksUrl=<workspace-url>
databricksToken=<personal-access-token>

databricks configure --token << EOF
$databricksUrl
$databricksToken
EOF

To fix this issue, do one of the following:

  • Use one of the other programmatic configuration options as described in Set up authentication.
  • Manually add the host and token values to the .databrickscfg file as described in Set up authentication.
  • Downgrade your installation of the Databricks CLI to 0.11.0 or below, and run your script again.
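For the second option, a script can write the host and token values into the file directly. The following Python sketch uses the standard configparser module to add or update a profile entry; the function name write_profile is hypothetical, and the layout it produces follows the [DEFAULT] entry shown in Set up authentication. Restricting the file permissions is a precaution assumed here, not something this page requires, and it has limited effect on Windows.

```python
import configparser
from pathlib import Path


def write_profile(path, host, token, profile="DEFAULT"):
    """Add or update a profile entry in a .databrickscfg-style file."""
    path = Path(path)
    # interpolation=None keeps any '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    if path.exists():
        parser.read(str(path))  # keep profiles that are already present
    if profile != "DEFAULT" and not parser.has_section(profile):
        parser.add_section(profile)
    parser[profile]["host"] = host
    parser[profile]["token"] = token
    with path.open("w") as handle:
        parser.write(handle)
    # Assumed precaution: the file holds a secret, so limit access to the owner.
    path.chmod(0o600)


# Example call (writes to the default location, so it is left commented out):
# write_profile(Path.home() / ".databrickscfg", "<workspace-URL>", "<personal-access-token>")
```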