Configure the Databricks ODBC and JDBC drivers

This article describes how to configure the Databricks ODBC and JDBC drivers to connect your tools or clients to Databricks. For tool- or client-specific connection instructions, see the Databricks integrations.

Requirements

Before you start, make sure that you have the appropriate permissions to connect to Databricks, prepare your credentials, and retrieve the connection details.

Permission requirements

There are two permissions you may need when you connect to a Databricks cluster:

  • Can Attach To permission to connect to the running cluster.

  • Can Restart permission, to automatically start the cluster if it is terminated when you connect.

Authentication requirements

The Databricks ODBC and JDBC drivers support authentication by using a personal access token.

Retrieve the connection details

This section explains how to retrieve the connection details that you need to connect to Databricks.

Get connection details for a cluster

  1. Click Compute in the sidebar.

  2. Choose a cluster to connect to.

  3. Navigate to Advanced Options.

  4. Click on the JDBC/ODBC tab.

  5. Copy the connection details.

Get connection details for a SQL warehouse

  1. Click SQL Warehouses in the sidebar.

  2. Choose a warehouse to connect to.

  3. Navigate to the Connection Details tab.

  4. Copy the connection details.

ODBC driver

This section presents the steps to configure your ODBC driver to connect to Databricks.

Download the ODBC driver

Review the license agreement for the Databricks ODBC driver before installing the software.

Some tools and clients require you to install the Databricks ODBC driver before you can set up a connection to Databricks, while others embed the driver and do not require separate installation. For example, Tableau Desktop requires the ODBC driver to be installed, while recent releases of Power BI Desktop include the driver and need no additional setup.

Go to the Databricks ODBC driver download page to download the driver for your operating system.

After you download the appropriate installer for your platform, use the following instructions to install and configure the driver:

Install and configure the ODBC driver for Windows

In Windows, you can set up a Data Source Name (DSN) configuration to connect your ODBC client application to Databricks. To set up a DSN configuration, use the Windows ODBC Data Source Administrator.

  1. Download the latest driver version for Windows, if you haven’t already done so. See Download the ODBC driver.

  2. Double-click the downloaded .msi file to install the driver. The installation directory is C:\Program Files\Simba Spark ODBC Driver.

  3. From the Start menu, search for ODBC Data Sources to launch the ODBC Data Source Administrator.

  4. Navigate to the Drivers tab to verify that the driver (Simba Spark ODBC Driver) is installed.

  5. Go to the User DSN or System DSN tab and click the Add button.

  6. Select the Simba Spark ODBC Driver from the list of installed drivers.

  7. Choose a Data Source Name and set the mandatory ODBC configuration and connection parameters. See also ODBC driver capabilities for more driver configurations.

  8. Click OK to create the DSN.

Install and configure the ODBC driver for macOS

In macOS, you can set up a Data Source Name (DSN) configuration to connect your ODBC client application to Databricks. To set up a DSN on macOS, use the ODBC Manager.

  1. Install ODBC Manager by using Homebrew, or download the ODBC Manager and then double-click the downloaded .dmg file to install it.

  2. Download the latest driver version for macOS, if you haven’t already done so. See Download the ODBC driver.

  3. Double-click the downloaded .dmg file to install the driver. The installation directory is /Library/simba/spark.

  4. Start the ODBC Manager.

  5. Navigate to the Drivers tab to verify that the driver (Simba Spark ODBC Driver) is installed.

  6. Go to the User DSN or System DSN tab and click the Add button.

  7. Select the Simba Spark ODBC Driver from the list of installed drivers.

  8. Click Add.

  9. Choose a Data Source Name and create key-value pairs to set the mandatory ODBC configuration and connection parameters. See also ODBC driver capabilities for more driver configurations.

  10. Click OK to create the DSN.

Install and configure the ODBC driver for Linux

In Linux, you can set up a Data Source Name (DSN) configuration to connect your ODBC client application to Databricks. To set up a DSN on Linux, use the unixODBC Driver Manager.

  1. Install unixODBC.

  2. Download the latest driver version for Linux, if you haven’t already done so. See Download the ODBC driver.

  3. Install the ODBC driver.

    • RPM package:

      sudo yum --nogpgcheck localinstall simbaspark_<version>.rpm
      
    • DEB package:

      sudo dpkg -i simbaspark_<version>.deb
      

    The installation directory is /opt/simba/spark.

  4. Locate the odbc.ini driver configuration file that corresponds to SYSTEM DATA SOURCES:

    odbcinst -j
    
  5. In a text editor, open the odbc.ini configuration file.

  6. Create an [ODBC Data Sources] section:

    [ODBC Data Sources]
    Databricks=Databricks ODBC Connector
    
  7. Create another section with the same name as your DSN and specify the configuration parameters as key-value pairs. See the mandatory ODBC configuration and connection parameters for more information about how to set the configuration parameters. See also ODBC driver capabilities for more driver configurations.

    [Databricks]
    Driver=<path-to-driver>
    <key>=<value>
    ...
    

ODBC configuration and connection parameters

This section presents the mandatory configuration and connection parameters for the ODBC driver.

| Configuration | Description |
| --- | --- |
| Driver | Driver installation path |
| Host | Databricks server hostname |
| Port | 443 |
| HTTPPath | Databricks compute resources URL |
| SSL | 1 |
| ThriftTransport | 2 |

Authentication parameters

To authenticate by using a personal access token, set the following configurations:

| Configuration | Description |
| --- | --- |
| AuthMech | 3 |
| UID | token |
| PWD | <personal-access-token> |

Define the connection string to use in your application as follows:

Driver=<path-to-driver>;Host=<server-hostname>;Port=443;HTTPPath=<http-path>;ThriftTransport=2;SSL=1;AuthMech=3;UID=token;PWD=<personal-access-token>

Set the Host and HTTPPath configurations to the values that you retrieved in Retrieve the connection details.
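As a sketch, the same connection string can be assembled from its parts in Java; the hostname, HTTP path, and token values passed in below are placeholders, not real connection details:

```java
public class OdbcConnectionString {
    // Builds the semicolon-delimited ODBC connection string from the
    // mandatory configuration and authentication parameters.
    public static String build(String host, String httpPath, String token) {
        return String.join(";",
            "Driver=<path-to-driver>",
            "Host=" + host,
            "Port=443",
            "HTTPPath=" + httpPath,
            "ThriftTransport=2",
            "SSL=1",
            "AuthMech=3",
            "UID=token",          // With personal access token auth, the user name is the literal string "token".
            "PWD=" + token);
    }

    public static void main(String[] args) {
        System.out.println(build("<server-hostname>", "<http-path>", "<personal-access-token>"));
    }
}
```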

DSN examples for non-Windows machines

Here are some examples that show how to set up a DSN on different platforms based on your authentication method.

When you authenticate with a personal access token, complete the following steps:

  1. Set the content of the odbc.ini to:

    [Databricks]
    Driver=<path-to-driver>
    Host=<server-hostname>
    Port=443
    HTTPPath=<http-path>
    ThriftTransport=2
    SSL=1
    AuthMech=3
    UID=token
    PWD=<personal-access-token>
    
  2. Set <personal-access-token> to the token that you retrieved in Authentication requirements.

  3. Set the Host, Port, and HTTPPath configurations to the values that you retrieved in Retrieve the connection details.

  4. Use the DSN in your ODBC application by setting the DSN property in the connection string, for example DSN=Databricks;.

ODBC driver capabilities

This section presents optional ODBC driver configurations.

Set the initial schema in ODBC

The ODBC driver allows you to specify the schema by setting Schema=<schema_name> as a connection configuration. This is equivalent to running USE <schema_name>.
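For example, the parameter can be appended to the connection string shown earlier; the schema name here is illustrative:

```
Driver=<path-to-driver>;Host=<server-hostname>;Port=443;HTTPPath=<http-path>;ThriftTransport=2;SSL=1;AuthMech=3;UID=token;PWD=<personal-access-token>;Schema=my_schema
```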

ANSI SQL-92 query support in ODBC

The ODBC driver accepts SQL queries in ANSI SQL-92 dialect and translates the queries to the Databricks SQL dialect. However, if your application generates Databricks SQL directly or your application uses any non-ANSI SQL-92 standard SQL syntax specific to Databricks, Databricks recommends that you set UseNativeQuery=1 as a connection configuration. With that setting, the driver passes the SQL queries verbatim to Databricks.
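When the DSN is defined in an odbc.ini file, this is one more key-value pair in the DSN section; the section name below is illustrative:

```
[Databricks]
...
UseNativeQuery=1
```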

Extract large query results in ODBC

To achieve the best performance when you extract large query results, use the latest version of the ODBC driver, which includes the following optimizations.

Arrow serialization in ODBC

The ODBC driver version 2.6.15 and above supports an optimized query results serialization format that uses Apache Arrow.

ODBC driver guide

For more information about the ODBC driver, refer to the installation and configuration guide: Simba Apache Spark ODBC Connector Install and Configuration Guide. The location of the Databricks ODBC driver installation and configuration guide depends on your platform:

  • macOS: /Library/simba/spark

  • Linux: /opt/simba/spark

  • Windows: C:\Program Files\Simba Spark ODBC Driver

JDBC driver

This section presents the steps to configure your JDBC driver to connect to Databricks.

Install the Databricks JDBC driver in a Java project

The Databricks JDBC driver is available in the Maven Central repository. To include the driver in your Java project, add the following entry to your application’s pom.xml file.

Note

The following version value is subject to change. For available versions to choose from, see the Maven Central repository.

<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>databricks-jdbc</artifactId>
  <version>2.6.25-1</version>
</dependency>
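If your project uses Gradle instead of Maven, the equivalent dependency declaration (the same version caveat applies) would be:

```groovy
implementation 'com.databricks:databricks-jdbc:2.6.25-1'
```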

Download the Databricks JDBC driver

  1. Go to the Databricks JDBC driver download page to download the driver.

  2. The driver is packaged as a JAR, which does not require installation and can be added to the Java classpath.

As of version 2.6.25, the driver is named DatabricksJDBC42.jar, whereas the legacy driver is named SparkJDBC42.jar. After you download the driver, use the following instructions to configure it:

Building the connection URL for the Databricks driver

To connect using a personal access token, first get the Server Hostname and Http Path from Retrieve the connection details.

The JDBC connection URL has the following general form:

jdbc:databricks://<Server Hostname>:443;HttpPath=<Http Path>[;property=value[;property=value]]

where:

  • jdbc:databricks:// (Required) is known as the subprotocol and is constant.

  • Server Hostname (Required) is the address of the server to connect to.

  • Http Path (Required) is the Databricks compute resources URL.

  • property (Optional) is one or more connection properties. See JDBC driver capabilities for more details.
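The required pieces above can be combined in a small helper; this is a sketch, assuming placeholder values for the server hostname and HTTP path:

```java
public class DatabricksJdbcUrl {
    // The subprotocol and port are constant; the hostname and HTTP path
    // come from the cluster or SQL warehouse connection details.
    public static String build(String serverHostname, String httpPath) {
        return "jdbc:databricks://" + serverHostname + ":443;HttpPath=" + httpPath;
    }

    public static void main(String[] args) {
        System.out.println(build("<server-hostname>", "<http-path>"));
    }
}
```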

Avoid setting credentials in the JDBC URL. Instead, pass them to the DriverManager through the Properties parameter.

To authenticate using a personal access token, set the following properties collection:

// Requires java.sql.Connection, java.sql.DriverManager, and java.util.Properties.
String url = "jdbc:databricks://<Server Hostname>:443;HttpPath=<Http Path>";
Properties p = new java.util.Properties();
p.put("UID", "token"); // With personal access token auth, the user name is the literal string "token".
p.put("PWD", "<personal-access-token>");
Connection conn = DriverManager.getConnection(url, p);

Building the connection URL for the legacy Spark driver

To connect to Databricks using the Spark JDBC driver, you need to build a connection URL that has the following general form:

jdbc:spark://<Server Hostname>:443;HttpPath=<Http Path>;TransportMode=http;SSL=1[;property=value[;property=value]]

where:

  • jdbc:spark:// (Required) is known as the subprotocol and is constant.

  • Server Hostname (Required) is the address of the server to connect to.

  • Http Path (Required) is the Databricks compute resources URL.

  • property (Optional) is one or more connection properties. See JDBC driver capabilities for more details.

The driver also requires the TransportMode and SSL properties. Databricks recommends setting them to http and 1, respectively.

To authenticate using a personal access token, set the following properties collection:

// Requires java.sql.Connection, java.sql.DriverManager, and java.util.Properties.
String url = "jdbc:spark://<Server Hostname>:443;HttpPath=<Http Path>;TransportMode=http;SSL=1";
Properties p = new java.util.Properties();
p.put("UID", "token"); // With personal access token auth, the user name is the literal string "token".
p.put("PWD", "<personal-access-token>");
Connection conn = DriverManager.getConnection(url, p);

JDBC driver capabilities

This section presents optional JDBC driver configurations. The same capabilities apply to both Databricks and legacy Spark drivers.

ANSI SQL-92 query support in JDBC

Legacy Spark JDBC drivers accept SQL queries in ANSI SQL-92 dialect and translate the queries to the Databricks SQL dialect before sending them to the server. However, if your application generates Databricks SQL directly or your application uses any non-ANSI SQL-92 standard SQL syntax specific to Databricks, Databricks recommends that you set UseNativeQuery=1 as a connection configuration. With that setting, the driver passes the SQL queries verbatim to Databricks.
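For example, the setting can be appended as a property on the legacy connection URL (placeholders as above):

```
jdbc:spark://<Server Hostname>:443;HttpPath=<Http Path>;TransportMode=http;SSL=1;UseNativeQuery=1
```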

Extract large query results in JDBC

To achieve the best performance when you extract large query results, use the latest version of the JDBC driver, which includes the following optimizations.

Arrow serialization in JDBC

The JDBC driver version 2.6.16 and above supports an optimized query results serialization format that uses Apache Arrow.

JDBC driver guide

For more information about the JDBC driver, refer to the installation and configuration guide. Find the Databricks JDBC driver installation and configuration guide in the docs directory of the driver package.