Configure the Databricks ODBC and JDBC drivers
This article describes how to configure the Databricks ODBC and JDBC drivers to connect your tools or clients to Databricks. For tool- or client-specific connection instructions, see Technology partners.
Requirements
Before you start, make sure that you have the appropriate permissions to connect to Databricks, prepare your credentials, and retrieve the connection details.
Permission requirements
There are two permissions you may need when you connect to a Databricks cluster:
Can Attach To permission to connect to the running cluster.
Can Restart permission to automatically trigger the cluster to start if its state is terminated when connecting.
Authentication requirements
The Databricks ODBC and JDBC drivers support authentication by using a personal access token.
See how to get a workspace personal access token.
Retrieve the connection details
This section explains how to retrieve the connection details that you need to connect to Databricks.
Get connection details for a cluster
To establish connections to many external data sources, developer tools, or technology partners, you must provide connection details for your cluster. To retrieve connection details, do the following:
Log in to your Databricks workspace.
In the sidebar, click Compute.
Choose a cluster to connect to.
Navigate to Advanced Options.
Click on the JDBC/ODBC tab.
Copy the connection details.
Get connection details for a SQL warehouse
To establish connections to many external data sources, developer tools, or technology partners, you must provide connection details for your SQL warehouse. To retrieve connection details, do the following:
Log in to your Databricks workspace.
In the sidebar, click SQL > SQL Warehouses.
Choose a warehouse to connect to.
Navigate to the Connection Details tab.
Copy the connection details.
Alternatively, you can click the icon for one of the displayed technology partners or developer tools and follow the onscreen steps to connect using your SQL warehouse’s connection details.
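The copied connection details typically include a server hostname, a port, and an HTTP path. The values below are purely hypothetical examples that illustrate their shape, not real workspace values:

```text
Server hostname: dbc-a1b2c3d4-e5f6.cloud.databricks.com
Port:            443
HTTP path:       /sql/1.0/warehouses/1abc2d3456e7f890
```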
ODBC driver
This section presents the steps to configure your ODBC driver to connect to Databricks.
Download the ODBC driver
Review the license agreement for the Databricks ODBC driver before installing the software.
Some tools and clients require you to install the Databricks ODBC driver to set up a connection to Databricks, while others embed the driver and do not require separate installation. For example, to use Tableau Desktop, the ODBC driver needs to be installed, while recent Power BI Desktop releases include the driver preinstalled and no action is needed.
Go to the Databricks ODBC driver download page to download the driver for your operating system.
After you download the appropriate installer of the driver for your platform, use the following instructions to install and configure the driver:
Install and configure the ODBC driver for Windows
In Windows, you can set up a Data Source Name (DSN) configuration to connect your ODBC client application to Databricks. To set up a DSN configuration, use the Windows ODBC Data Source Administrator.
Download the latest driver version for Windows, if you haven’t already done so. See Download the ODBC driver.
Double-click the downloaded .msi file to install the driver. The installation directory is C:\Program Files\Simba Spark ODBC Driver.
From the Start menu, search for ODBC Data Sources to launch the ODBC Data Source Administrator.
Navigate to the Drivers tab to verify that the driver (Simba Spark ODBC Driver) is installed.
Go to the User DSN or System DSN tab and click the Add button.
Select the Simba Spark ODBC Driver from the list of installed drivers.
Choose a Data Source Name and set the mandatory ODBC configuration and connection parameters. See also ODBC driver capabilities for more driver configurations.
Click OK to create the DSN.
Install and configure the ODBC driver for macOS
In macOS, you can set up a Data Source Name (DSN) configuration to connect your ODBC client application to Databricks. To set up a DSN on macOS, use the ODBC Manager.
Install ODBC Manager by using Homebrew, or download the ODBC Manager and then double-click the downloaded .dmg file to install it.
Download the latest driver version for macOS, if you haven’t already done so. See Download the ODBC driver.
Double-click the downloaded .dmg file to install the driver. The installation directory is /Library/simba/spark.
Start the ODBC Manager.
Navigate to the Drivers tab to verify that the driver (Simba Spark ODBC Driver) is installed.
Go to the User DSN or System DSN tab and click the Add button.
Select the Simba Spark ODBC Driver from the list of installed drivers.
Click Add.
Choose a Data Source Name and create key-value pairs to set the mandatory ODBC configuration and connection parameters. See also ODBC driver capabilities for more driver configurations.
Click OK to create the DSN.
Install and configure the ODBC driver for Linux
In Linux, you can set up a Data Source Name (DSN) configuration to connect your ODBC client application to Databricks. To set up a DSN on Linux, use the unixODBC Driver Manager.
Install unixODBC.
Download the latest driver version for Linux, if you haven’t already done so. See Download the ODBC driver.
Install the ODBC driver.
RPM package:
sudo yum --nogpgcheck localinstall simbaspark_<version>.rpm
DEB package:
sudo dpkg -i simbaspark_<version>.deb
The installation directory is /opt/simba/spark.
Locate the odbc.ini driver configuration file that corresponds to SYSTEM DATA SOURCES:
odbcinst -j
In a text editor, open the odbc.ini configuration file.
Create an [ODBC Data Sources] section:
[ODBC Data Sources]
Databricks=Databricks ODBC Connector
Create another section with the same name as your DSN and specify the configuration parameters as key-value pairs. See the mandatory ODBC configuration and connection parameters for more information about how to set the configuration parameters. See also ODBC driver capabilities for more driver configurations.
[Databricks]
Driver=<path-to-driver>
<key>=<value>
...
ODBC configuration and connection parameters
This section presents the mandatory (unless otherwise specified) configuration and connection parameters for the ODBC driver.
Configuration | Description
---|---
Driver | Driver installation path
Host | Databricks server hostname
Port | 443
HTTPPath | Databricks compute resources URL
SSL | 1
ThriftTransport | 2
Schema (optional) | Default schema
Catalog (optional) | Default catalog
Authentication parameters
To authenticate by using a personal access token, set the following configurations:
Configuration | Description
---|---
AuthMech | 3
UID | token
PWD | Databricks personal access token for your workspace user
Define the connection string to use in your application as follows:
Driver=<path-to-driver>;Host=<server-hostname>;Port=443;HTTPPath=<http-path>;ThriftTransport=2;SSL=1;AuthMech=3;UID=token;PWD=<personal-access-token>
Set the HOST and HTTPPath configurations to the values that you retrieved in Retrieve the connection details.
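As an illustration, a filled-in DSN-less connection string might look like the following. All values are hypothetical placeholders, and the driver library path shown assumes a default Linux installation; check the installation directory for your platform:

```text
Driver=/opt/simba/spark/lib/64/libsparkodbc_sb64.so;Host=dbc-a1b2c3d4-e5f6.cloud.databricks.com;Port=443;HTTPPath=/sql/1.0/warehouses/1abc2d3456e7f890;ThriftTransport=2;SSL=1;AuthMech=3;UID=token;PWD=<personal-access-token>
```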
DSN examples for non-Windows machines
Here are some examples that show how to set up a DSN on different platforms based on your authentication method.
When you authenticate with a personal access token, complete the following steps:
Set the content of the odbc.ini to:
[Databricks]
Driver=<path-to-driver>
Host=<server-hostname>
Port=443
HTTPPath=<http-path>
ThriftTransport=2
SSL=1
AuthMech=3
UID=token
PWD=<personal-access-token>
Set <personal-access-token> to the token that you retrieved in Authentication requirements.
Set the HOST, PORT, and HTTPPath configurations to the values that you retrieved in Retrieve the connection details.
Use the DSN in your ODBC application by setting the DSN property in the connection string DSN=Databricks;.
ODBC driver capabilities
This section presents optional ODBC driver configurations.
Set the initial schema in ODBC
The ODBC driver allows you to specify the schema by setting Schema=<schema-name> as a connection configuration. This is equivalent to running USE <schema-name>.
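For example, the connection string with a default schema set (the bracketed values and the schema name are placeholders):

```text
Driver=<path-to-driver>;Host=<server-hostname>;Port=443;HTTPPath=<http-path>;ThriftTransport=2;SSL=1;AuthMech=3;UID=token;PWD=<personal-access-token>;Schema=my_schema
```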
ANSI SQL-92 query support in ODBC
The ODBC driver accepts SQL queries in ANSI SQL-92 dialect and translates the queries to the Databricks SQL dialect. However, if your application generates Databricks SQL directly or your application uses any non-ANSI SQL-92 standard SQL syntax specific to Databricks, Databricks recommends that you set UseNativeQuery=1
as a connection configuration. With that setting, the driver passes the SQL queries verbatim to Databricks.
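For instance, the earlier connection string with UseNativeQuery=1 appended (all bracketed values are placeholders):

```text
Driver=<path-to-driver>;Host=<server-hostname>;Port=443;HTTPPath=<http-path>;ThriftTransport=2;SSL=1;AuthMech=3;UID=token;PWD=<personal-access-token>;UseNativeQuery=1
```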
Extract large query results in ODBC
To achieve the best performance when you extract large query results, use the latest version of the ODBC driver that includes the following optimizations.
Arrow serialization in ODBC
The ODBC driver version 2.6.15 and above supports an optimized query results serialization format that uses Apache Arrow.
ODBC driver guide
For more information about the ODBC driver, refer to the installation and configuration guide: Simba Apache Spark ODBC Connector Install and Configuration Guide. The location of the guide depends on your platform:
macOS: /Library/simba/spark
Linux: /opt/simba/spark
Windows: C:\Program Files\Simba Spark ODBC Driver
JDBC driver
This section presents the steps to configure your JDBC driver to connect to Databricks.
Install the Databricks JDBC driver in a Java project
The Databricks JDBC driver is available in the Maven Central repository. To include the Databricks JDBC driver in your Java project, add the following entry to your application’s pom.xml file.
Note
The following version value is subject to change. For available versions to choose from, see the Maven Central repository.
<dependency>
<groupId>com.databricks</groupId>
<artifactId>databricks-jdbc</artifactId>
<version>2.6.25-1</version>
</dependency>
Download the Databricks JDBC driver
Go to the Databricks JDBC driver download page to download the driver.
The driver is packaged as a JAR, which does not require installation and can be added to the Java classpath.
Since JDBC 2.6.25 the driver name is DatabricksJDBC42.jar, whereas the legacy driver’s name is SparkJDBC42.jar.
After you download the driver, use the following instructions to configure the driver:
Building the connection URL for the Databricks driver
To connect using a personal access token, first get the Server Hostname and Http Path from Retrieve the connection details.
The JDBC connection URL has the following general form:
jdbc:databricks://<Server Hostname>:443;HttpPath=<Http Path>[;property=value[;property=value]]
where:
jdbc:databricks:// (Required) is known as the subprotocol and is constant.
Server Hostname (Required) is the address of the server to connect to.
Http Path (Required) is the Databricks compute resources URL.
property (Optional) is one or more connection properties. See JDBC driver capabilities for more details.
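To make the URL’s anatomy concrete, the hypothetical helper below assembles the required pieces into a connection URL. The class and method names are illustrative only, not part of the driver, and the values in main are placeholders:

```java
// Hypothetical helper that assembles a Databricks JDBC connection URL
// from its required parts; the names here are illustrative only.
public class DatabricksUrl {
    static String build(String serverHostname, String httpPath) {
        // The subprotocol and port are constant; HttpPath identifies the compute resource.
        return "jdbc:databricks://" + serverHostname + ":443;HttpPath=" + httpPath;
    }

    public static void main(String[] args) {
        // Placeholder values, not a real workspace.
        System.out.println(build("dbc-a1b2c3d4-e5f6.cloud.databricks.com",
                                 "/sql/1.0/warehouses/1abc2d3456e7f890"));
    }
}
```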
You should avoid setting credentials via the JDBC URL. Instead, the recommended way to set credentials is to pass them through the properties parameter to the DriverManager:
To authenticate using a personal access token, set the following properties collection:
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

String url = "jdbc:databricks://<Server Hostname>:443;HttpPath=<Http Path>";
Properties p = new Properties();
p.put("PWD", "<personal-access-token>");
Connection conn = DriverManager.getConnection(url, p);
where PWD is the personal access token that you obtained in Authentication requirements.
Building the connection URL for the legacy Spark driver
To connect to Databricks using the Spark JDBC driver you need to build a connection URL that has the following general form:
jdbc:spark://<Server Hostname>:443;HttpPath=<Http Path>;TransportMode=http;SSL=1[;property=value[;property=value]]
where:
jdbc:spark:// (Required) is known as the subprotocol and is constant.
Server Hostname (Required) is the address of the server to connect to.
Http Path (Required) is the Databricks compute resources URL.
property (Optional) is one or more connection properties. See JDBC driver capabilities for more details.
The driver also requires setting the TransportMode and SSL properties. The Databricks recommended values for these properties are http and 1, respectively.
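Putting the pieces together, a legacy Spark driver URL might look like the following. The hostname and path are hypothetical placeholders:

```text
jdbc:spark://dbc-a1b2c3d4-e5f6.cloud.databricks.com:443;HttpPath=/sql/1.0/warehouses/1abc2d3456e7f890;TransportMode=http;SSL=1
```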
To authenticate using a personal access token, set the following properties collection:
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

String url = "jdbc:spark://<Server Hostname>:443;HttpPath=<Http Path>;TransportMode=http;SSL=1";
Properties p = new Properties();
p.put("PWD", "<personal-access-token>");
Connection conn = DriverManager.getConnection(url, p);
where PWD is the personal access token that you obtained in Authentication requirements.
JDBC driver capabilities
This section presents optional JDBC driver configurations. The same capabilities apply to both Databricks and legacy Spark drivers.
ANSI SQL-92 query support in JDBC
Legacy Spark JDBC drivers accept SQL queries in ANSI SQL-92 dialect and translate the queries to the Databricks SQL dialect before sending them to the server. However, if your application generates Databricks SQL directly or your application uses any non-ANSI SQL-92 standard SQL syntax specific to Databricks, Databricks recommends that you set UseNativeQuery=1
as a connection configuration.
With that setting, the driver passes the SQL queries verbatim to Databricks.
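For example, appended to the Databricks driver connection URL (bracketed values are placeholders):

```text
jdbc:databricks://<Server Hostname>:443;HttpPath=<Http Path>;UseNativeQuery=1
```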
Default catalog and schema
To specify the default catalog and schema, add ConnCatalog=<catalog-name>;ConnSchema=<schema-name>
to the JDBC connection URL.
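For example, with hypothetical catalog and schema names (bracketed values are placeholders):

```text
jdbc:databricks://<Server Hostname>:443;HttpPath=<Http Path>;ConnCatalog=my_catalog;ConnSchema=my_schema
```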
Extract large query results in JDBC
To achieve the best performance when you extract large query results, use the latest version of the JDBC driver, which includes the following optimizations.
Arrow serialization in JDBC
The JDBC driver version 2.6.16 and above supports an optimized query results serialization format that uses Apache Arrow.