Use SQLAlchemy with Databricks

Databricks provides a SQLAlchemy dialect (the system SQLAlchemy uses to communicate with various types of database API implementations and databases) for Databricks. SQLAlchemy is a Python SQL toolkit and Object Relational Mapper (ORM). SQLAlchemy provides a suite of well known enterprise-level persistence patterns, designed for efficient and high-performing database access, adapted into a simple and Pythonic domain language. See Features and Philosophy.

The SQLAlchemy dialect for Databricks needs to be installed to use the SQLAlchemy features with Databricks. This article covers SQLAlchemy dialect for Databricks version 1.0 and 2.0, which will be based on Databricks SQL Connector for Python version 4.0.0 or above.

Requirements

A development machine running Python >=3.8 and <=3.11.
Databricks recommends that you use Python virtual environments, such as those provided by venv that are included with Python. Virtual environments help to ensure that you are using the correct versions of Python and the Databricks SQL Connector for Python together. Setting up and using virtual environments is outside of the scope of this article. For more information, see Creating virtual environments.
An existing cluster or SQL warehouse.

Get started

Install databricks-sqlalchemy for SQLAlchemy v1 using pip install databricks-sqlalchemy~=1.0 or for SQLAlchemy v2 using pip install databricks-sqlalchemy. For version information, see the databricks-sqlalchemy release history.
Gather the following information for the cluster or SQL warehouse that you want to use:
- The server hostname of the cluster. You can get this from the Server Hostname value in the Advanced Options > JDBC/ODBC tab for your cluster.
- The HTTP path of the cluster. You can get this from the HTTP Path value in the Advanced Options > JDBC/ODBC tab for your cluster.
- The server hostname of the SQL warehouse. You can get this from the Server Hostname value in the Connection Details tab for your SQL warehouse.
- The HTTP path of the SQL warehouse. You can get this from the HTTP Path value in the Connection Details tab for your SQL warehouse.

Authentication

The SQLAlchemy dialect for Databricks supports Databricks personal access token authentication.

To create a Databricks personal access token, follow the steps in Databricks personal access tokens for workspace users.

To authenticate the SQLAlchemy dialect, use the following code snippet. This snippet assumes that you have set the following environment variables:

DATABRICKS_TOKEN, set to the Databricks personal access token.
DATABRICKS_SERVER_HOSTNAMEset to the Server Hostname value for your cluster or SQL warehouse.
DATABRICKS_HTTP_PATH, set to HTTP Path value for your cluster or SQL warehouse.
DATABRICKS_CATALOG, set to the target catalog in Unity Catalog.
DATABRICKS_SCHEMA, set to the target schema (also known as a database) in Unity Catalog.

To set environment variables, see your operating system’s documentation.

import os
from sqlalchemy import create_engine

access_token    = os.getenv("DATABRICKS_TOKEN")
server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME")
http_path       = os.getenv("DATABRICKS_HTTP_PATH")
catalog         = os.getenv("DATABRICKS_CATALOG")
schema          = os.getenv("DATABRICKS_SCHEMA")

engine = create_engine(
  url = f"databricks://token:{access_token}@{server_hostname}?" +
        f"http_path={http_path}&catalog={catalog}&schema={schema}"
)


# ...

You use the preceding engine variable to connect to your specified catalog and schema through your Databricks compute resource.

SQLAlchemy v1

For connection examples, refer this example.py

SQLAlchemy v2

For connection examples, see the following section and the sqlalchemy_example.py file in GitHub.

DBAPI reference

databricks-sqlalchemy v1 GitHub repository
SQLAlchemy v1 README.md
databricks-sqlalchemy v2 GitHub repository
SQLAlchemy v2 README.md