Query SQL Server with Databricks
This article shows how you can connect Databricks to Microsoft SQL server to read and write data.
Experimental
The configurations described in this article are Experimental. Experimental features are provided as-is and are not supported by Databricks through customer technical support. To get full query federation support, you should instead use Lakehouse Federation, which enables your Databricks users to take advantage of Unity Catalog syntax and data governance tools.
Configure a connection to SQL server
In Databricks Runtime 11.3 LTS and above, you can use the sqlserver
keyword to use the included driver for connecting to SQL server. When working with DataFrames, use the following syntax:
remote_table = (spark.read
.format("sqlserver")
.option("host", "hostName")
.option("port", "port") # optional, can use default port 1433 if omitted
.option("user", "username")
.option("password", "password")
.option("database", "databaseName")
.option("dbtable", "schemaName.tableName") # (if schemaName not provided, default to "dbo")
.load()
)
val remote_table = spark.read
.format("sqlserver")
.option("host", "hostName")
.option("port", "port") // optional, can use default port 1433 if omitted
.option("user", "username")
.option("password", "password")
.option("database", "databaseName")
.option("dbtable", "schemaName.tableName") // (if schemaName not provided, default to "dbo")
.load()
When working with SQL, specify sqlserver
in the USING
clause and pass options while creating a table, as shown in the following example:
DROP TABLE IF EXISTS sqlserver_table;
CREATE TABLE sqlserver_table
USING sqlserver
OPTIONS (
dbtable '<schema-name.table-name>',
host '<host-name>',
port '1433',
database '<database-name>',
user '<username>',
password '<password>'
);
Use the legacy JDBC driver
In Databricks Runtime 10.4 LTS and below, you must specify the driver and configurations using the JDBC settings. The following example queries SQL Server using its JDBC driver. For more details on reading, writing, configuring parallelism, and query pushdown, see Query databases using JDBC.
driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
database_host = "<database-host-url>"
database_port = "1433" # update if you use a non-default port
database_name = "<database-name>"
table = "<table-name>"
user = "<username>"
password = "<password>"
url = f"jdbc:sqlserver://{database_host}:{database_port};database={database_name}"
remote_table = (spark.read
.format("jdbc")
.option("driver", driver)
.option("url", url)
.option("dbtable", table)
.option("user", user)
.option("password", password)
.load()
)
val driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
val database_host = "<database-host-url>"
val database_port = "1433" // update if you use a non-default port
val database_name = "<database-name>"
val table = "<table-name>"
val user = "<username>"
val password = "<password>"
val url = s"jdbc:sqlserver://{database_host}:{database_port};database={database_name}"
val remote_table = spark.read
.format("jdbc")
.option("driver", driver)
.option("url", url)
.option("dbtable", table)
.option("user", user)
.option("password", password)
.load()