Networking recommendations for Lakehouse Federation
This article provides guidance for setting up a viable network path between your Databricks clusters or SQL warehouses and the external database system that you are connecting to using Lakehouse Federation.
Bear the following important information in mind:
All network traffic flows directly between Databricks clusters (or SQL warehouses) and the external database system. Neither Unity Catalog nor the Databricks control plane is on the network path.
Databricks compute (that is, clusters and SQL warehouses) always deploys in the cloud, but the external database system can be on-premises or hosted on any cloud provider, as long as there’s a viable network path between your Databricks compute and the external database.
If you have inbound or outbound network restrictions on either Databricks compute or the external database system, refer to the following sections for general guidance to help you create a viable network path.
For more information on networking in Databricks workspaces, see Networking.
Database system and Databricks compute both accessible from internet
The connection should work without any additional network configuration.
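To confirm that a network path exists, one option is to attempt a plain TCP connection from a notebook attached to the cluster you plan to use. The following is a minimal sketch rather than an official Databricks utility: the function name `can_connect` and its parameters are illustrative, and it relies only on the Python standard library.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        # create_connection resolves the hostname and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `can_connect("db.example.com", 5432)` checks a hypothetical PostgreSQL endpoint; replace the host and port with those of your own database. A `False` result only shows that the TCP connection could not be opened. It does not tell you which side is blocking the traffic.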
Database system has network access restrictions
If the external database system has inbound or outbound network access restrictions and the Databricks cluster or SQL warehouse is accessible from the internet, then configure one of the following network solutions to connect from classic compute resources:
Stable egress IP on Databricks compute.
From the classic compute plane, set up a stable IP address with a load balancer, NAT gateway, internet gateway, or equivalent, and connect it to the subnet where Databricks compute is deployed. This lets the compute resources share a stable public IP address that you can allowlist on the external database side.
Private Service Connect (only when the external database is on the same cloud as Databricks compute)
From the classic compute plane, configure a Private Service Connect connection between the network where the database is deployed and the network where Databricks compute is deployed.
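For the stable egress IP option, it can help to verify that the egress address you configured is covered by the allowlist on the database side. The sketch below assumes you already know the egress IP (for example, from your NAT gateway configuration) and have the allowlist entries as strings. The function name `is_allowlisted` and the sample addresses are illustrative only.

```python
import ipaddress
from typing import Iterable


def is_allowlisted(egress_ip: str, allowlist: Iterable[str]) -> bool:
    """Return True if egress_ip falls inside any allowlisted IP or CIDR range."""
    address = ipaddress.ip_address(egress_ip)
    # strict=False accepts entries written with host bits set, such as "203.0.113.7/24".
    networks = (ipaddress.ip_network(entry, strict=False) for entry in allowlist)
    # An IPv4 address is never considered part of an IPv6 network, and vice versa.
    return any(address in network for network in networks)


# Documentation-only addresses (RFC 5737), used here as placeholders.
print(is_allowlisted("203.0.113.10", ["203.0.113.0/24"]))  # True
print(is_allowlisted("192.0.2.1", ["203.0.113.0/24"]))     # False
```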
Databricks compute has network access restrictions
If the external database system is accessible from the internet and the Databricks compute has inbound or outbound network access restrictions (which is only possible if you are on a customer-managed network), then perform one of the following configurations:
Allowlist the hostname of the external database in the firewall rules of the subnet where Databricks compute is deployed.
If you choose to allowlist the external database IP address rather than hostname, make sure that the external database has a stable IP address.
Private Service Connect (only when the external database is on the same cloud as Databricks compute)
Configure a Private Service Connect connection between the network where the database is deployed and the network where Databricks compute is deployed.
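If you allowlist IP addresses rather than the hostname, one way to spot an unstable address is to resolve the database hostname periodically and compare the result with the addresses in your firewall rules. The following is a rough sketch under that assumption. The helper names `resolve_addresses` and `addresses_missing_from_allowlist` are hypothetical, and the check reflects only what the resolver returns at the moment it runs.

```python
import socket
from typing import Iterable, Set


def resolve_addresses(hostname: str) -> Set[str]:
    """Return the set of IP addresses that hostname currently resolves to."""
    # Restricting to TCP avoids duplicate entries for other socket types.
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def addresses_missing_from_allowlist(
    resolved: Iterable[str], allowlisted: Iterable[str]
) -> Set[str]:
    """Return resolved addresses that are not present in the allowlist."""
    return set(resolved) - set(allowlisted)
```

A non-empty result from `addresses_missing_from_allowlist(resolve_addresses(host), allowlisted_ips)` suggests that the database address has changed or that the hostname maps to more addresses than you allowlisted.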
Databricks compute has a custom DNS server
If the external database system is accessible from the internet and the Databricks compute has a custom DNS server (which is only possible if you are on a customer-managed network), add the database system’s hostname to your custom DNS server so that it can be resolved.
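After you add the record, you can check from a notebook on the affected compute whether the hostname resolves. This minimal sketch uses the standard library resolver, which on Databricks compute is assumed to follow the DNS server configured for the network. The function name `resolution_error` is illustrative.

```python
import socket
from typing import Optional


def resolution_error(hostname: str) -> Optional[str]:
    """Return None if hostname resolves, otherwise the resolver's error message."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror as exc:
        # Raised when the configured DNS server cannot resolve the name.
        return str(exc)
    return None
```

Calling `resolution_error("db.example.com")` with your database hostname in place of the placeholder returns `None` when the name resolves, and an error string when the custom DNS server has no usable record for it.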