With secure cluster connectivity enabled, customer VPCs in the data plane have no open ports and Databricks Runtime cluster nodes have no public IP addresses.
Databricks secure cluster connectivity on Google Cloud is implemented by two features:
No public IP addresses on cluster nodes, by default: There is a workspace-level setting that defines the type of GKE cluster to create for the workspace in your Google Cloud account. The default is a private GKE cluster, which means that there are no public IP addresses for cluster nodes.
The secure cluster connectivity relay: New clusters initiate a connection to the control plane secure cluster connectivity relay during cluster creation. The relay uses port 443 (HTTPS) on a different IP address than the main ingress for the web application and REST API. Because the cluster opens this connection outbound, it acts as a reverse tunnel: when the control plane runs a notebook or starts a new Databricks Runtime job, the request is sent to the cluster through the tunnel. With this relay, one fewer public IP address is required to send commands from the Databricks control plane to a Databricks Runtime cluster.
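The reverse tunnel pattern described above can be illustrated with a minimal local sketch. This is not the Databricks implementation: the function names, the line-based text protocol, and the loopback address are all assumptions made for illustration, and the sketch uses plain TCP on an arbitrary free port rather than HTTPS on port 443. What it shows is only the direction of the connection: the node dials out and never listens, yet commands still flow from the relay to the node.

```python
import socket
import threading


def run_cluster_node(relay_host: str, relay_port: int) -> None:
    """Hypothetical cluster node: dials out to the relay, then serves commands."""
    # Outbound connection only; the node never calls listen() or accept(),
    # so it needs no open inbound port.
    with socket.create_connection((relay_host, relay_port)) as tunnel, \
            tunnel.makefile("r", encoding="utf-8", newline="\n") as reader:
        for line in reader:
            command = line.strip()
            if command == "quit":
                break
            # Reply over the same connection the node opened.
            tunnel.sendall(f"ran: {command}\n".encode("utf-8"))


def demo() -> list[str]:
    """Hypothetical relay: the only side that listens for connections."""
    relay = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    relay.bind(("127.0.0.1", 0))  # port 0 = any free local port (assumption)
    relay.listen(1)
    host, port = relay.getsockname()

    node = threading.Thread(target=run_cluster_node, args=(host, port))
    node.start()

    # The relay accepts the node's outbound connection and reuses it
    # to push commands back to the node.
    tunnel, _ = relay.accept()
    replies = []
    with tunnel, tunnel.makefile("r", encoding="utf-8", newline="\n") as reader:
        for command in ("run notebook cell", "start job"):
            tunnel.sendall(f"{command}\n".encode("utf-8"))
            replies.append(reader.readline().strip())
        tunnel.sendall(b"quit\n")

    node.join()
    relay.close()
    return replies


if __name__ == "__main__":
    print(demo())
```

In this sketch only the relay binds and listens; the node side would work unchanged behind a firewall that blocks all inbound traffic.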
Even with the default configuration (a private GKE cluster) and the secure cluster connectivity relay enabled in your region, there remains one public IP address in your account for GKE cluster control, also known as the GKE kube-master, which helps start and manage Databricks Runtime clusters. The kube-master is a part of the Google Cloud default GKE deployment. Its IP address is in your Google Cloud account but not in your data plane VPC. This IP address is managed by GKE and it has a firewall rule that allows traffic only from the Databricks control plane.
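The effect of a firewall rule that allows traffic only from one source can be sketched as a simple allowlist check. The address range below is a reserved documentation block, not a real Databricks control plane range, and the names ALLOWED_SOURCES and is_allowed are hypothetical; the actual rule is managed by GKE and is not configured this way.

```python
import ipaddress

# Hypothetical control plane range. 203.0.113.0/24 is reserved for
# documentation and is NOT a real Databricks address range.
ALLOWED_SOURCES = [ipaddress.ip_network("203.0.113.0/24")]


def is_allowed(source_ip: str) -> bool:
    """Return True only if source_ip falls inside an allowed network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in network for network in ALLOWED_SOURCES)


if __name__ == "__main__":
    print(is_allowed("203.0.113.10"))  # inside the hypothetical range
    print(is_allowed("198.51.100.7"))  # outside the hypothetical range
```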
For a workspace to have secure cluster connectivity, both features must be enabled:
By default, there are no public IP addresses. If you clear the Enable private cluster setting when you create your workspace, the workspace has a public GKE cluster and its nodes have public IP addresses, in which case the workspace is not using secure cluster connectivity.
For workspaces in all regions, all clusters automatically use the secure cluster connectivity relay.
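The two conditions above can be summarized as a small check. This is an illustrative model only: the class and field names are hypothetical and do not correspond to a Databricks API. The defaults mirror the text, where the private cluster setting is enabled unless you clear it and the relay is used automatically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkspaceNetworkConfig:
    """Hypothetical model of the two settings described above."""
    private_cluster_enabled: bool = True  # default: private GKE cluster
    relay_enabled: bool = True            # relay is used automatically


def uses_secure_cluster_connectivity(config: WorkspaceNetworkConfig) -> bool:
    """Both features must be enabled for secure cluster connectivity."""
    return config.private_cluster_enabled and config.relay_enabled


if __name__ == "__main__":
    default = WorkspaceNetworkConfig()
    public = WorkspaceNetworkConfig(private_cluster_enabled=False)
    print(uses_secure_cluster_connectivity(default))
    print(uses_secure_cluster_connectivity(public))
```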
If you have questions, contact your Databricks representative.