Network access
This article introduces network security configurations for the deployment and management of Databricks accounts and workspaces.
Overview of network security
Databricks provides a secure networking environment by default, but if your organization has additional needs, you can configure network security features on your Databricks resources. Not all security features are available on all pricing tiers. The following table contains an overview of the features and how they align to pricing plans.
Feature | Pricing tier |
---|---|
Customer-managed VPC | Premium |
Private Service Connect support | Premium |
Secure cluster connectivity | Standard |
IP access lists | Premium |
Deploy a workspace in your own VPC
A Google Cloud Virtual Private Cloud (VPC) lets you provision a logically isolated section of Google Cloud where you can launch GCP resources in a virtual network. The VPC is the network location for your Databricks clusters. By default, Databricks creates and manages a VPC for the Databricks workspace.
You can instead provide your own VPC to host your Databricks clusters, enabling you to maintain more control of your own GCP account and limit outgoing connections. To take advantage of a customer-managed VPC, you must specify a VPC when you first create the Databricks workspace. You can share VPCs across workspaces, but you cannot share subnets across workspaces. For more information, see Customer-managed VPC.
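The sharing rule above (VPCs can be shared, subnets cannot) can be checked before you create workspaces. The following is a minimal sketch, not a Databricks API: the function name `find_shared_subnets`, the dictionary layout, and the workspace, VPC, and subnet names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def find_shared_subnets(workspaces):
    """Return subnets that more than one planned workspace claims.

    `workspaces` maps a workspace name to a dict with hypothetical
    "vpc" and "subnet" keys. Sharing a VPC is allowed, so only an
    identical (vpc, subnet) pair is reported as a conflict.
    """
    owners = defaultdict(list)
    for name, cfg in workspaces.items():
        owners[(cfg["vpc"], cfg["subnet"])].append(name)
    # Keep only the subnets claimed by two or more workspaces.
    return {key: names for key, names in owners.items() if len(names) > 1}


# Hypothetical plan: three workspaces in one shared VPC.
plan = {
    "analytics": {"vpc": "shared-vpc", "subnet": "subnet-a"},
    "ml": {"vpc": "shared-vpc", "subnet": "subnet-b"},
    "reporting": {"vpc": "shared-vpc", "subnet": "subnet-a"},
}
conflicts = find_shared_subnets(plan)
```

In this hypothetical plan, `analytics` and `reporting` both claim `subnet-a`, so one of them needs its own subnet before deployment.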
Enable Private Service Connect
Use Google Private Service Connect (PSC) to enable private connectivity and mitigate data exfiltration risks. Databricks supports two Private Service Connect connection types:
Front-end (user to workspace): Allows users to connect to the Databricks web application, REST API, and Databricks Connect API over a Virtual Private Cloud (VPC) endpoint.
Back-end (data plane to control plane): Enables private connectivity from Databricks compute in a customer-managed VPC to a Databricks workspace’s core services.
For more information, see Enable Private Service Connect for your workspace.
Deploy a workspace with secure cluster connectivity
All new workspaces are created with secure cluster connectivity by default. When secure cluster connectivity is enabled, customer virtual networks have no open ports and Databricks Runtime cluster nodes have no public IP addresses. This simplifies network administration by removing the need to configure ports on security groups or network peering. To learn more about deploying a workspace with secure cluster connectivity, see Secure cluster connectivity.
IP access lists
Authentication proves user identity, but it does not enforce the network location of the users. Accessing a cloud service from an unsecured network poses security risks, especially when the user may have authorized access to sensitive or personal data. With IP access lists, you can configure Databricks workspaces so that users connect to the service only through existing networks with a secure perimeter.
Workspace admins can specify the IP addresses (or CIDR ranges) on the public network that are allowed access. These IP addresses could belong to egress gateways or specific user environments. You can also specify IP addresses or subnets to block. For details, see IP access lists for workspaces.
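To make the allow and block semantics concrete, here is a minimal local sketch using Python's standard `ipaddress` module. It does not call Databricks. The function name `is_allowed` is hypothetical, and the evaluation order is an assumption for illustration: block entries are checked first, and an empty allow list is treated as permitting every address that is not blocked. Check the IP access lists documentation for the exact rules.

```python
import ipaddress


def is_allowed(client_ip, allow_list, block_list):
    """Decide whether client_ip may connect.

    Entries in allow_list and block_list are single IP addresses or
    CIDR ranges given as strings. Assumed order: block first, then allow.
    """
    ip = ipaddress.ip_address(client_ip)

    def matches(entries):
        # strict=False accepts ranges written with host bits set.
        return any(ip in ipaddress.ip_network(e, strict=False) for e in entries)

    if matches(block_list):
        return False
    if not allow_list:
        # Assumption: with no allow entries, nothing is restricted.
        return True
    return matches(allow_list)


# Addresses below come from ranges reserved for documentation.
office_range = ["203.0.113.0/24"]
print(is_allowed("203.0.113.7", office_range, []))
print(is_allowed("203.0.113.7", office_range, ["203.0.113.7"]))
print(is_allowed("198.51.100.1", office_range, []))
```

Under these assumptions, the first call is permitted, the second is denied because the address is explicitly blocked, and the third is denied because the address falls outside the allowed range.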
You can also use Private Service Connect to block all public internet access to a Databricks workspace.
You can also control access to the account console with IP access lists, using a similar system that you can configure through either a UI or an API. This feature controls access only to the account console. For details, see IP access lists for the account console.
Configure firewall rules
Many organizations use firewalls to block traffic based on domain names. You must add Databricks domain names to your firewall's allow list to ensure access to Databricks resources. For more information, see Configure domain name firewall rules.
Automation template options
You can automate some of your security configuration tasks with Terraform templates, which use the Databricks REST APIs. These templates can configure and deploy new workspaces and update administrative configurations for existing workspaces. Particularly for large companies with dozens of workspaces, templates enable fast and consistent automated configuration.