Databricks access to customer workspaces using Genie

In general, Databricks personnel cannot access customer workspaces. A Databricks workspace is an environment for accessing all of your Databricks assets. The workspace organizes objects (notebooks, libraries, and experiments) into folders, and provides access to data and computational resources such as clusters and jobs. To resolve some types of technical issues, it may be necessary to grant Databricks personnel access to customer workspaces and underlying infrastructure.

To grant secure access, Databricks uses an internal application called Genie. There is a Genie instance for each cloud, such as AWS, Azure, and Google Cloud. Genie access for AWS and Google Cloud requires multi-factor authentication and requires users to be on the Databricks network or Databricks VPN. Genie for Azure requires multi-factor authentication and requires users to be on the Databricks network, Databricks VPN, or the Microsoft internal support network. Databricks limits the set of users who can access Genie and which types of access may be granted to each user. See the following sections for the types of access.

This document describes the general security processes currently in place for Genie.

From time to time, Databricks updates Genie security controls, and this document will evolve over time. Please see the revision history at the bottom.

Almost no customer data is stored within the Databricks-owned account. For details about the Databricks architecture, see Databricks architecture overview.

Access types for Genie

There are two categories of access to Genie: workspace access and infrastructure access.

Important

For both types of access, there is also a mechanism for emergency access (in the event that the ticketing interaction fails), which does not require a ticket. This access is rarely used and requires approval from a very small number of Databricks staff. Emergency access is still reflected in customer audit logs.

Access to the web application

Databricks customer support personnel (or individuals in direct support roles such as Solution Architects) can use Genie to request HTTPS access to the web application to provide support.

Databricks support personnel must enter the customer’s Databricks web application ID and provide a valid Salesforce support ticket number. The ticket must be associated with the customer’s workspace and must be in Open status at the time access is requested. Support personnel are required to obtain and document your consent before using Genie to access your workspace. If Databricks customer support personnel require additional troubleshooting, they create an internal Engineering Support Ticket for the Databricks Engineering Team.
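The preconditions above can be pictured as a simple check. The following Python sketch is illustrative only and is not Genie's implementation: the SupportTicket class, its field names, and the may_request_workspace_access function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupportTicket:
    """Hypothetical ticket record; the field names are assumptions for illustration."""
    ticket_id: str
    workspace_id: str          # workspace the ticket is associated with
    status: str                # for example "Open" or "Closed"
    consent_documented: bool   # customer consent recorded on the ticket


def may_request_workspace_access(ticket: SupportTicket, workspace_id: str) -> bool:
    """Return True only if all three preconditions described above hold."""
    return (
        ticket.status == "Open"
        and ticket.workspace_id == workspace_id
        and ticket.consent_documented
    )


ticket = SupportTicket("00012345", "ws-111", "Open", True)
print(may_request_workspace_access(ticket, "ws-111"))  # True
print(may_request_workspace_access(ticket, "ws-222"))  # False: different workspace
```

Because the workspace on the ticket must match the workspace being requested, a ticket for one workspace cannot be reused to reach another.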

The Databricks Engineering Team can log into Genie to request access to your workspace in the web application for further troubleshooting or emergency support. The Databricks Engineering Team follows the same process as above except they enter the web application ID and an internal Engineering Support Ticket (not a Customer Support Ticket).

Genie grants web application access through a time-limited access token. After the session time expires, the request process must be repeated.

At that point, the user will have web browser access to the workspace as if they were a workspace admin. Certain other security controls are also applied (for example, Genie users cannot create long-lived personal access tokens). Databricks performs threat modeling to identify scenarios for abuse and to provide technical controls to mitigate risk.

Additionally, if you’ve configured audit log delivery, audit logs show the initial Genie event and Databricks staff actions. Actions taken within the system will be included in the audit logs, similar to auditable events from your own users. In the current implementation, the Databricks user jsmith@databricks.com appears as jsmith+dbadmin@databricks.com within the audit logs.
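If you want to pick out these events from delivered audit logs, one option is to filter on the +dbadmin naming pattern described above. The following Python sketch assumes that each audit log record is a JSON object on its own line and that the acting user's email is found under userIdentity.email. Both the layout and the helper names (is_genie_user, genie_events) are assumptions for illustration, and the naming pattern reflects the current implementation and may change.

```python
import json

# Suffix taken from the naming pattern described above (current implementation).
GENIE_SUFFIX = "+dbadmin@databricks.com"


def is_genie_user(email: str) -> bool:
    """Return True if the email follows the user+dbadmin@databricks.com pattern."""
    return email.lower().endswith(GENIE_SUFFIX)


def genie_events(lines):
    """Yield parsed audit records whose acting user matches the Genie pattern.

    Assumes one JSON object per line with the email at userIdentity.email.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        email = (record.get("userIdentity") or {}).get("email") or ""
        if is_genie_user(email):
            yield record


# Made-up sample records for demonstration only.
sample = [
    '{"userIdentity": {"email": "jsmith+dbadmin@databricks.com"}, "actionName": "login"}',
    '{"userIdentity": {"email": "alice@example.com"}, "actionName": "login"}',
]
for event in genie_events(sample):
    print(event["userIdentity"]["email"])  # jsmith+dbadmin@databricks.com
```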

Access to internal core production infrastructure systems

Only personnel in the Databricks engineering organization who support internal infrastructure can log into Genie and access the Databricks core production infrastructure systems. If Databricks personnel outside infrastructure support roles request access to such systems, additional approvals are required. Genie grants access through a time-limited TLS client certificate. After the session time expires, the request process must be repeated.

The control plane is subdivided into microservices, and access is granted only to the required service. Infrastructure access does not provide UI access to any customer’s Databricks deployment, and data access is limited by service isolation. Customers can further reduce the risk of data exposure by using capabilities such as Customer-managed Keys to encrypt certain data (such as notebooks, secrets, Databricks SQL queries, and query history) within the control plane, which adds technical barriers to reaching that data through infrastructure Genie access.

Because the internal core production infrastructure systems are generally not specific to any one customer’s deployment, this Genie access does not create events in your audit logs.

Web application security controls for Genie

Genie access via the Web UI (workspace-level access) requires either a support ticket or an engineering ticket tied expressly to your workspace. Technical controls require that the ticket be open and that the workspace be present in a specific field. Most Genie events originate from a support ticket. You must explicitly grant access, either by clicking a checkbox when submitting the ticket or by explicitly approving it in the text conversation with the support engineer.

Genie access is limited to a subset of employees who have a role in supporting customers.

The Genie system is accessible only over VPN (which requires a multi-factor prompt), and the authentication into Genie is also configured to always require an additional multi-factor prompt.

Genie access is specific to the given workspace. For example, if customer A authorizes the usage of Genie in a support ticket for a particular workspace, the support engineer cannot use that to access a workspace for customer B or to access a different workspace for customer A.

Each usage of Genie is also limited in time. For AWS and Google Cloud accounts, the maximum time is 24 hours and the default is 60 minutes.
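As a rough illustration of these limits, the following Python sketch resolves a requested session length against the 60-minute default and the 24-hour maximum stated above. The function name resolve_session_duration is hypothetical, and the choice to cap an over-long request at the maximum (rather than reject it) is an assumption; this document does not say how Genie handles such requests.

```python
from datetime import timedelta
from typing import Optional

DEFAULT_SESSION = timedelta(minutes=60)  # default stated above
MAX_SESSION = timedelta(hours=24)        # maximum stated above


def resolve_session_duration(requested: Optional[timedelta] = None) -> timedelta:
    """Return the session length to grant for a request.

    No request falls back to the default. Longer requests are capped at the
    maximum, which is an assumption made for this sketch.
    """
    if requested is None:
        return DEFAULT_SESSION
    if requested <= timedelta(0):
        raise ValueError("session duration must be positive")
    return min(requested, MAX_SESSION)


print(resolve_session_duration())                     # default session
print(resolve_session_duration(timedelta(hours=4)))   # granted as requested
print(resolve_session_duration(timedelta(hours=48)))  # capped at the maximum
```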

If you have enabled audit log delivery, those logs will show the Genie event. Importantly, the initial access to your workspace is facilitated by Genie, but activities thereafter are bound by normal Databricks rules (as if the support staff were your employee). Any actions performed by Databricks staff once in the workspace generate audit log events just as they would for your staff.

Databricks retains Genie logs for at least one year internally, and is happy to help customers build alerting pipelines for Genie activity (such as unusual API calls from support staff accessing the workspace with Genie or new support staff using Genie). Databricks has strong technical controls for automatic termination of accounts (currently performed via automation when the Human Resources Information System processes a termination). Additionally, Databricks performs a quarterly account review as an additional check to guard against accounts not properly terminated.
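One of the alerting ideas mentioned above, new support staff using Genie, could be sketched as follows in Python. This is a minimal example under stated assumptions and not a Databricks-provided pipeline: it assumes audit records have already been parsed into dictionaries with the acting user's email at userIdentity.email, it relies on the +dbadmin naming pattern of the current implementation, and the function name first_time_genie_users is hypothetical.

```python
# Naming pattern of Genie users in audit logs (current implementation).
GENIE_SUFFIX = "+dbadmin@databricks.com"


def first_time_genie_users(events, known_users):
    """Return Genie user emails in events that are not in known_users.

    events: iterable of parsed audit records (dicts); layout is an assumption.
    known_users: emails of Genie users already seen in earlier logs.
    Each new email is reported once, in order of first appearance.
    """
    seen = {user.lower() for user in known_users}
    new_users = []
    for event in events:
        email = ((event.get("userIdentity") or {}).get("email") or "").lower()
        if email.endswith(GENIE_SUFFIX) and email not in seen:
            seen.add(email)
            new_users.append(email)
    return new_users


# Made-up sample data for demonstration only.
known = {"jsmith+dbadmin@databricks.com"}
events = [
    {"userIdentity": {"email": "jsmith+dbadmin@databricks.com"}},
    {"userIdentity": {"email": "mlee+dbadmin@databricks.com"}},
    {"userIdentity": {"email": "alice@example.com"}},
]
print(first_time_genie_users(events, known))  # ['mlee+dbadmin@databricks.com']
```

The returned list could feed whatever notification mechanism you already use. Persisting the set of known users between runs is left out of this sketch.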

Internal core infrastructure security controls for Genie

Genie access via command-line for engineers (infrastructure-level access) requires an open engineering ticket in the back-end engineering ticketing system (or emergency access, as detailed above, in the event of a ticketing system failure).

Users must be a part of an engineering group that has a role in supporting customers.

The Genie system is only accessible over VPN (which requires a multi-factor prompt), and authentication into Genie is also configured to always require an additional multi-factor prompt.

Each usage of Genie is also limited in time. For AWS and Google Cloud environments, the maximum time is 24 hours, and the default is 60 minutes.

Databricks retains Genie logs for at least one year internally.

While Databricks has strong technical controls for automatic termination of accounts (currently performed via automation when the Human Resources Information System processes a termination), Databricks also performs a quarterly account review as an additional check to guard against accounts not properly terminated.