Manage workspace storage

Your organization’s privacy requirements may require that you:

  • Occasionally purge deleted objects like notebook cells, entire notebooks, experiments, or cluster logs.
  • Store all interactive notebook results in the GCS bucket for system data in your Google Cloud account, rather than in the default Databricks-managed control plane location where some notebook command results are stored.

Purge workspace objects

You can delete workspace objects such as entire notebooks, individual notebook cells, individual notebook comments, and experiments, but deleting them does not remove them permanently: they remain recoverable until you purge them.

To permanently purge deleted workspace objects:

  1. Go to the Admin Console.

  2. Click the Workspace Settings tab.

  3. In the Storage section, next to Permanently purge workspace storage, click the Purge button.

  4. Click Yes, purge to confirm.

    Warning

    Once purged, workspace objects are not recoverable.

Purge notebook revision history

To permanently purge notebook revision history:

  1. Go to the Admin Console.

  2. Click the Workspace Settings tab.

  3. Next to Permanently purge all revision history, select the timeframe to purge. The default is 24 hours and older.

  4. Next to the timeframe, click the Purge button.

  5. Click Yes, purge to confirm.

    Warning

    Once purged, revision history is not recoverable.

Purge cluster logs

To permanently purge Spark driver logs and historical metrics snapshots for all clusters in the workspace:

  1. Go to the Admin Console.

  2. Click the Workspace Settings tab.

  3. Next to Permanently purge cluster logs, click the Purge button.

  4. Click Yes, purge to confirm.

    Warning

    Once purged, cluster logs are not recoverable.

Modify the storage location for notebook results

Notebook command output is stored differently depending on how you run the notebook.

In a default configuration:

  • When you run a notebook interactively by clicking Run in the notebook:
    • If the results are small, they are stored in the Databricks control plane, along with the notebook’s command contents and metadata.
    • Larger results are stored in the workspace’s GCS bucket for system data in your Google Cloud account. Databricks automatically creates the GCS bucket for system data. Databricks uses this storage area for workspace system data and your workspace’s DBFS root. Notebook results are stored in workspace system data storage, which is not accessible by users.
    • Plot images and other binary objects are always stored separately in the FileStore area of the DBFS root.
  • When you run a notebook as a job by scheduling it or by clicking Run Now on the Jobs page, all results are stored in the workspace’s GCS bucket for system data in your account.

You can configure your workspace to store all interactive notebook results in your cloud account, regardless of result size.

Configure the storage location for interactive notebook results

Preview

This feature is in Public Preview.

You can configure your workspace to store all interactive notebook results in your Google Cloud account, rather than the control plane. You can enable this feature using the admin console or REST API. This configuration has no effect on notebooks run as jobs, whose results are already stored in your Google Cloud account by default.

Keep the following points in mind:

  • Changes to this configuration are effective only for new results. Existing notebook results are not moved.
  • Some metadata about the results, such as chart column names, continues to be stored in the control plane.
  • You may incur increased storage costs from your cloud provider.
  • You may see increased network and I/O latency when reading and writing results.

Store all notebook results in your account using the admin console

As a workspace administrator:

  1. Go to the Admin Console.
  2. Click the Workspace Settings tab.
  3. In the Advanced section, click the Store Interactive Notebook Results in Customer Account toggle.
  4. Click Confirm.

Store all notebook results in your account using the REST API

To configure your workspace to store all notebook results in your Google Cloud account using the REST API:

  • You must be a workspace administrator.
  • You need a personal access token. The instructions that follow assume that you have configured a .netrc file with your personal access token so that you can use the -n option in curl commands. For details, see the Databricks documentation on personal access token authentication.
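
A .netrc entry for a Databricks workspace uses the literal word token as the login and your personal access token as the password. A minimal sketch, with <databricks-instance> standing in for your workspace hostname (for example, 1234567890123456.7.gcp.databricks.com):

```
machine <databricks-instance>
login token
password <personal-access-token>
```

With this entry in ~/.netrc, curl -n authenticates automatically against that host.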

To get the current setting, call the GET /workspace-conf endpoint with the keys query parameter set to storeInteractiveNotebookResultsInCustomerAccount:

curl -n --request GET \
  'https://<databricks-instance>/api/2.0/workspace-conf?keys=storeInteractiveNotebookResultsInCustomerAccount'
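
If jq is installed, you can extract just the value from the response. This sketch assumes the endpoint returns a flat JSON object keyed by the setting name, with the value as a string:

```shell
# Fetch the current setting and extract its value with jq.
# Assumes a response shaped like:
#   {"storeInteractiveNotebookResultsInCustomerAccount":"true"}
curl -n --request GET \
  'https://<databricks-instance>/api/2.0/workspace-conf?keys=storeInteractiveNotebookResultsInCustomerAccount' \
  | jq -r '.storeInteractiveNotebookResultsInCustomerAccount'
```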

To enable your workspace to store interactive notebook results in your Google Cloud account, call the PATCH /workspace-conf endpoint and set storeInteractiveNotebookResultsInCustomerAccount to true in the request body:

curl -n --request PATCH \
  'https://<databricks-instance>/api/2.0/workspace-conf' \
  --header 'Content-Type: text/plain' \
  --data-raw '{
    "storeInteractiveNotebookResultsInCustomerAccount": "true"
}'

To disable the feature, set the same flag to false:

curl -n --request PATCH \
  'https://<databricks-instance>/api/2.0/workspace-conf' \
  --header 'Content-Type: text/plain' \
  --data-raw '{
    "storeInteractiveNotebookResultsInCustomerAccount": "false"
}'