Configure audit log delivery

Note

This feature requires the Premium plan.

This article describes how to configure delivery of audit logs.

Databricks provides access to audit logs of activities performed by Databricks users, allowing your enterprise to monitor detailed Databricks usage patterns. For details about logged events, see Audit log reference.

As a Databricks account owner or account admin, you can configure delivery of audit logs in JSON file format to a Google Cloud Storage (GCS) storage bucket, where you can make the data available for usage analysis. Databricks delivers a separate JSON file for each workspace in your account and a separate file for account-level events.

Set up audit log delivery

To configure audit log delivery, you must set up a GCS bucket, give Databricks access to the bucket, and then use the account console to define a log delivery configuration that tells Databricks where to deliver your logs.

You cannot edit a log delivery configuration after creation, but you can temporarily or permanently disable one using the account console. You can have a maximum of two enabled audit log delivery configurations at a time.

You can use the Google Cloud Console or the Google Cloud CLI to create a Google Cloud Storage bucket in your GCP account. The following instructions assume you use the Google Cloud Console.

Create and configure your GCS bucket

  1. Use the Google Cloud Console to create a Google Cloud Storage bucket in your GCP account.

    • For region, choose multi-region.

    • For storage class, choose Standard for typical usage. See the Google article on storage classes.

    • For access control, choose Uniform.

  2. Click the Permissions tab on your new bucket.

  3. Click ADD, and then enter the service account log-delivery@databricks-prod-master.iam.gserviceaccount.com as a new member of the storage bucket. Grant the service account the Storage Admin role under Cloud Storage, without specifying an access condition.

    This is required so that Databricks can write and list the delivered log files in this bucket. You cannot grant permission on only a bucket subdirectory. See the Google article about access control, which recommends creating multiple buckets for granular access permissions.

    Log delivery bucket permission
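For reference, the binding granted in step 3 would appear in the bucket's IAM policy roughly as follows. This is a sketch, not output from any tool; roles/storage.admin is the identifier for the Storage Admin role.

```python
# Sketch of the IAM policy binding that step 3 creates on the bucket.
# roles/storage.admin is the identifier for the Storage Admin role;
# no IAM condition is attached, per the instructions above.
binding = {
    "role": "roles/storage.admin",
    "members": [
        "serviceAccount:log-delivery@databricks-prod-master.iam.gserviceaccount.com"
    ],
}
print(binding["role"])  # roles/storage.admin
```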

Create a log delivery configuration

A log delivery configuration defines the path to the GCS bucket location where you want Databricks to deliver your audit logs.

  1. As an account admin, log in to the Databricks account console.

  2. Click Settings.

  3. Click Log delivery.

    Log delivery config
  4. Click Add log delivery.

  5. In Log delivery configuration name, add a name that is unique within your Databricks account. Spaces are allowed.

  6. In GCS bucket name, specify your GCS bucket name.

  7. In Delivery path prefix, optionally specify a prefix to be used in the path. See Location.

    The prefix can include forward slash characters but cannot start with a slash. Otherwise, the prefix can include any valid GCS object path characters. Note that space characters are not allowed.

  8. Click Add log delivery.
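The prefix rules from step 7 can be sketched as a small check (the helper name is hypothetical; the rules themselves are as documented above):

```python
def is_valid_delivery_prefix(prefix):
    """Check a delivery path prefix against the documented rules
    (hypothetical helper): forward slashes are allowed, but the
    prefix must not start with one, and spaces are not allowed."""
    return not prefix.startswith("/") and " " not in prefix

print(is_valid_delivery_prefix("audit-logs/prod"))  # True
print(is_valid_delivery_prefix("/audit-logs"))      # False (leading slash)
print(is_valid_delivery_prefix("audit logs"))       # False (space)
```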

Disable or enable a log delivery configuration

You cannot edit or delete a log delivery configuration after creation, but you can temporarily or permanently disable a log delivery configuration using the account console. You can have a maximum of two enabled audit log delivery configurations at a time.

To disable or enable a log delivery configuration:

  1. As an account admin, log in to the Databricks account console.

  2. Click Settings.

  3. Click Log delivery.

  4. Next to the log delivery configuration you want to change, click the three-dot icon to the right of the name.

    • To disable it, select Disable log delivery.

    • To enable it, select Enable log delivery.

Latency

  • Audit log delivery begins within one hour of creating a log delivery configuration, at which point you can access the JSON files.

  • After audit log delivery begins, auditable events are typically logged within one hour. New JSON files overwrite existing files for each workspace as new events are delivered. Overwriting ensures exactly-once semantics without requiring read or delete access to your bucket.

  • Enabling or disabling a log delivery configuration can take up to an hour to take effect.

Location

The delivery location is:

gs://<bucket-name>/<delivery-path-prefix>/workspaceId=<workspaceId>/date=<yyyy-mm-dd>/auditlogs_<internal-id>.json

If the optional delivery path prefix is omitted, the delivery path does not include <delivery-path-prefix>/.

Account-level audit events that are not associated with any single workspace are delivered to the workspaceId=0 partition.
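The layout above can be sketched as a small path builder (a hypothetical helper; the internal ID in the filename is assigned by Databricks):

```python
def delivery_path(bucket, workspace_id, date, internal_id, prefix=""):
    """Build the delivery location described above (hypothetical helper).
    The optional delivery path prefix is omitted from the path when empty."""
    parts = [f"gs://{bucket}"]
    if prefix:
        parts.append(prefix)
    parts.append(f"workspaceId={workspace_id}")
    parts.append(f"date={date}")
    parts.append(f"auditlogs_{internal_id}.json")
    return "/".join(parts)

# Account-level events are delivered to the workspaceId=0 partition:
print(delivery_path("my-bucket", 0, "2023-01-15", "abc123"))
# gs://my-bucket/workspaceId=0/date=2023-01-15/auditlogs_abc123.json
```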

For more information about accessing these files and analyzing them using Databricks, see Analyze audit logs.
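As a minimal sketch of the kind of analysis described in Analyze audit logs, the following counts events per service. It assumes newline-delimited JSON records in the delivered files; the sample records and field values are illustrative, with field names following the audit log reference.

```python
import io
import json
from collections import Counter

# Illustrative stand-in for a delivered audit log file, assumed to
# contain one JSON record per auditable event.
sample = io.StringIO(
    '{"serviceName": "accounts", "actionName": "login"}\n'
    '{"serviceName": "notebook", "actionName": "runCommand"}\n'
    '{"serviceName": "notebook", "actionName": "runCommand"}\n'
)

# Count auditable events by the service that emitted them.
counts = Counter(json.loads(line)["serviceName"] for line in sample)
print(counts["notebook"])  # 2
```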

Enable verbose audit logs

In addition to the default events, you can configure a workspace to generate additional events by enabling verbose audit logs.

To enable or disable verbose audit logs, do the following:

  1. As a workspace admin, go to the Databricks admin settings page.

  2. Click the Advanced tab.

  3. Next to Verbose Audit Logs, enable or disable the feature.

When you enable or disable verbose logging, an auditable event is emitted in the workspace category with action workspaceConfKeys. The workspaceConfKeys request parameter has the value enableVerboseAuditLogs, and the workspaceConfValues request parameter is true (feature enabled) or false (feature disabled).
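A sketch of detecting that toggle event in delivered records (the helper name and sample record values are illustrative; the field names match the event described above):

```python
import json

# Sample audit record resembling the event emitted when verbose
# logging is toggled (values are illustrative).
record = json.loads("""
{
  "serviceName": "workspace",
  "actionName": "workspaceConfKeys",
  "requestParams": {
    "workspaceConfKeys": "enableVerboseAuditLogs",
    "workspaceConfValues": "true"
  }
}
""")

def is_verbose_toggle(event):
    """Detect the audit event recording a verbose-logging change
    (hypothetical helper)."""
    params = event.get("requestParams", {})
    return (event.get("serviceName") == "workspace"
            and event.get("actionName") == "workspaceConfKeys"
            and params.get("workspaceConfKeys") == "enableVerboseAuditLogs")

print(is_verbose_toggle(record))  # True
```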

Additional verbose audit logs

When you configure verbose audit logs, your logs include the following additional events:

  • Service: notebook
    Action name: runCommand
    Description: Emitted after an interactive user runs a command in a notebook. A command corresponds to a cell in a notebook.
    Request parameters: notebookId, executionTime, status, commandId, commandText

  • Service: jobs
    Action name: runCommand
    Description: Emitted after a command in a notebook is executed by a job run. A command corresponds to a cell in a notebook.
    Request parameters: jobId, runId, notebookId, executionTime, status, commandId, commandText

  • Service: databrickssql
    Action name: commandSubmit
    Description: Emitted when a command is submitted to Databricks SQL.
    Request parameters: commandText, warehouseId, commandId

  • Service: databrickssql
    Action name: commandFinish
    Description: Emitted when a command completes or is cancelled.
    Request parameters: warehouseId, commandId

For commandFinish events, check the response field for additional information related to the command result:

  • statusCode - The HTTP response code. This is 400 for a general error.

  • errorMessage - The error message.

    Note

    In some cases for certain long-running commands, the errorMessage field might not be populated on failure.

  • result - This field is empty.
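As a sketch of consuming those response fields, the following summarizes a commandFinish record. The helper name and sample records are hypothetical; a statusCode of 200 is assumed to indicate HTTP success, and 400 a general error as described above.

```python
def summarize_command_finish(event):
    """Summarize the response field of a databrickssql commandFinish
    record (hypothetical helper; 200 is assumed to mean success)."""
    response = event.get("response", {})
    status = response.get("statusCode")
    if status == 200:
        return "command succeeded"
    # errorMessage might not be populated for some long-running commands.
    message = response.get("errorMessage", "(no error message)")
    return f"command failed ({status}): {message}"

ok = {"response": {"statusCode": 200, "result": ""}}
bad = {"response": {"statusCode": 400, "errorMessage": "syntax error"}}
print(summarize_command_finish(ok))   # command succeeded
print(summarize_command_finish(bad))  # command failed (400): syntax error
```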