Access audit logs

Preview

This feature is in Public Preview.

Note

This feature requires the Databricks Premium Plan.

Databricks provides access to audit logs of activities performed by Databricks users, allowing your enterprise to monitor detailed Databricks usage patterns.

There are two types of logs:

  • Workspace-level audit logs with workspace-level events.
  • Account-level audit logs with account-level events.

For a list of each of these types of events and the associated services, see Audit events.

As a Databricks account owner or account admin, you can configure delivery of audit logs in JSON format to a Google Cloud Storage (GCS) bucket, where you can make the data available for usage analysis. Databricks delivers a separate JSON file for each workspace in your account and a separate file for account-level events.

To configure audit log delivery, you must set up a GCS bucket, give Databricks access to the bucket, and then use the account console to define a log delivery configuration that tells Databricks where to deliver your logs.

You cannot edit a log delivery configuration after creation, but you can temporarily or permanently disable a log delivery configuration using the account console. You can have a maximum of two currently enabled audit log delivery configurations.

To configure log delivery, see Configure audit log delivery.

Latency

  • Audit log delivery begins within one hour of creating the log delivery configuration, after which you can access the JSON files.
  • After audit log delivery begins, auditable events are typically logged within one hour. New JSON files potentially overwrite existing files for each workspace. Overwriting ensures exactly-once semantics without requiring read or delete access to your account.
  • Enabling or disabling a log delivery configuration can take up to an hour to take effect.

Location

The delivery location is:

gs://<bucket-name>/<delivery-path-prefix>/workspaceId=<workspaceId>/date=<yyyy-mm-dd>/auditlogs_<internal-id>.json

If the optional delivery path prefix is omitted, the delivery path does not include <delivery-path-prefix>/.

Account-level audit events that are not associated with any single workspace are delivered to the workspaceId=0 partition.
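
For example, a minimal sketch of reading delivered logs with Spark for one workspace and one day, where the bucket name, delivery path prefix, workspace ID, and date are all placeholders:

// Read one workspace's audit logs for a single day (all path components are placeholders).
val workspaceLogs = spark.read.json(
  "gs://bucket-name/delivery-path-prefix/workspaceId=1234567890/date=2021-06-01/*.json")
// Account-level events are delivered under the workspaceId=0 partition.
val accountLogs = spark.read.json(
  "gs://bucket-name/delivery-path-prefix/workspaceId=0/date=2021-06-01/*.json")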

For more information about accessing these files and analyzing them using Databricks, see Analyze audit logs.

Schema

Databricks delivers audit logs in JSON format. The schema of audit log records is as follows; a short example of reading these fields follows the list.

  • version: The schema version of the audit log format.
  • timestamp: UTC timestamp of the action.
  • workspaceId: ID of the workspace this event relates to. Set to “0” for account-level events that are not associated with any workspace.
  • sourceIPAddress: The IP address of the source request.
  • userAgent: The browser or API client used to make the request.
  • sessionId: Session ID of the action.
  • userIdentity: Information about the user who made the request.
    • email: User email address.
  • serviceName: The service that logged the request.
  • actionName: The action, such as login, logout, read, write, and so on.
  • requestId: Unique request ID.
  • requestParams: Parameter key-value pairs used in the audited event.
  • response: Response to the request.
    • errorMessage: The error message if there was an error.
    • result: The result of the request.
    • statusCode: HTTP status code that indicates whether the request succeeded.
  • auditLevel: Specifies if this is a workspace-level event (WORKSPACE_LEVEL) or account-level event (ACCOUNT_LEVEL).
  • accountId: Account ID of this Databricks account.
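
As a minimal sketch, you can load the delivered JSON files and select these fields with Spark; the bucket path is a placeholder:

// Load the delivered audit logs (placeholder path) and inspect the documented schema fields.
val df = spark.read.json("gs://bucketName/path/to/your/audit-logs")
df.select("version", "timestamp", "workspaceId", "sourceIPAddress", "userAgent",
    "userIdentity.email", "serviceName", "actionName", "requestId",
    "response.statusCode", "auditLevel", "accountId")
  .show(5, truncate = false)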

Audit events

The serviceName and actionName properties identify an audit event in an audit log record.

Workspace-level audit logs are available for these services:

  • accounts
  • clusters
  • dbfs
  • genie
  • globalInitScripts
  • groups
  • iamRole
  • instancePools
  • jobs
  • mlflowExperiment
  • notebook
  • repos
  • secrets
  • sqlanalytics
  • sqlPermissions, which has all audit logs for table access when table access control lists are enabled.
  • ssh
  • workspace

Account-level audit logs are available for these services:

  • accountBillableUsage: Access to billable usage for the account.
  • logDelivery: Log delivery configurations.
  • accountsManager: Actions performed in the accounts console.

Account-level events have the workspaceId field set to a valid workspace ID if they reference workspace-related events like creating or deleting a workspace. If they are not associated with any workspace, the workspaceId field is set to 0.
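
As a hedged sketch, you can split delivered events by audit level using these fields; the bucket path is a placeholder:

// Separate account-level from workspace-level events (placeholder path).
val df = spark.read.json("gs://bucketName/path/to/your/audit-logs")
val accountLevelEvents = df.filter("auditLevel = 'ACCOUNT_LEVEL'")
val workspaceLevelEvents = df.filter("auditLevel = 'WORKSPACE_LEVEL'")
// Account-level events with no associated workspace carry workspaceId 0.
val accountOnlyEvents = accountLevelEvents.filter("workspaceId = 0")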

Note

  • If an action takes a long time, the request and response are logged separately, but the request and response pair share the same requestId (see the sketch after this list).
  • With the exception of mount-related operations, Databricks audit logs do not include DBFS-related operations.
  • Automated actions such as resizing a cluster due to autoscaling or launching a job due to scheduling are performed by the user System-User.
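
A minimal sketch of finding such request and response pairs, assuming the placeholder bucket path used elsewhere on this page:

// Group events by requestId; long-running actions produce more than one record per request ID.
val df = spark.read.json("gs://bucketName/path/to/your/audit-logs")
df.groupBy("requestId")
  .count()
  .filter("`count` > 1")
  .show(10, truncate = false)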

Request parameters

The following sections list the request parameters in the requestParams field for each supported service and action, grouped into workspace-level and account-level events.
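
For example, a minimal sketch of inspecting a few of the request parameters recorded for cluster creation events (listed under the clusters service in the table below); the bucket path is a placeholder:

// Select a few requestParams fields for cluster creation events (placeholder path).
val df = spark.read.json("gs://bucketName/path/to/your/audit-logs")
df.filter("serviceName = 'clusters' AND actionName = 'create'")
  .select("timestamp", "userIdentity.email",
    "requestParams.cluster_name", "requestParams.spark_version")
  .show(10, truncate = false)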

Workspace-level audit log events

Service Action Request Parameters
accounts add [“targetUserName”, “endpoint”, “targetUserId”]
  addPrincipalToGroup [“targetGroupId”, “endpoint”, “targetUserId”, “targetGroupName”, “targetUserName”]
  changePassword [“newPasswordSource”, “targetUserId”, “serviceSource”, “wasPasswordChanged”, “userId”]
  createGroup [“endpoint”, “targetGroupId”, “targetGroupName”]
  delete [“targetUserId”, “targetUserName”, “endpoint”]
  garbageCollectDbToken [“tokenExpirationTime”, “userId”]
  generateDbToken [“userId”, “tokenExpirationTime”]
  jwtLogin [“user”]
  login [“user”]
  logout [“user”]
  removeAdmin [“targetUserName”, “endpoint”, “targetUserId”]
  removeGroup [“targetGroupId”, “targetGroupName”, “endpoint”]
  resetPassword [“serviceSource”, “userId”, “endpoint”, “targetUserId”, “targetUserName”, “wasPasswordChanged”, “newPasswordSource”]
  revokeDbToken [“userId”]
  samlLogin [“user”]
  setAdmin [“endpoint”, “targetUserName”, “targetUserId”]
  tokenLogin [“tokenId”, “user”]
  validateEmail [“endpoint”, “targetUserName”, “targetUserId”]
clusters changeClusterAcl [“shardName”, “aclPermissionSet”, “targetUserId”, “resourceId”]
  create [“cluster_log_conf”, “num_workers”, “enable_elastic_disk”, “driver_node_type_id”, “start_cluster”, “docker_image”, “ssh_public_keys”, “aws_attributes”, “acl_path_prefix”, “node_type_id”, “instance_pool_id”, “spark_env_vars”, “init_scripts”, “spark_version”, “cluster_source”, “autotermination_minutes”, “cluster_name”, “autoscale”, “custom_tags”, “cluster_creator”, “enable_local_disk_encryption”, “idempotency_token”, “spark_conf”, “organization_id”, “no_driver_daemon”, “user_id”]
  createResult [“clusterName”, “clusterState”, “clusterId”, “clusterWorkers”, “clusterOwnerUserId”]
  delete [“cluster_id”]
  deleteResult [“clusterWorkers”, “clusterState”, “clusterId”, “clusterOwnerUserId”, “clusterName”]
  edit [“spark_env_vars”, “no_driver_daemon”, “enable_elastic_disk”, “aws_attributes”, “driver_node_type_id”, “custom_tags”, “cluster_name”, “spark_conf”, “ssh_public_keys”, “autotermination_minutes”, “cluster_source”, “docker_image”, “enable_local_disk_encryption”, “cluster_id”, “spark_version”, “autoscale”, “cluster_log_conf”, “instance_pool_id”, “num_workers”, “init_scripts”, “node_type_id”]
  permanentDelete [“cluster_id”]
  resize [“cluster_id”, “num_workers”, “autoscale”]
  resizeResult [“clusterWorkers”, “clusterState”, “clusterId”, “clusterOwnerUserId”, “clusterName”]
  restart [“cluster_id”]
  restartResult [“clusterId”, “clusterState”, “clusterName”, “clusterOwnerUserId”, “clusterWorkers”]
  start [“init_scripts_safe_mode”, “cluster_id”]
  startResult [“clusterName”, “clusterState”, “clusterWorkers”, “clusterOwnerUserId”, “clusterId”]
dbfs addBlock [“handle”, “data_length”]
  create [“path”, “bufferSize”, “overwrite”]
  delete [“recursive”, “path”]
  getSessionCredentials [“mountPoint”]
  mkdirs [“path”]
  mount [“mountPoint”, “owner”]
  move [“dst”, “source_path”, “src”, “destination_path”]
  put [“path”, “overwrite”]
  unmount [“mountPoint”]
genie databricksAccess [“duration”, “approver”, “reason”, “authType”, “user”]
globalInitScripts create [“name”, “position”, “script-SHA256”, “enabled”]
  update [“script_id”, “name”, “position”, “script-SHA256”, “enabled”]
  delete [“script_id”]
groups addPrincipalToGroup [“user_name”, “parent_name”]
  createGroup [“group_name”]
  getGroupMembers [“group_name”]
  removeGroup [“group_name”]
iamRole changeIamRoleAcl [“targetUserId”, “shardName”, “resourceId”, “aclPermissionSet”]
instancePools changeInstancePoolAcl [“shardName”, “resourceId”, “targetUserId”, “aclPermissionSet”]
  create [“enable_elastic_disk”, “preloaded_spark_versions”, “idle_instance_autotermination_minutes”, “instance_pool_name”, “node_type_id”, “custom_tags”, “max_capacity”, “min_idle_instances”, “aws_attributes”]
  delete [“instance_pool_id”]
  edit [“instance_pool_name”, “idle_instance_autotermination_minutes”, “min_idle_instances”, “preloaded_spark_versions”, “max_capacity”, “enable_elastic_disk”, “node_type_id”, “instance_pool_id”, “aws_attributes”]
jobs cancel [“run_id”]
  cancelAllRuns [“job_id”]
  changeJobAcl [“shardName”, “aclPermissionSet”, “resourceId”, “targetUserId”]
  create [“spark_jar_task”, “email_notifications”, “notebook_task”, “spark_submit_task”, “timeout_seconds”, “libraries”, “name”, “spark_python_task”, “job_type”, “new_cluster”, “existing_cluster_id”, “max_retries”, “schedule”]
  delete [“job_id”]
  deleteRun [“run_id”]
  reset [“job_id”, “new_settings”]
  resetJobAcl [“grants”, “job_id”]
  runFailed [“jobClusterType”, “jobTriggerType”, “jobId”, “jobTaskType”, “runId”, “jobTerminalState”, “idInJob”, “orgId”]
  runNow [“notebook_params”, “job_id”, “jar_params”, “workflow_context”]
  runSucceeded [“idInJob”, “jobId”, “jobTriggerType”, “orgId”, “runId”, “jobClusterType”, “jobTaskType”, “jobTerminalState”]
  submitRun [“shell_command_task”, “run_name”, “spark_python_task”, “existing_cluster_id”, “notebook_task”, “timeout_seconds”, “libraries”, “new_cluster”, “spark_jar_task”]
  update [“fields_to_remove”, “job_id”, “new_settings”]
mlflowExperiment deleteMlflowExperiment [“experimentId”, “path”, “experimentName”]
  moveMlflowExperiment [“newPath”, “experimentId”, “oldPath”]
  restoreMlflowExperiment [“experimentId”, “path”, “experimentName”]
mlflowModelRegistry listModelArtifacts [“name”, “version”, “path”, “page_token”]
  getModelVersionSignedDownloadUri [“name”, “version”, “path”]
  createRegisteredModel [“name”, “tags”]
  deleteRegisteredModel [“name”]
  renameRegisteredModel [“name”, “new_name”]
  setRegisteredModelTag [“name”, “key”, “value”]
  deleteRegisteredModelTag [“name”, “key”]
  createModelVersion [“name”, “source”, “run_id”, “tags”, “run_link”]
  deleteModelVersion [“name”, “version”]
  getModelVersionDownloadUri [“name”, “version”]
  setModelVersionTag [“name”, “version”, “key”, “value”]
  deleteModelVersionTag [“name”, “version”, “key”]
  createTransitionRequest [“name”, “version”, “stage”]
  deleteTransitionRequest [“name”, “version”, “stage”, “creator”]
  approveTransitionRequest [“name”, “version”, “stage”, “archive_existing_versions”]
  rejectTransitionRequest [“name”, “version”, “stage”]
  transitionModelVersionStage [“name”, “version”, “stage”, “archive_existing_versions”]
  transitionModelVersionStageDatabricks [“name”, “version”, “stage”, “archive_existing_versions”]
  createComment [“name”, “version”]
  updateComment [“id”]
  deleteComment [“id”]
notebook attachNotebook [“path”, “clusterId”, “notebookId”]
  createNotebook [“notebookId”, “path”]
  deleteFolder [“path”]
  deleteNotebook [“notebookId”, “notebookName”, “path”]
  detachNotebook [“notebookId”, “clusterId”, “path”]
  downloadLargeResults [“notebookId”, “notebookFullPath”]
  downloadPreviewResults [“notebookId”, “notebookFullPath”]
  importNotebook [“path”]
  moveNotebook [“newPath”, “oldPath”, “notebookId”]
  renameNotebook [“newName”, “oldName”, “parentPath”, “notebookId”]
  restoreFolder [“path”]
  restoreNotebook [“path”, “notebookId”, “notebookName”]
  takeNotebookSnapshot [“path”]
repos createRepo [“url”, “provider”, “path”]
  updateRepo [“id”, “branch”, “tag”, “git_url”, “git_provider”]
  getRepo [“id”]
  listRepos [“path_prefix”, “next_page_token”]
  deleteRepo [“id”]
  pull [“id”]
  commitAndPush [“id”, “message”, “files”, “checkSensitiveToken”]
  checkoutBranch [“id”, “branch”]
  discard [“id”, “file_paths”]
secrets createScope [“scope”]
  deleteScope [“scope”]
  deleteSecret [“key”, “scope”]
  getSecret [“scope”, “key”]
  listAcls [“scope”]
  listSecrets [“scope”]
  putSecret [“string_value”, “scope”, “key”]
sqlanalytics createEndpoint  
  startEndpoint  
  stopEndpoint  
  deleteEndpoint  
  editEndpoint  
  changeEndpointAcls  
  setEndpointConfig  
  createQuery [“queryId”]
  updateQuery [“queryId”]
  forkQuery [“queryId”, “originalQueryId”]
  moveQueryToTrash [“queryId”]
  deleteQuery [“queryId”]
  restoreQuery [“queryId”]
  createDashboard [“dashboardId”]
  updateDashboard [“dashboardId”]
  moveDashboardToTrash [“dashboardId”]
  deleteDashboard [“dashboardId”]
  restoreDashboard [“dashboardId”]
  createAlert [“alertId”, “queryId”]
  updateAlert [“alertId”, “queryId”]
  deleteAlert [“alertId”]
  createVisualization [“visualizationId”, “queryId”]
  updateVisualization [“visualizationId”]
  deleteVisualization [“visualizationId”]
  changePermissions [“objectType”, “objectId”, “granteeAndPermission”]
  createExternalDatasource [“dataSourceId”, “dataSourceType”]
  updateExternalDatasource [“dataSourceId”]
  deleteExternalDatasource [“dataSourceId”]
  createAlertDestination [“alertDestinationId”, “alertDestinationType”]
  updateAlertDestination [“alertDestinationId”]
  deleteAlertDestination [“alertDestinationId”]
  createQuerySnippet [“querySnippetId”]
  updateQuerySnippet [“querySnippetId”]
  deleteQuerySnippet [“querySnippetId”]
  downloadQueryResult [“queryId”, “queryResultId”, “fileType”]
sqlPermissions createSecurable [“securable”]
  grantPermission [“permission”]
  removeAllPermissions [“securable”]
  requestPermissions [“requests”]
  revokePermission [“permission”]
  showPermissions [“securable”, “principal”]
ssh login [“containerId”, “userName”, “port”, “publicKey”, “instanceId”]
  logout [“userName”, “containerId”, “instanceId”]
workspace changeWorkspaceAcl [“shardName”, “targetUserId”, “aclPermissionSet”, “resourceId”]
  fileCreate [“path”]
  fileDelete [“path”]
  moveWorkspaceNode [“destinationPath”, “path”]
  purgeWorkspaceNodes [“treestoreId”]
  workspaceConfEdit [“workspaceConfKeys (values: enableResultsDownloading, enableExportNotebook)”, “workspaceConfValues”]
  workspaceExport [“workspaceExportFormat”, “notebookFullPath”]

Account-level audit log events

Service Action Request Parameters
accountBillableUsage getAggregatedUsage [“account_id”, “window_size”, “start_time”, “end_time”, “meter_name”, “workspace_ids_filter”]
  getDetailedUsage [“account_id”, “start_month”, “end_month”, “with_pii”]
accounts login [“user”]
  gcpWorkspaceBrowserLogin [“user”]
  logout [“user”]
accountsManager updateAccount [“account_id”, “account”]
  changeAccountOwner [“account_id”, “first_name”, “last_name”, “email”]
  updateSubscription [“account_id”, “subscription_id”, “subscription”]
  listSubscriptions [“account_id”]
  createWorkspaceConfiguration [“workspace”]
  getWorkspaceConfiguration [“account_id”, “workspace_id”]
  listWorkspaceConfigurations [“account_id”]
  updateWorkspaceConfiguration [“account_id”, “workspace_id”]
  deleteWorkspaceConfiguration [“account_id”, “workspace_id”]
  listWorkspaceEncryptionKeyRecords [“account_id”, “workspace_id”]
  listWorkspaceEncryptionKeyRecordsForAccount [“account_id”]
  createVpcEndpoint [“vpc_endpoint”]
  getVpcEndpoint [“account_id”, “vpc_endpoint_id”]
  listVpcEndpoints [“account_id”]
  deleteVpcEndpoint [“account_id”, “vpc_endpoint_id”]
  createPrivateAccessSettings [“private_access_settings”]
  getPrivateAccessSettings [“account_id”, “private_access_settings_id”]
  listPrivateAccessSettingss [“account_id”]
  deletePrivateAccessSettings [“account_id”, “private_access_settings_id”]
logDelivery createLogDeliveryConfiguration [“account_id”, “config_id”]
  updateLogDeliveryConfiguration [“config_id”, “account_id”, “status”]
  getLogDeliveryConfiguration [“log_delivery_configuration”]
  listLogDeliveryConfigurations [“account_id”, “storage_configuration_id”, “credentials_id”, “status”]
ssoConfigBackend create [“account_id”, “sso_type”, “config”]
  update [“account_id”, “sso_type”, “config”]
  get [“account_id”, “sso_type”]

Analyze audit logs

You can analyze audit logs using Databricks. The following examples use logs to report on Databricks access, Apache Spark versions, and table data access.

Load audit logs as a DataFrame and register the DataFrame as a temporary view.

val df = spark.read.json("gs://bucketName/path/to/your/audit-logs")
df.createOrReplaceTempView("audit_logs")

List the users who accessed Databricks and from where.

%sql
SELECT DISTINCT userIdentity.email, sourceIPAddress
FROM audit_logs
WHERE serviceName = "accounts" AND actionName LIKE "%login%"

Check the Apache Spark versions used.

%sql
SELECT requestParams.spark_version, COUNT(*)
FROM audit_logs
WHERE serviceName = "clusters" AND actionName = "create"
GROUP BY requestParams.spark_version

Check table data access.

%sql
SELECT *
FROM audit_logs
WHERE serviceName = "sqlPermissions" AND actionName = "requestPermissions"