Analyze billable usage log data
This article explains how to read and analyze the usage data from your Databricks account.
You can view and download billable usage directly in the account console.
CSV file schema
Column | Type | Description | Example
---|---|---|---
workspaceId | string | ID of the workspace. |
timestamp | datetime | End of the hour for the provided usage. |
clusterId | string | ID of the cluster (for a cluster) or of the warehouse (for a SQL warehouse). | Cluster example: SQL warehouse example:
clusterName | string | User-provided name for the cluster/warehouse. |
clusterNodeType | string | Instance type of the cluster/warehouse. | Cluster example: SQL warehouse example:
clusterOwnerUserId | string | ID of the user who created the cluster/warehouse. |
clusterCustomTags | string ("-escaped json) | Custom tags associated with the cluster/warehouse during this hour. |
sku | string | Billing SKU. See the Billing SKUs table for a list of values. |
dbus | double | Number of DBUs used by the user during this hour. |
machineHours | double | Total number of machine hours used by all containers in the cluster/warehouse. |
clusterOwnerUserName | string | Username (email) of the user who created the cluster/warehouse. |
tags | string ("-escaped json) | Default and custom cluster/warehouse tags, and default and custom instance pool tags (if applicable) associated with the cluster during this hour. See Cluster tags, Warehouse tags, and Pool tags. This is a superset of the clusterCustomTags column. |
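The clusterCustomTags and tags columns contain quote-escaped JSON. After the CSV is parsed with the escape option described below, each cell holds ordinary JSON text that you can parse with a standard JSON library. The following is a minimal sketch; the tag keys and values are illustrative, not taken from a real usage log:
import json

# Hypothetical clusterCustomTags cell value; the keys and values are illustrative.
raw_tags = '{"team": "data-eng", "cost-center": "1234"}'
tags = json.loads(raw_tags)
print(tags["team"])  # data-eng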
Billing SKUs
ENTERPRISE_ALL_PURPOSE_COMPUTE
ENTERPRISE_ALL_PURPOSE_COMPUTE_(PHOTON)
ENTERPRISE_JOBS_COMPUTE
ENTERPRISE_JOBS_COMPUTE_(PHOTON)
ENTERPRISE_JOBS_LIGHT_COMPUTE
ENTERPRISE_SQL_COMPUTE
ENTERPRISE_DLT_CORE_COMPUTE
ENTERPRISE_DLT_CORE_COMPUTE_(PHOTON)
ENTERPRISE_DLT_PRO_COMPUTE
ENTERPRISE_DLT_PRO_COMPUTE_(PHOTON)
ENTERPRISE_DLT_ADVANCED_COMPUTE
ENTERPRISE_DLT_ADVANCED_COMPUTE_(PHOTON)
PREMIUM_ALL_PURPOSE_COMPUTE
PREMIUM_ALL_PURPOSE_COMPUTE_(PHOTON)
PREMIUM_JOBS_COMPUTE
PREMIUM_JOBS_COMPUTE_(PHOTON)
PREMIUM_JOBS_LIGHT_COMPUTE
PREMIUM_SQL_COMPUTE
PREMIUM_DLT_CORE_COMPUTE
PREMIUM_DLT_CORE_COMPUTE_(PHOTON)
PREMIUM_DLT_PRO_COMPUTE
PREMIUM_DLT_PRO_COMPUTE_(PHOTON)
PREMIUM_DLT_ADVANCED_COMPUTE
PREMIUM_DLT_ADVANCED_COMPUTE_(PHOTON)
STANDARD_ALL_PURPOSE_COMPUTE
STANDARD_ALL_PURPOSE_COMPUTE_(PHOTON)
STANDARD_JOBS_COMPUTE
STANDARD_JOBS_COMPUTE_(PHOTON)
STANDARD_JOBS_LIGHT_COMPUTE
STANDARD_DLT_CORE_COMPUTE
STANDARD_DLT_CORE_COMPUTE_(PHOTON)
STANDARD_DLT_PRO_COMPUTE
STANDARD_DLT_PRO_COMPUTE_(PHOTON)
STANDARD_DLT_ADVANCED_COMPUTE
STANDARD_DLT_ADVANCED_COMPUTE_(PHOTON)
Analyze usage data in Databricks
This section describes how to make the data in the billable usage CSV file available to Databricks for analysis. It describes options for creating a usage table and includes a sample notebook that you can use to run a usage analysis dashboard.
The CSV file uses a format that is standard for commercial spreadsheet applications but requires a modification to be read by Apache Spark. You must use option("escape", "\"") when you create the usage table in Databricks. Total DBUs are the sum of the dbus column.
Import the log using the Create Table UI
You can use the add data UI (see Load data using the add data UI) to import the CSV file into Databricks for analysis.
Create a Spark DataFrame
You can also use the following code to create the usage table from a path to the CSV file:
df = (spark.
read.
option("header", "true").
option("inferSchema", "true").
option("escape", "\"").
csv("/FileStore/tables/usage_data.csv"))
df.createOrReplaceTempView("usage")
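Once the usage temporary view exists, you can compute the total DBUs in the file by summing the dbus column, as noted above. The following query is a minimal sketch:
# Total DBUs across the imported file (sum of the dbus column).
spark.sql("SELECT SUM(dbus) AS total_dbus FROM usage").show()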
If the file is stored in an S3 bucket, for example when it is used with log delivery, the code will look like the following. You can specify a file path or a directory. If you pass a directory, all files are imported. The following example specifies a file.
df = (spark.
read.
option("header", "true").
option("inferSchema", "true").
option("escape", "\"").
load("s3://<bucketname>/<pathprefix>/billable-usage/csv/workspaceId=<workspace-id>-usageMonth=<month>.csv"))
df.createOrReplaceTempView("usage")
The following example imports a directory of billable usage files:
df = (spark.
read.
option("header", "true").
option("inferSchema", "true").
option("escape", "\"").
load("s3://<bucketname>/<pathprefix>/billable-usage/csv/"))
df.createOrReplaceTempView("usage")
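With a directory of files imported, you can break usage down further. The following sketch groups DBUs by SKU and month; the sku, timestamp, and dbus columns come from the CSV schema above, and the monthly grouping is just one illustrative breakdown:
# DBUs per SKU per month across all imported billable usage files.
spark.sql("""
    SELECT sku,
           date_format(timestamp, 'yyyy-MM') AS usage_month,
           SUM(dbus) AS total_dbus
    FROM usage
    GROUP BY sku, date_format(timestamp, 'yyyy-MM')
    ORDER BY usage_month, total_dbus DESC
""").show()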
Create a Delta table
To create a Delta table from the DataFrame (df) in the previous example, use the following code:
(df.write
.format("delta")
.mode("overwrite")
.saveAsTable("database_name.table_name")
)
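The saved Delta table can then be queried like any other table. The following sketch ranks workspaces by DBU consumption; database_name.table_name is the same placeholder used above:
# DBUs per workspace from the saved Delta table.
spark.sql("""
    SELECT workspaceId, SUM(dbus) AS total_dbus
    FROM database_name.table_name
    GROUP BY workspaceId
    ORDER BY total_dbus DESC
""").show()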
Warning
The saved Delta table is not updated automatically when you add or replace new CSV files. If you need the latest data, re-run these commands before you use the Delta table.
Usage analysis dashboard notebook
If you use billable usage delivery, you can use the following notebook to run a usage analysis dashboard by providing a path to the S3 bucket where your CSV files are stored and entering report parameters in a widget.
The widget that you use to enter report parameters appears above the first notebook cell when you import the notebook to your Databricks workspace. The widget does not appear in the browser-only view of the notebook.