Monitor costs using system tables

This article explains how you can use the system.billing.usage table on its own or joined with other system tables to get a picture of your account’s Databricks usage. The following feature-specific articles are also available:

How to read the usage table

Users with permissions to access system table data can view and query their account’s billing logs, located at system.billing.usage. Every billing record includes columns that attribute the usage amount to the specific resources, identities, and products involved.

  • The usage_metadata column includes a struct with information about the resources or objects involved in the usage.

  • The identity_metadata column includes information about the user or service principal that incurred the usage.

  • The custom_tags column includes tags applied to the compute resource associated with the usage. This also includes tags added by budget policies so you can attribute serverless usage.

  • The billing_origin_product and product_features columns give you information about the exact product and features used.

For a complete reference of the usage table, see Billable usage system table reference.

Operationalize billing data

Databricks recommends using AI/BI dashboards to create cost monitoring dashboards using system table billing data. You can create a new dashboard, or account admins can import pre-built, customizable cost monitoring dashboard. See Usage dashboards.

You can also add alerts to your queries to help you stay informed about usage data. See Create an alert.

Sample queries

The following queries provide examples of how you can use the system.billing.usage table data to gain insights into your account’s usage.

How many DBUs of each product have been used throughout this month?

SELECT
    billing_origin_product,
    usage_date,
    sum(usage_quantity) as usage_quantity
FROM system.billing.usage
WHERE
    month(usage_date) = month(NOW())
    AND year(usage_date) = year(NOW())
GROUP BY billing_origin_product, usage_date

Which jobs consumed the most DBUs?

SELECT
  usage_metadata.job_id as `Job ID`, sum(usage_quantity) as `Usage`
FROM
  system.billing.usage
WHERE
  usage_metadata.job_id IS NOT NULL
GROUP BY
  `Job ID`
ORDER BY
  `Usage` DESC

How much usage can be attributed to resources with a specific tag?

You can break down costs in various ways. This example shows you how to break down costs by a custom tag. Be sure to replace the custom tag’s key and value in the query.

SELECT
  sku_name, usage_unit, SUM(usage_quantity) as `Usage`
FROM
  system.billing.usage
WHERE
  custom_tags [:key] = :value
GROUP BY 1, 2

Show me the products where usage is growing

SELECT
  after.billing_origin_product, before_dbus, after_dbus, ((after_dbus - before_dbus)/before_dbus * 100) AS growth_rate
FROM
  (SELECT
     billing_origin_product, sum(usage_quantity) as before_dbus
   FROM
     system.billing.usage
   WHERE
     usage_date BETWEEN "2024-04-01" and "2024-04-30"
   GROUP BY
     billing_origin_product
  ) as before
JOIN
  (SELECT
     billing_origin_product, sum(usage_quantity) as after_dbus
   FROM
     system.billing.usage
   WHERE
     usage_date
   BETWEEN
     "2024-05-01" and "2024-05-30"
   GROUP BY
     billing_origin_product
  ) as after
WHERE
  before.billing_origin_product = after.billing_origin_product
SORT BY
  growth_rate DESC

What is the usage trend of All Purpose Compute (Photon)?

SELECT
  sku_name,
  usage_date,
  sum(usage_quantity) as `DBUs consumed`
FROM
  system.billing.usage
WHERE
  year(usage_date) = year(CURRENT_DATE)
AND
  sku_name = "ENTERPRISE_ALL_PURPOSE_COMPUTE_(PHOTON)"
AND
  usage_date > "2024-04-15"
GROUP BY
  sku_name, usage_date

What is the DBU consumption of a materialized view or streaming table?

To get the DBU usage and SKU for a specific materialized view or streaming table, submit a query to the billable usage system table for records where usage_metadata.dlt_pipeline_id is set to the ID of the pipeline associated with the materialized view or streaming table. You can find the pipeline ID in the Details tab in Catalog Explorer when viewing the materialized view or streaming table. To limit the consumption by date, specify a start date, end date, or a date range. The following query retrieves the DBU usage for the pipeline with ID 00732f83-cd59-4c76-ac0d-57958532ab5b and a usage start date of 2024-05-30:

SELECT
  sku_name,
  usage_date,
  SUM(usage_quantity) AS `DBUs`
FROM
  system.billing.usage
WHERE
  usage_metadata.dlt_pipeline_id = :dlt_pipeline_id
  AND usage_start_time > :usage_start_time
GROUP BY
  ALL

What is the DBU consumption of a serverless DLT pipeline?

To get the DBU usage and SKU for a serverless DLT pipeline, submit a query to the billable usage system table for records where usage_metadata.dlt_pipeline_id is set to the ID of the pipeline. You can find the pipeline ID on the Pipeline Details tab when viewing a pipeline in the Delta Live Tables UI. To limit the consumption by date, specify a start date, end date, or a date range. The following query retrieves the DBU usage from December 2024 for the pipeline with ID 00732f83-cd59-4c76-ac0d-57958532ab5b.

SELECT
  sku_name,
  usage_date,
  SUM(usage_quantity) AS `DBUs`
FROM
  system.billing.usage
WHERE
  usage_metadata.dlt_pipeline_id = :dlt_pipeline_id
  AND usage_start_time >= :usage_start_time
  AND usage_end_time < :usage_end_time
GROUP BY
  ALL

What is the daily trend in DBU consumption?

SELECT
  usage_date as `Date`, sum(usage_quantity) as `DBUs Consumed`
FROM
  system.billing.usage
WHERE
  sku_name = "STANDARD_ALL_PURPOSE_COMPUTE"
GROUP BY
  usage_date
ORDER BY
  usage_date ASC

Examples of usage table joins

The following examples show you ways you can join the usage table with other system tables to gain additional insights.

Attribute costs to the compute owner

If you are looking to reduce compute costs, you can use this query to find out which cluster owners in your account are using the most DBUs.

SELECT
  u.record_id record_id,
  c.cluster_id cluster_id,
  max_by(c.owned_by, c.change_time) owned_by,
  max(c.change_time) change_time,
  any_value(u.usage_start_time) usage_start_time,
  any_value(u.usage_quantity) usage_quantity
FROM
  system.billing.usage u
  JOIN system.compute.clusters c
WHERE
  u.usage_metadata.cluster_id is not null
  and u.usage_start_time >= '2023-01-01'
  and u.usage_metadata.cluster_id = c.cluster_id
  and c.change_time <= u.usage_start_time
GROUP BY 1, 2
ORDER BY cluster_id, usage_start_time desc;

Enrich usage with a job name

with jobs as (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY workspace_id, job_id ORDER BY change_time DESC) as rn
  FROM system.lakeflow.jobs QUALIFY rn=1
)
SELECT
  usage.*,
  coalesce(usage_metadata.job_name, jobs.name) as job_name
FROM system.billing.usage
  LEFT JOIN jobs ON usage.workspace_id=jobs.workspace_id AND usage.usage_metadata.job_id=jobs.job_id
WHERE
  billing_origin_product="JOBS"

Join the pricing with usage tables

The list_prices table includes list prices over time for each available SKU. You can join the usage table to view the listing cost of certain usage.

For example, the following query returns the total cost attributed to a particular tag during a month:

SELECT
 SUM(usage.usage_quantity * list_prices.pricing.effective_list.default)
 as `Total Dollar Cost`
FROM system.billing.usage
JOIN system.billing.list_prices ON list_prices.sku_name = usage.sku_name
WHERE usage.custom_tags [:key] = :value
AND usage.usage_end_time >= list_prices.price_start_time
AND (list_prices.price_end_time IS NULL OR usage.usage_end_time < list_prices.price_end_time)
AND usage.usage_date BETWEEN "2024-05-01" AND "2024-05-31"

Estimate the add-on costs for usage in the previous calendar month

This query applies a simple percentage to all usage in the period. Note that this might differ slightly from the actual monetization due to how entitlements for some add-ons are managed. Replace the add-on rate with your account’s rate.

SELECT SUM(usage.usage_quantity * list_prices.pricing.effective_list.default) * :add_on_rate as `Total Add-On Dollar Cost`
FROM system.billing.usage
JOIN system.billing.list_prices ON list_prices.sku_name = usage.sku_name
  WHERE usage.usage_end_time >= list_prices.price_start_time
  AND (list_prices.price_end_time IS NULL OR usage.usage_end_time < list_prices.price_end_time)
  AND usage.usage_date BETWEEN "2024-02-01" AND "2024-02-29"