Monitor usage using tags
To monitor cost and accurately attribute Databricks usage to your organization’s business units and teams (for chargebacks, for example), you can add custom tags to workspaces and compute resources. Databricks recommends using system tables (Public Preview) to view usage data. See Billable usage system table reference.
Note
Tag data may be replicated globally. As such, do not use tag names or values that could compromise the security of your resources. For example, do not use tag names that contain personal or sensitive information.
The Databricks billable usage graphs in the account console can aggregate usage by individual tags. The billable usage CSV reports downloaded from the same page also include default and custom tags. Tags also propagate to GKE and GCE labels.
Tagged objects and resources
You can add custom tags for the following objects managed by Databricks:
Object | Tagging interface (UI) | Tagging interface (API) |
---|---|---|
Pool | Pools UI in the Databricks workspace | |
All-purpose and job compute | Compute UI in the Databricks workspace | |
SQL warehouse | SQL warehouse UI in the Databricks workspace | |
Warning
Do not assign a custom tag with the key Name to a cluster. Every cluster has a tag Name whose value is set by Databricks. If you change the value associated with the key Name, the cluster can no longer be tracked by Databricks. As a consequence, the cluster might not be terminated after becoming idle and will continue to incur usage costs.
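A pre-flight check can catch the reserved key before a cluster definition is submitted. The following is a minimal sketch, not part of any Databricks API: the helper name `find_reserved_keys` and the sample tags are hypothetical, and the exact, case-sensitive match on `Name` is an assumption.

```python
# Keys that must not be set as custom tags on a cluster.
# Only "Name" is listed, because it is the only key the warning mentions.
RESERVED_TAG_KEYS = {"Name"}


def find_reserved_keys(custom_tags: dict[str, str]) -> list[str]:
    """Return the custom tag keys that collide with a reserved key.

    Assumption: the comparison is exact and case-sensitive.
    """
    return sorted(key for key in custom_tags if key in RESERVED_TAG_KEYS)


# Hypothetical custom tags for illustration.
tags = {"team": "data-eng", "Name": "my-cluster"}
print(find_reserved_keys(tags))  # ['Name']
```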
Default tags
Databricks adds the following default tags to all-purpose compute:
Tag key | Value |
---|---|
 | Constant value: |
 | Databricks internal ID of the cluster |
 | Name of the cluster |
 | Username (email address) of the user who created the cluster |
On job clusters, Databricks also applies the following default tags:
Tag key | Value |
---|---|
 | Job name |
 | Job ID |
Databricks adds the following default tags to all pools:
Tag key | Value |
---|---|
 | Constant value: |
 | Databricks internal ID of the user who created the pool |
 | Databricks internal ID of the pool |
Tag propagation
Cluster and pool tags propagate in three different ways that you can use for aggregating costs:
Tags in DBU reports: Tags propagate to the billable usage system table and to the downloadable DBU usage reports.
GKE (Kubernetes) labels for each pod: Tags propagate to labels on GKE pods. This allows you to use GKE usage metering to attribute costs for all Databricks compute resources.
GCE labels for each VM and its persistent disks: Tags propagate to labels on GCE resources such as VMs and their persistent disks. This allows you to use GCE usage metering to attribute costs, which is more accurate than GKE labels for aggregating Google Cloud costs across all Databricks compute resources. The tags’ keys and values are transformed to conform with GCE label format limits.
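As an illustration of the first propagation path, a downloaded usage report can be grouped by the value of one tag. This is a rough sketch under stated assumptions: the column names `dbus` and `tags`, the JSON encoding of the `tags` column, and the sample rows are all hypothetical, so check them against the header of your actual report.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample shaped like a usage report. The column names
# "dbus" and "tags" are assumptions; adjust them to the real header.
SAMPLE_CSV = '''clusterName,dbus,tags
etl-nightly,12.5,"{""team"": ""data-eng""}"
adhoc,3.0,"{""team"": ""analytics""}"
etl-hourly,4.5,"{""team"": ""data-eng""}"
untagged,1.0,{}
'''


def dbus_by_tag(csv_text: str, tag_key: str) -> dict[str, float]:
    """Sum the DBU column per value of one tag key."""
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        tags = json.loads(row["tags"] or "{}")
        # Rows without the tag are grouped under a placeholder bucket.
        totals[tags.get(tag_key, "<untagged>")] += float(row["dbus"])
    return dict(totals)


print(dbus_by_tag(SAMPLE_CSV, "team"))
```

Keeping untagged rows in their own bucket makes gaps in tag coverage visible instead of silently dropping that usage from a chargeback.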
How tags propagate for clusters created from pools
Tags propagate to node instances differently depending on whether or not a cluster was created from a pool.
If a cluster is not created from a pool, its tags propagate as expected to node instances.
If a cluster is created from a pool, its instances inherit both the pool tags and the cluster tags. Idle VMs in the pool carry only the pool’s tags in VM usage data.
If there is a tag name conflict, Databricks default tags take precedence over custom tags, and pool tags take precedence over cluster tags.
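The precedence rule above can be modeled as three dictionary merges applied from lowest to highest priority. This is only an illustrative sketch: `resolve_tags` is a hypothetical helper, and every tag key and value in the example is made up.

```python
def resolve_tags(
    default_tags: dict[str, str],
    pool_tags: dict[str, str],
    cluster_tags: dict[str, str],
) -> dict[str, str]:
    """Merge tags so that default tags beat pool tags, which beat cluster tags."""
    resolved = dict(cluster_tags)   # lowest precedence
    resolved.update(pool_tags)      # pool tags override cluster tags
    resolved.update(default_tags)   # default tags override everything
    return resolved


# Hypothetical tags for illustration only.
defaults = {"managed-by": "databricks"}
pool = {"team": "platform", "env": "prod"}
cluster = {"team": "analytics", "managed-by": "me", "project": "churn"}

resolved = resolve_tags(defaults, pool, cluster)
print(resolved["team"])        # platform   (pool wins over cluster)
print(resolved["managed-by"])  # databricks (default wins over custom)
print(resolved["project"])     # churn      (no conflict, kept as-is)
```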
Limitations
Tag keys and values can only contain letters, spaces, numbers, or the characters +, -, =, ., _, :, /, @. Tags containing other characters are invalid.
If you change tag key names or values, these changes apply only after cluster restart or pool expansion.
The maximum number of custom tags that can propagate to GCE labels is 54.
The maximum length for GCE label keys and values is 63 characters.
Label propagation can be delayed due to GCE API rate limits for the project. You can resolve this by increasing the GCE API rate limits for the Google Cloud project.
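The character and size limits above can be checked before tags are applied. The sketch below is a hypothetical helper, not a Databricks API. It assumes that "letters" means ASCII letters. It also applies the 63-character GCE limit to the raw tag text, which is conservative because the transformation described for GCE labels only removes characters.

```python
import re

# Assumption: "letters" means ASCII letters; widen the class if your
# workspace accepts other alphabets.
ALLOWED = re.compile(r"[A-Za-z0-9 +\-=._:/@]*")
MAX_GCE_CUSTOM_TAGS = 54
MAX_GCE_LABEL_LENGTH = 63


def check_tags(tags: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means no issue found."""
    problems = []
    for key, value in tags.items():
        for kind, text in (("key", key), ("value", value)):
            if not ALLOWED.fullmatch(text):
                problems.append(f"{kind} {text!r} contains a disallowed character")
            if len(text) > MAX_GCE_LABEL_LENGTH:
                problems.append(
                    f"{kind} {text!r} exceeds {MAX_GCE_LABEL_LENGTH} characters"
                )
    if len(tags) > MAX_GCE_CUSTOM_TAGS:
        problems.append(
            f"{len(tags)} custom tags given; only {MAX_GCE_CUSTOM_TAGS} "
            "can propagate to GCE labels"
        )
    return problems


# Hypothetical tags: '#' is not in the allowed character list.
print(check_tags({"team": "data-eng", "cost#center": "42"}))
```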
Tag enforcement with policies
You can enforce tags on clusters using compute policies. For more information, see Custom tag enforcement.
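As a rough sketch of what such a policy might look like, the snippet below builds a policy definition as JSON. It assumes the compute policy format addresses tags with `custom_tags.<key>` attribute paths and supports the `fixed` and `allowlist` rule types, so verify both against the compute policy reference before relying on it. The helper name and the tag keys and values are hypothetical.

```python
import json


def tag_policy(fixed: dict[str, str], allowed: dict[str, list[str]]) -> str:
    """Build a policy definition that pins some tags and restricts others.

    Assumption: tags are addressed as "custom_tags.<key>" and the rule
    types "fixed" and "allowlist" are available in the policy format.
    """
    definition: dict[str, dict] = {}
    for key, value in fixed.items():
        definition[f"custom_tags.{key}"] = {"type": "fixed", "value": value}
    for key, values in allowed.items():
        definition[f"custom_tags.{key}"] = {"type": "allowlist", "values": values}
    return json.dumps(definition, indent=2)


# Hypothetical tag keys and values for illustration.
print(tag_policy(
    fixed={"cost-center": "1234"},
    allowed={"team": ["data-eng", "analytics"]},
))
```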
GCE label limits
GKE labels can directly use the Databricks tag keys and values.
For GCE labels, there are limitations:
Keys and values must consist only of lower case letters, numeric characters, underscores, and dashes.
Maximum length for GCE label keys and values is 63 characters.
Maximum number of tags that can propagate to GCE labels is 54.
To conform to GCE format rules, tags are transformed before becoming GCE label keys and values. If there are duplicates after transformation, the key-value pair that appears later (lower) in the tag definitions is the one that persists.
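The transformation and the duplicate rule can be approximated as follows. This is a sketch of the behavior described in this section, not Databricks' actual implementation: it lowercases the text, removes every character outside ASCII lowercase letters, digits, underscores, and dashes, and lets later tags overwrite earlier ones. It does not handle the 63-character and 54-label limits, because the source does not state how those are applied. The function names and sample tags are hypothetical.

```python
import re


def to_gce_label(text: str) -> str:
    """Lowercase the text and remove characters GCE labels do not allow."""
    return re.sub(r"[^a-z0-9_-]", "", text.lower())


def tags_to_gce_labels(tags: dict[str, str]) -> dict[str, str]:
    """Transform tags in definition order; later duplicates overwrite earlier ones."""
    labels: dict[str, str] = {}
    for key, value in tags.items():
        labels[to_gce_label(key)] = to_gce_label(value)
    return labels


# Hypothetical tags: both keys collapse to "costcenter", so the later one wins.
tags = {"Cost Center": "R&D-42", "CostCenter": "Ops", "team": "Data Eng"}
print(tags_to_gce_labels(tags))  # {'costcenter': 'ops', 'team': 'dataeng'}
```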
The following table compares GKE and GCE default cluster tags.
GKE label key | GKE label value | GCE label key | GCE label value |
---|---|---|---|
 | | | |
 | Databricks ID | | Databricks ID |
 | Customer-defined name | | Customer-defined name in lower case. Characters are removed if they are not letters, numbers, underscores or dashes. For example, |
 | Creator user’s email address with | | Creator user’s email address with |
The following table compares GKE and GCE default instance pool tags:
GKE label key | GKE label value | GCE label key | GCE label value |
---|---|---|---|
 | Databricks ID | | Databricks ID |
 | Databricks ID | | Databricks ID |
The following table compares GKE and GCE for all other tags (custom tags):
GKE label key | GKE label value | GCE label key | GCE label value |
---|---|---|---|
Customer-defined key | Customer-defined value | Customer-defined key in lower case. Characters are removed if they are not letters, numbers, underscores or dashes. For example, | Customer-defined value in lower case. Characters are removed if they are not letters, numbers, underscores or dashes. For example, |