Best practices for cost optimization

This article covers best practices supporting principles of cost optimization, organized by principle.

1. Choose optimal resources

Use performance optimized data formats

To get the most out of the Databricks Data Intelligence Platform, you must use Delta Lake as your storage framework. It helps build simpler and more reliable ETL pipelines, and comes with many performance enhancements that can significantly speed up workloads compared to using Parquet, ORC, and JSON. See Optimization recommendations on Databricks. If the workload is also running on a job compute, this directly translates into shorter uptime of compute resources leading to lower costs.

Use job compute

A job is a way to run non-interactive code on a Databricks compute instance. For example, you can run an extract, transform, and load (ETL) workload interactively or on a schedule. Of course, you can also run jobs interactively in the notebook UI. However, on job compute, the non-interactive workloads will cost significantly less than on all-purpose compute. See the pricing overview to compare Jobs Compute and All-Purpose Compute.

An additional benefit for some jobs is that each job or workflow can run on a new compute instance, isolating workloads from each other. However, multitask workflows can also reuse compute resources for all tasks, so the compute startup time occurs only once per workflow. See Configure compute for jobs.

Use SQL warehouse for SQL workloads

For interactive SQL workloads, a Databricks SQL warehouse is the most cost-efficient engine. See the pricing overview. All SQL warehouses come with Photon by default, which accelerates your existing SQL and DataFrame API calls and reduces your overall cost per workload.

In addition, serverless SQL warehouses support intelligent workload management (IWM), a set of features that enhances Databricks SQL serverless ability to process large numbers of queries quickly and cost-effectively.

Use up-to-date runtimes for your workloads

The Databricks platform provides different runtimes that are optimized for data engineering tasks (Databricks Runtime) or machine learning tasks (Databricks Runtime for Machine Learning). The runtimes are built to provide the best selection of libraries for the tasks, and to ensure that all libraries provided are up-to-date and work together optimally. The Databricks Runtimes are released on a regular cadence, providing performance improvements between major releases. These performance improvements often result in cost savings due to more efficient use of compute resources.

Only use GPUs for the right workloads

Virtual machines with GPUs can dramatically speed up computations for deep learning, but are significantly more expensive than CPU-only machines. Use GPU instances only for workloads that have GPU-accelerated libraries.

Most workloads do not use GPU-accelerated libraries, so they do not benefit from GPU-enabled instances. Workspace administrators can restrict GPU machines and compute resources to prevent unnecessary usage. See the blog post “Are GPUs Really Expensive? Benchmarking GPUs for Inference on Databricks Clusters”.

Use serverless services for your workloads

BI workloads typically consume data in bursts and generate multiple concurrent queries. For example, someone using a BI tool might update a dashboard or write a query and then simply analyze the results without further interaction with the platform. In this scenario the data platform:

  • Terminates idle compute resources to save costs.

  • Quickly provides the compute resources when the user requests new or updated data with the BI tool.

Non-serverless Databricks SQL warehouses have a startup time of minutes, so many users tend to accept the higher cost and do not terminate them during idle periods. On the other hand, serverless SQL warehouses start and scale up in seconds, so both instant availability and idle termination can be achieved. This results in a great user experience and overall cost savings.

Additionally, serverless SQL warehouses scale down earlier than non-serverless warehouses, again, resulting in lower costs.

ML and AI model serving

Most models are served as a REST API for integration into your web or client application; the model serving service receives varying loads of requests over time, and a model serving platform should always provide sufficient resources, but only as many as are actually needed (upscaling and downscaling).

Mosaic AI Model Serving uses serverless compute and provides a highly available and low latency service for deploying models. The service automatically scales up or down to meet changes in demand, reducing infrastructure costs while optimizing latency performance.

Use the right instance type

Using the latest generation of cloud instance types almost always provides performance benefits, as they offer the best performance and the latest features.

Based on your workloads, it is also important to choose the right instance family to get the best performance/price ratio. Some simple rules of thumb are:

  • Memory optimized for ML, heavy shuffle and spill workloads

  • Compute optimized for structured streaming workloads and maintenance jobs (such as optimize and vacuum)

  • Storage optimized for workloads that benefit from caching, such as ad-hoc and interactive data analysis

  • GPU optimized for specific ML and DL workloads

  • General purpose in the absence of specific requirements

Choose the most efficient compute size

Databricks runs one executor per worker node. Therefore, the terms executor and worker are used interchangeably in the context of the Databricks architecture. People often think of cluster size in terms of the number of workers, but there are other important factors to consider:

  • Total executor cores (compute): The total number of cores across all executors. This determines the maximum parallelism of a compute instance.

  • Total executor memory: The total amount of RAM across all executors. This determines how much data can be stored in memory before spilling it to disk.

  • Executor local storage: The type and amount of local disk storage. Local disk is primarily used in the case of spills during shuffles and caching.

Additional considerations include worker instance type and size, which also influence the preceding factors. When sizing your compute, consider the following:

  • How much data will your workload consume?

  • What’s the computational complexity of your workload?

  • Where are you reading data from?

  • How is the data partitioned in external storage?

  • How much parallelism do you need?

Details and examples can be found under Compute sizing considerations.

Evaluate performance-optimized query engines

Photon is a high-performance Databricks-native vectorized query engine that speeds up your SQL workloads and DataFrame API calls (for data ingestion, ETL, streaming, data science, and interactive queries). Photon is compatible with Apache Spark APIs, so getting started is as easy as turning it on – no code changes and no lock-in.

The observed speedup can lead to significant cost savings, and jobs that run regularly should be evaluated to see whether they are not only faster but also cheaper with Photon.

2. Dynamically allocate resources

Use auto-scaling compute

With autoscaling, Databricks dynamically reallocates workers to account for the characteristics of your job. Certain parts of your pipeline may be more computationally intensive than others, and Databricks automatically adds additional workers during those phases of your job (and removes them when they’re no longer needed). Autoscaling can reduce overall costs compared to a statically sized compute instance.

Compute auto-scaling has limitations when scaling down cluster size for structured streaming workloads. Databricks recommends using Delta Live Tables with enhanced autoscaling for streaming workloads.

Use auto termination

Databricks provides several features to help control costs by reducing idle resources and controlling when compute resources can be deployed.

  • Configure auto termination for all interactive compute resources. After a specified idle time, the compute resource shuts down. See Automatic termination.

  • For use cases where compute is needed only during business hours, compute resources can be configured with auto termination, and a scheduled process can restart compute (and possibly prewarm data if necessary) in the morning before users are back at their desktops. See CACHE SELECT.

  • If compute startup times are too long, consider using cluster pools, see Pool best practices. Databricks pools are a set of idle, ready-to-use instances. When cluster nodes are created using the idle instances, cluster start and auto-scaling times are reduced. If the pools have no idle instances, the pools expand by allocating a new instance from the instance provider in order to accommodate the cluster’s request.

Databricks does not charge Databricks Units (DBUs) while instances are idle in the pool, resulting in cost savings. Instance provider billing does apply.

Use compute policies to control costs

Compute policies can enforce many cost-specific restrictions for compute resources. See Operational Excellence - Use compute policies. For example:

3. Monitor and control cost

Monitor costs

The Account console provides a view on the billable usage. See Monitor usage in the account console. You can also use the Account API to download the logs.

Tag clusters for cost attribution

To monitor costs in general and to accurately attribute Databricks usage to your organization’s business units and teams for chargeback purposes, you can tag clusters, SQL warehouses, and pools. These tags propagate to detailed Databricks Units (DBU) and cloud provider VM and blob storage usage for cost analysis.

Ensure that cost control and attribution are considered when setting up workspaces and clusters for teams and use cases. This streamlines tagging and improves the accuracy of cost attribution.

Total costs include the DBU virtual machine, disk, and any associated network costs. For serverless SQL warehouses, the DBU cost already includes the virtual machine and disk costs.

Implement observability to track and chargeback cost

When working with complex technical ecosystems, proactively understanding the unknowns is key to maintaining platform stability and controlling costs. Observability provides a way to analyze and optimize systems based on the data they generate. This is different from monitoring, which focuses on identifying new patterns rather than tracking known issues.

Databricks provides great observability capabilities using System tables that are Databricks-hosted analytical stores of a customer account’s operational data found in the system catalog. They provide historical observability across the account and include user-friendly tabular information on platform telemetry.

See Blog: Intelligently Balance Cost Optimization & Reliability on Databricks

Share cost reports regularly

Generate monthly cost reports to track consumption growth and anomalies. Share these reports by use case or team with the teams that own the workloads using cluster tagging. This eliminates surprises and allows teams to proactively adjust their workloads if costs become too high.

Monitor and manage Delta Sharing egress costs

Unlike other data sharing platforms, Delta Sharing does not require data replication. This model has many advantages, but it means that your cloud vendor may charge data egress fees when you share data across clouds or regions. See Monitor and manage Delta Sharing egress costs (for providers) to monitor and manage egress charges.

4. Design cost-effective workloads

Balance always-on and triggered streaming

Traditionally, when people think about streaming, terms such as “real-time”, “24/7,” or “always on” come to mind. If data ingestion happens in real-time, the underlying compute resources must run 24/7, incurring costs every single hour of the day.

However, not every use case that relies on a continuous stream of events requires those events to be immediately added to the analytics data set. If the business requirement for the use case only requires fresh data every few hours or every day, then that requirement can be met with only a few runs per day, resulting in a significant reduction in workload cost. Databricks recommends using Structured Streaming with the AvailableNow trigger for incremental workloads that do not have low latency requirements. See Configuring incremental batch processing.

Balance between on-demand and capacity excess instances

Preemptible instances take advantage of excess virtual machine resources in the cloud that are available at a lower price. To save costs, Databricks supports creating clusters using spot instances. It is recommended that the first instance (the Spark driver) should always be an on-demand virtual machine. Preemptible instances are a good choice for workloads where it is acceptable to take longer because one or more spot instances have been evicted by the cloud provider.