SQL warehouse sizing, scaling, and queuing behavior

This article explains the cluster sizing, queuing, and autoscaling behavior of SQL warehouses.

Sizing a serverless SQL warehouse

Always start with a larger t-shirt size for your serverless SQL warehouse than you think you will need and size down as you test. Don’t start with a small t-shirt size for your serverless SQL warehouse and go up. In general, start with a single serverless SQL warehouse and rely on Databricks to right-size with serverless clusters, prioritizing workloads, and fast data reads. See Serverless autoscaling and query queuing.

  • To decrease query latency for a given serverless SQL warehouse:

    • If queries are spilling to disk, increase the t-shirt size.

    • If the queries are highly parallelizable, increase the t-shirt size.

    • If you are running multiple queries at a time, add more clusters for autoscaling.

  • To reduce costs, try to step down in t-shirt size without spilling to disk or significantly increasing latency.

  • To help right-size your serverless SQL warehouse, use the following tools:

    • Monitoring page: look at the peak query count. If the peak number of queued queries is commonly above one, add clusters. The maximum number of queries in a queue for all SQL warehouse types is 1000. See Monitor a SQL warehouse.

    • Query history. See Query history.

    • Query profiles (look for Bytes spilled to disk above 1). See Query profile.
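The guidance above can be condensed into a small decision helper. This is a minimal sketch, not a Databricks API: the `Observation` class, its field names, and the suggestion strings are hypothetical, and the inputs are values you would read manually from the monitoring page and query profiles.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical summary of what you observed for one warehouse."""
    bytes_spilled_to_disk: int   # from query profiles
    peak_queued_queries: int     # from the monitoring page
    highly_parallelizable: bool  # your own judgment of the workload


def sizing_suggestions(obs: Observation) -> list[str]:
    """Map observations to the sizing actions described in this section."""
    suggestions = []
    # Spilling to disk or highly parallelizable queries: use a larger size.
    if obs.bytes_spilled_to_disk > 1 or obs.highly_parallelizable:
        suggestions.append("increase t-shirt size")
    # Queries commonly waiting in the queue: allow more clusters.
    if obs.peak_queued_queries > 1:
        suggestions.append("add clusters")
    # Nothing to fix: test whether a smaller size still avoids spilling.
    if not suggestions:
        suggestions.append("try a smaller t-shirt size")
    return suggestions


print(sizing_suggestions(Observation(10, 3, False)))
```

Treat the output as a prompt for a test, not a decision: a step down in size is only worthwhile if it does not cause spilling or a significant latency increase.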

Note

For serverless SQL warehouses, the cluster sizes may in some cases use different instance types than the ones listed in the documentation for pro and classic SQL warehouses for an equivalent cluster size. In general, the price/performance ratio of the cluster sizes for serverless SQL warehouses is similar to that of pro and classic SQL warehouses.

Serverless autoscaling and query queuing

Intelligent Workload Management (IWM) is a set of features that enhances the ability of serverless SQL warehouses to process large numbers of queries quickly and cost-effectively. Using AI-powered prediction capabilities to analyze incoming queries and determine the fastest and most efficient way to process them (Predictive IO), IWM works to ensure that workloads quickly get the right amount of resources. The key difference lies in the AI capabilities in Databricks SQL, which respond dynamically to workload demands rather than using static thresholds.

This responsiveness ensures:

  • Rapid upscaling to acquire more compute when needed for maintaining low latency.

  • Query admission closer to the hardware’s limits.

  • Quick downscaling to minimize costs when demand is low, providing consistent performance with optimized costs and resources.

When a query arrives at the warehouse, IWM predicts the cost of the query. At the same time, IWM monitors the available compute capacity of the warehouse in real time. Next, using machine learning models, IWM predicts whether the existing compute has enough capacity to run the incoming query. If it does not, the query is added to the queue. If it does, the query begins executing immediately.

IWM monitors the queue approximately every 10 seconds. If the queue is not decreasing quickly enough, autoscaling kicks in to rapidly procure more compute. Once new capacity is added, queued queries are admitted to the new clusters. With serverless SQL warehouses, new clusters can be added rapidly, and more than one cluster at a time can be created. The maximum number of queries in a queue for all SQL warehouse types is 1000.
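The admit-or-queue flow described above can be illustrated with a toy model. This is a sketch under stated assumptions, not how IWM is implemented: the machine-learning predictions are replaced by a plain numeric cost supplied by the caller, the class and method names are hypothetical, and the "rejected" result for a full queue is an assumption, because this article does not say what happens beyond the limit of 1000.

```python
from collections import deque

MAX_QUEUE_LENGTH = 1000  # queue limit stated in this article


class WarehouseSketch:
    """Toy model: capacity and query cost share one abstract unit."""

    def __init__(self, capacity_per_cluster: float, clusters: int = 1):
        self.capacity_per_cluster = capacity_per_cluster
        self.clusters = clusters
        self.in_use = 0.0
        self.queue = deque()

    @property
    def free_capacity(self) -> float:
        return self.clusters * self.capacity_per_cluster - self.in_use

    def submit(self, predicted_cost: float) -> str:
        # Enough free compute: the query starts immediately.
        if predicted_cost <= self.free_capacity:
            self.in_use += predicted_cost
            return "running"
        # Assumption: a full queue rejects the query.
        if len(self.queue) >= MAX_QUEUE_LENGTH:
            return "rejected"
        self.queue.append(predicted_cost)
        return "queued"

    def add_clusters(self, count: int) -> int:
        """Add capacity, then admit queued queries in arrival order."""
        self.clusters += count
        admitted = 0
        while self.queue and self.queue[0] <= self.free_capacity:
            self.in_use += self.queue.popleft()
            admitted += 1
        return admitted


warehouse = WarehouseSketch(capacity_per_cluster=10.0)
print(warehouse.submit(6.0))      # fits in the free capacity
print(warehouse.submit(6.0))      # does not fit, so it waits
print(warehouse.add_clusters(1))  # number of queued queries admitted
```

In the real service the decision to add clusters is made by the periodic queue check described above; here `add_clusters` is called by hand to keep the example small.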

Cluster sizes for pro and classic SQL warehouses

The table in this section maps SQL warehouse cluster sizes to Databricks cluster driver size and worker counts. The driver size only applies to pro and classic SQL warehouses.

| Cluster size | Instance type for driver | Worker count | Total vCPU | Total Persistent Disk SSD (TB) | Total Local SSD (TB) |
|---|---|---|---|---|---|
| 2X-Small | n2-highmem-8 | 1 x n2-highmem-8 | 16 | 1 | 3 |
| X-Small | n2-highmem-8 | 2 x n2-highmem-8 | 24 | 1.5 | 4.5 |
| Small | n2-highmem-16 | 4 x n2-highmem-8 | 48 | 2.5 | 7.5 |
| Medium | n2-highmem-32 | 8 x n2-highmem-8 | 96 | 4.5 | 15 |
| Large | n2-highmem-32 | 16 x n2-highmem-8 | 160 | 8.5 | 27 |
| X-Large | n2-highmem-64 | 32 x n2-highmem-8 | 320 | 16.5 | 54 |
| 2X-Large | n2-highmem-64 | 64 x n2-highmem-8 | 576 | 32.5 | 102 |
| 3X-Large | n2-highmem-64 | 128 x n2-highmem-8 | 1088 | 64.5 | 198 |
| 4X-Large | n2-highmem-64 | 256 x n2-highmem-8 | 2112 | 128.5 | 390 |

The instance size of all workers is n2-highmem-8.

Note

The information in this table can vary based on product or region availability and workspace type.
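The Total vCPU column follows from the driver and worker instance types, because the numeric suffix of an n2-highmem machine type is its vCPU count. The sketch below recomputes the column from the other two; the dictionary and function names are illustrative only, and the values are copied from the table in this section.

```python
WORKER_TYPE = "n2-highmem-8"  # all workers use this instance type

# size: (driver instance type, worker count, Total vCPU as listed)
CLUSTER_SIZES = {
    "2X-Small": ("n2-highmem-8", 1, 16),
    "X-Small": ("n2-highmem-8", 2, 24),
    "Small": ("n2-highmem-16", 4, 48),
    "Medium": ("n2-highmem-32", 8, 96),
    "Large": ("n2-highmem-32", 16, 160),
    "X-Large": ("n2-highmem-64", 32, 320),
    "2X-Large": ("n2-highmem-64", 64, 576),
    "3X-Large": ("n2-highmem-64", 128, 1088),
    "4X-Large": ("n2-highmem-64", 256, 2112),
}


def vcpus(instance_type: str) -> int:
    """Read the vCPU count from the suffix, e.g. 'n2-highmem-16' gives 16."""
    return int(instance_type.rsplit("-", 1)[1])


def total_vcpus(driver_type: str, worker_count: int) -> int:
    """Driver vCPUs plus the vCPUs of all workers."""
    return vcpus(driver_type) + worker_count * vcpus(WORKER_TYPE)


for size, (driver, workers, listed) in CLUSTER_SIZES.items():
    computed = total_vcpus(driver, workers)
    print(f"{size}: computed {computed}, listed {listed}")
```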

Compute Engine API quota requirements

The relevant Compute Engine API quota fields are:

  • N2 CPUs

  • Persistent Disk SSD (GB)

  • Local SSD (GB)

For more information about quota requirements, see Compute Engine API.

Warning

SQL warehouses won’t start if you do not provision the required amount of CPU and storage resources. See Compute Engine API. If needed, you can increase the resource quotas to support your use of SQL warehouses. See Review and increase quotas. For information about workspace cost, see cost per workspace.
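Before starting a warehouse, it can help to estimate the quota it needs and compare that with what your project has. The sketch below is a rough estimate under two assumptions that this article does not confirm: that requirements scale linearly with the number of clusters, and that 1 TB is counted as 1000 GB. The function names and the `available` values are hypothetical, and only three cluster sizes from the table are included for brevity.

```python
# Per-cluster totals copied from the cluster size table:
# (Total vCPU, Total Persistent Disk SSD in TB, Total Local SSD in TB)
PER_CLUSTER = {
    "2X-Small": (16, 1.0, 3.0),
    "Small": (48, 2.5, 7.5),
    "Medium": (96, 4.5, 15.0),
}


def required_quota(size: str, max_clusters: int, gb_per_tb: int = 1000) -> dict:
    """Estimate quota for a warehouse scaled out to max_clusters.

    Assumption: each cluster needs the per-cluster totals from the table.
    """
    vcpu, pd_tb, local_tb = PER_CLUSTER[size]
    return {
        "N2 CPUs": vcpu * max_clusters,
        "Persistent Disk SSD (GB)": pd_tb * max_clusters * gb_per_tb,
        "Local SSD (GB)": local_tb * max_clusters * gb_per_tb,
    }


def quota_shortfall(required: dict, available: dict) -> dict:
    """Return the quota fields that fall short, and by how much."""
    return {
        field: need - available.get(field, 0)
        for field, need in required.items()
        if need > available.get(field, 0)
    }


needed = required_quota("Small", max_clusters=2)
# Hypothetical quota values for illustration only.
available = {"N2 CPUs": 64, "Persistent Disk SSD (GB)": 8000, "Local SSD (GB)": 15000}
print(quota_shortfall(needed, available))
```

An empty result means the estimate fits within the quota values you supplied; any field that appears in the result is a candidate for a quota increase request.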

Queuing and autoscaling for pro and classic SQL warehouses

Databricks limits the number of queries on a cluster assigned to a SQL warehouse based on the cost to compute their results. Upscaling of clusters per warehouse is based on query throughput, the rate of incoming queries, and the queue size. Databricks recommends a cluster for every 10 concurrent queries. The maximum number of queries in a queue for all SQL warehouse types is 1000.

Databricks adds clusters based on the time it would take to process all currently running queries, all queued queries, and the incoming queries expected in the next two minutes.

  • If less than 2 minutes, don’t upscale.

  • If 2 to 6 minutes, add 1 cluster.

  • If 6 to 12 minutes, add 2 clusters.

  • If 12 to 22 minutes, add 3 clusters.

Otherwise, Databricks adds 3 clusters plus 1 cluster for every additional 15 minutes of expected query load.

In addition, a warehouse is always upscaled if a query waits for 5 minutes in the queue.
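The upscaling thresholds above can be written as a single function. This is a sketch of one reading of the rules, not Databricks code. The article does not say how boundary values are treated, so the ranges here are assumed to be half-open (exactly 6 minutes adds 2 clusters), "every additional 15 minutes" is assumed to count full 15-minute blocks beyond 22 minutes, and "always upscaled" is assumed to mean at least 1 cluster. The function and parameter names are hypothetical.

```python
def clusters_to_add(expected_minutes: float, longest_wait_minutes: float = 0.0) -> int:
    """Clusters to add for the expected time to process the query load.

    expected_minutes: time to process running, queued, and expected queries.
    longest_wait_minutes: how long the oldest queued query has waited.
    """
    if expected_minutes < 2:
        to_add = 0
    elif expected_minutes < 6:
        to_add = 1
    elif expected_minutes < 12:
        to_add = 2
    elif expected_minutes < 22:
        to_add = 3
    else:
        # 3 clusters plus 1 per full 15 minutes beyond 22 (assumed reading).
        to_add = 3 + int((expected_minutes - 22) // 15)
    # A query waiting 5 minutes always triggers upscaling (assumed: at least 1).
    if longest_wait_minutes >= 5:
        to_add = max(to_add, 1)
    return to_add


for minutes in (1, 4, 6, 20, 37):
    print(minutes, clusters_to_add(minutes))
```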

If the load is low for 15 minutes, Databricks downscales the SQL warehouse. It keeps enough clusters to handle the peak load over the last 15 minutes. For example, if the peak load was 25 concurrent queries, Databricks keeps 3 clusters.
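The downscaling example is consistent with the recommendation of one cluster per 10 concurrent queries, rounded up. A minimal sketch of that arithmetic follows; the function name and the `min_clusters` floor are assumptions, not documented parameters.

```python
import math

QUERIES_PER_CLUSTER = 10  # recommendation stated earlier in this section


def clusters_to_keep(peak_concurrent_queries: int, min_clusters: int = 1) -> int:
    """Clusters needed for the peak load of the last 15 minutes."""
    needed = math.ceil(peak_concurrent_queries / QUERIES_PER_CLUSTER)
    # Assumption: never go below a configured minimum cluster count.
    return max(min_clusters, needed)


print(clusters_to_keep(25))  # the example from the text: 3 clusters
```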

Query queuing for pro and classic SQL warehouses

Databricks queues queries when all clusters assigned to the warehouse are executing queries at full capacity or when the warehouse is in the STARTING state. The maximum number of queries in a queue for all SQL warehouse types is 1000.

Metadata queries (for example, DESCRIBE <table>) and state-modifying queries (for example, SET) are never queued, unless the warehouse is in the STARTING state.
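The queuing rules in this section reduce to a short predicate. The sketch below only restates the text; the function and parameter names are hypothetical, and how the service classifies a query as a metadata or state-modifying query is outside its scope.

```python
def is_queued(
    warehouse_state: str,
    all_clusters_at_capacity: bool,
    is_metadata_or_state_query: bool,
) -> bool:
    """Return True if a query would wait in the queue under the rules above."""
    # While the warehouse is starting, every query waits.
    if warehouse_state == "STARTING":
        return True
    # Metadata and state-modifying queries are otherwise never queued.
    if is_metadata_or_state_query:
        return False
    # Other queries wait only when every cluster is at full capacity.
    return all_clusters_at_capacity


print(is_queued("RUNNING", True, False))   # regular query, clusters full
print(is_queued("RUNNING", True, True))    # DESCRIBE or SET, clusters full
print(is_queued("STARTING", False, True))  # any query while starting
```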