Configure SQL warehouses

This article explains how to configure and manage SQL warehouses (formerly SQL endpoints) using the Databricks SQL UI.

What is a SQL warehouse?

A SQL warehouse is a compute resource that lets you run SQL commands on data objects within Databricks SQL. Compute resources are infrastructure resources that provide processing capabilities in the cloud.

SQL endpoints name changed to SQL warehouses

Databricks changed the name from SQL endpoint to SQL warehouse because it is more than just an API entry point for running SQL commands. A SQL warehouse is a compute resource for all your data warehousing needs, an integral part of the Lakehouse Platform.

Other compute resource types include Databricks clusters. To work with SQL warehouses using the API, see SQL Warehouses APIs 2.0.

Requirements

  • To create and manage a SQL warehouse, you must be a workspace admin and have the Databricks SQL entitlement.

  • To manage a SQL warehouse without being a workspace admin, you must have the Can Manage permission in Databricks SQL.

View SQL warehouses

To navigate to the SQL warehouse dashboard, click SQL Warehouses in the sidebar.

By default, warehouses are sorted by state (running warehouses first), then in alphabetical order.

To help you get started quickly, Databricks creates a SQL warehouse called Starter Warehouse automatically. This SQL warehouse is sized Small. You can edit or delete this SQL warehouse.

Create a SQL warehouse

You can create a SQL warehouse using the New SQL Warehouse page in the web UI or using the SQL Warehouse API.

By accepting all the defaults on the New SQL Warehouse page, you can create an efficient and high-performing SQL warehouse quickly and easily. If your workload or environment requires different capabilities, you can override those defaults.

To create a SQL warehouse using the web UI:

  1. Click SQL Warehouses in the sidebar.

  2. Click Create SQL Warehouse.

  3. Enter a name for the warehouse.

  4. Accept the default warehouse settings or edit them.

    • Cluster Size represents the number of cluster workers and size of compute resources available to run your queries and dashboards. The default is X-Large. To reduce query latency, increase the size. For details, see Cluster size.

    • Auto Stop determines whether the warehouse stops if it’s idle for the specified number of minutes. The default is 45 minutes, which is recommended for typical use. The minimum is 10 minutes. Idle SQL warehouses continue to accumulate DBU and cloud instance charges until they are stopped.

    • Scaling sets the minimum and maximum number of clusters over which queries sent to the warehouse are distributed.

      The default is a minimum of one and maximum of one cluster.

      To handle more concurrent users, increase the cluster count. Databricks recommends one cluster for every ten concurrent queries. To learn how Databricks adds clusters to and removes clusters from a warehouse, see Queueing and autoscaling.

  5. (Optional) Configure advanced options. For details, see Advanced options.

  6. Click Create.

  7. (Optional) Configure warehouse permissions.

Your SQL warehouse is now created and started. You can also create a SQL warehouse with the Databricks Terraform provider and databricks_sql_endpoint.
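You can also script warehouse creation. The sketch below assumes the SQL Warehouses API accepts a POST to /api/2.0/sql/warehouses with fields like these; verify the field names against the API reference, and treat the host, token, and warehouse settings as placeholders:

```python
# Sketch: create a SQL warehouse programmatically.
# The endpoint path and field names are assumptions based on the
# SQL Warehouses API; the host, token, and settings are placeholders.
import json
import urllib.request

def build_warehouse_payload(name, size="X-Large", auto_stop_mins=45,
                            min_clusters=1, max_clusters=1):
    """Assemble a request body mirroring the UI defaults above."""
    return {
        "name": name,
        "cluster_size": size,
        "auto_stop_mins": auto_stop_mins,
        "min_num_clusters": min_clusters,
        "max_num_clusters": max_clusters,
    }

def create_warehouse(host, token, payload):
    """POST the payload to the warehouses endpoint (not run here)."""
    req = urllib.request.Request(
        f"{host}/api/2.0/sql/warehouses",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_warehouse_payload("analytics-wh", size="Medium")
```

Calling `create_warehouse("https://<your-workspace-url>", "<token>", payload)` would then submit the request.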

Advanced options

You can configure the following advanced options by expanding the Advanced options area when you create a new SQL warehouse or edit an existing SQL warehouse. You can also configure these options using the SQL Warehouses APIs 2.0.

  • Tags: Tags allow you to easily monitor the cost of cloud resources used by users and groups in your organization. You specify tags as key-value pairs.

  • Spot instance policy: The spot instance policy determines whether workers use only on-demand instances or a combination of on-demand and spot instances. Cost Optimized (the default) uses mostly spot instances and one on-demand instance. Reliability Optimized uses only on-demand instances.

  • Channel: Use the Preview channel to test upcoming features. The preview version lets you try out functionality before it becomes the Databricks SQL standard. You can use it to test your queries and dashboards against upcoming changes.

    Use the release notes to learn what’s in the latest preview version.

Important

Databricks recommends against using a preview version for production workloads. Because only admins can view a warehouse’s properties, including its channel, consider indicating that a SQL warehouse uses a preview version in its name so that users do not inadvertently use it for production workloads.

Start, stop, or delete a SQL warehouse

  1. Click SQL Warehouses in the sidebar.

  2. To stop a running warehouse, click Stop.

  3. To start a stopped warehouse, click Start.

  4. To delete a warehouse, click the vertical ellipsis in the Actions column, then click Delete.

Edit a SQL warehouse

You can edit a SQL warehouse using the web UI or the SQL Warehouse API.

To edit a SQL warehouse using the web UI:

  1. Click SQL Warehouses in the sidebar.

  2. In the Actions column, click the vertical ellipsis, then click Edit.

  3. Edit the warehouse properties.

  4. Click Save.
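To edit programmatically instead, the sketch below builds (but does not send) a request against an assumed /api/2.0/sql/warehouses/{id}/edit endpoint; verify the path and field names against the SQL Warehouses API reference:

```python
# Sketch: edit an existing warehouse's settings via the API.
# The /edit path, field names, host, token, and warehouse ID below
# are assumptions for illustration only.
import json
import urllib.request

def edit_warehouse_request(host, token, warehouse_id, **changes):
    """Build (but do not send) an edit request for a warehouse."""
    return urllib.request.Request(
        f"{host}/api/2.0/sql/warehouses/{warehouse_id}/edit",
        data=json.dumps(changes).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = edit_warehouse_request("https://example.cloud.databricks.com",
                             "TOKEN", "abc123", auto_stop_mins=10)
```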

Configure SQL warehouse permissions

To configure permissions for a SQL warehouse:

  1. Click SQL Warehouses in the sidebar.

  2. Click a warehouse.

  3. In the Actions column, click the vertical ellipsis, then click Permissions.

  4. Select a user or group and a permission.

  5. Click Add.

  6. Click Save.

To learn about permission levels, see SQL warehouse access control.

Monitor a SQL warehouse

You can view the number of queries handled by the warehouse and the number of clusters allocated to the warehouse.

  1. Click SQL Warehouses in the sidebar.

  2. Click a SQL warehouse.

  3. Click Monitoring.

The chart shows the number of queries handled by the warehouse and the number of clusters allocated to it. Click the timescale buttons above the chart to change the displayed period, for example to the last 7 days.

Note

The Cluster Count can be greater than one only if scaling is enabled and configured.

Cluster size

The table in this section maps SQL warehouse cluster sizes to Databricks cluster driver size and worker counts. The driver size only applies to pro and classic SQL warehouses.

| Cluster size | Instance type for driver | Worker count        | Total vCPU | Total Persistent Disk SSD (TB) | Total Local SSD (TB) |
|--------------|--------------------------|---------------------|------------|--------------------------------|----------------------|
| 2X-Small     | n2-highmem-8             | 1 x n2-highmem-8    | 16         | 1                              | 3                    |
| X-Small      | n2-highmem-8             | 2 x n2-highmem-8    | 24         | 1.5                            | 4.5                  |
| Small        | n2-highmem-16            | 4 x n2-highmem-8    | 48         | 2.5                            | 7.5                  |
| Medium       | n2-highmem-32            | 8 x n2-highmem-8    | 96         | 4.5                            | 15                   |
| Large        | n2-highmem-32            | 16 x n2-highmem-8   | 160        | 8.5                            | 27                   |
| X-Large      | n2-highmem-64            | 32 x n2-highmem-8   | 320        | 16.5                           | 54                   |
| 2X-Large     | n2-highmem-64            | 64 x n2-highmem-8   | 576        | 32.5                           | 102                  |
| 3X-Large     | n2-highmem-64            | 128 x n2-highmem-8  | 1088       | 64.5                           | 198                  |
| 4X-Large     | n2-highmem-64            | 256 x n2-highmem-8  | 2112       | 128.5                          | 390                  |

The instance size of all workers is n2-highmem-8.

Compute Engine API quota requirements

The relevant Compute Engine API quota fields are:

  • N2 CPUs

  • Persistent Disk SSD (GB)

  • Local SSD (GB)

For more information about quota requirements, see Compute Engine API.

Warning

SQL warehouses won’t start if you do not provision the required amount of CPU and storage resources. See Compute Engine API. If needed, you can increase the resource quotas to support your use of SQL warehouses. See Review and increase quotas. For information about workspace cost, see cost per workspace.
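As a rough estimate, the quota a warehouse can consume is the per-cluster totals from the table above multiplied by the warehouse’s maximum cluster count. The sketch below is illustrative only (table values are transcribed from above; the TB-to-GB conversion assumes 1 TB = 1000 GB), not an official sizing tool:

```python
# Per-cluster totals transcribed from the cluster size table:
# (total vCPU, persistent disk SSD TB, local SSD TB)
CLUSTER_TOTALS = {
    "2X-Small": (16, 1, 3),
    "X-Small": (24, 1.5, 4.5),
    "Small": (48, 2.5, 7.5),
    "Medium": (96, 4.5, 15),
    "Large": (160, 8.5, 27),
    "X-Large": (320, 16.5, 54),
    "2X-Large": (576, 32.5, 102),
    "3X-Large": (1088, 64.5, 198),
    "4X-Large": (2112, 128.5, 390),
}

def quota_needed(size, max_clusters):
    """Return (N2 CPUs, persistent SSD GB, local SSD GB) consumed
    when the warehouse scales out to its maximum cluster count."""
    cpus, pd_tb, local_tb = CLUSTER_TOTALS[size]
    return (cpus * max_clusters,
            pd_tb * 1000 * max_clusters,    # TB -> GB
            local_tb * 1000 * max_clusters)
```

For example, a Medium warehouse that can scale to 2 clusters needs quota for 192 N2 CPUs.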

Queueing and autoscaling

Databricks limits the number of queries on a cluster assigned to a SQL warehouse based on the cost to compute their results. Upscaling of clusters per warehouse is based on query throughput, the rate of incoming queries, and the queue size.

Databricks adds clusters based on the time it would take to process all currently running queries, all queued queries, and the incoming queries expected in the next two minutes.

  • If less than 2 minutes, don’t upscale.

  • If 2 to 6 minutes, add 1 cluster.

  • If 6 to 12 minutes, add 2 clusters.

  • If 12 to 22 minutes, add 3 clusters.

Otherwise, Databricks adds 3 clusters plus 1 cluster for every additional 15 minutes of expected query load.

In addition, a warehouse is always upscaled if a query waits for 5 minutes in the queue.

If the load is low for 15 minutes, Databricks downscales the SQL warehouse. It keeps enough clusters to handle the peak load over the last 15 minutes. For example, if the peak load was 25 concurrent queries, Databricks keeps 3 clusters.
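The upscaling and downscaling rules above can be written out as a sketch; the exact boundary handling past 22 minutes is an interpretation, not documented behavior:

```python
import math

def clusters_to_add(expected_minutes):
    """Clusters to add, given the expected time (in minutes) to
    process running, queued, and incoming queries. The handling of
    loads beyond 22 minutes interprets the rule '3 clusters plus 1
    for every additional 15 minutes of expected query load'."""
    if expected_minutes < 2:
        return 0
    if expected_minutes < 6:
        return 1
    if expected_minutes < 12:
        return 2
    if expected_minutes < 22:
        return 3
    return 3 + math.ceil((expected_minutes - 22) / 15)

def clusters_to_keep(peak_concurrent_queries):
    """Downscaling target: one cluster per ten concurrent queries
    at the peak load of the last 15 minutes, with at least one."""
    return max(1, math.ceil(peak_concurrent_queries / 10))
```

This reproduces the example above: a peak of 25 concurrent queries keeps 3 clusters.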

Query queuing

Databricks queues queries when all clusters assigned to the warehouse are executing queries at full capacity or when the warehouse is in the STARTING state.

Metadata queries (for example, DESCRIBE <table>) and state modifying queries (for example SET) are never queued, unless the warehouse is in the STARTING state.

Query routing

Databricks routes queries as follows:

  • New session: to the cluster with the least load.

  • Existing session: to the cluster that ran the previous query for that session. If that cluster does not have available capacity, the query is routed to the cluster with the least load.
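The routing rules above can be sketched with a toy in-memory model; the per-cluster capacity of ten queries and all names here are illustrative assumptions, not the actual scheduler:

```python
# Toy model of the routing rules: session affinity first,
# falling back to the least-loaded cluster.

def route_query(session_id, clusters, session_last_cluster):
    """clusters: dict of cluster_id -> current running-query count.
    session_last_cluster: dict of session_id -> last cluster used."""
    capacity = 10  # assumed per-cluster concurrency limit
    least_loaded = min(clusters, key=clusters.get)
    prev = session_last_cluster.get(session_id)
    # Existing session: prefer the cluster that ran its last query,
    # unless that cluster is at full capacity.
    if prev in clusters and clusters[prev] < capacity:
        choice = prev
    else:
        choice = least_loaded
    session_last_cluster[session_id] = choice
    clusters[choice] += 1
    return choice
```

A new session lands on the least-loaded cluster; later queries in that session stick to it until it fills up.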