What is a Databricks SQL warehouse?

This article introduces SQL warehouses (formerly SQL endpoints) and describes how to work with them using the Databricks SQL UI. A SQL warehouse is a compute resource that lets you run SQL commands on data objects within Databricks SQL. Compute resources are infrastructure resources that provide processing capabilities in the cloud.

SQL endpoints name changed to SQL warehouses

Databricks changed the name from SQL endpoint to SQL warehouse because it is more than just an API entry point for running SQL commands. A SQL warehouse is a compute resource for all your data warehousing needs, an integral part of the Lakehouse Platform.

Other compute resource types include Databricks clusters. To work with SQL warehouses using the API, see SQL Warehouses APIs 2.0.
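As a sketch of working with that API, the following builds an authenticated request for listing SQL warehouses. The endpoint path and bearer-token header follow the SQL Warehouses API; the host and token values are placeholders you must supply, and you should confirm details against the API reference for your workspace:

```python
import json
import urllib.request

# Path of the SQL Warehouses API 2.0 list endpoint.
API_PATH = "/api/2.0/sql/warehouses"

def build_request(host: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for listing SQL warehouses.

    `host` is your workspace URL and `token` a personal access token;
    both are placeholders here.
    """
    return urllib.request.Request(
        host.rstrip("/") + API_PATH,
        headers={"Authorization": f"Bearer {token}"},
    )

def list_warehouses(host: str, token: str) -> list:
    """Call the endpoint and return the warehouses array (requires network access)."""
    with urllib.request.urlopen(build_request(host, token)) as resp:
        return json.load(resp).get("warehouses", [])
```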

Before you begin

When you create your first SQL warehouse, Databricks recommends that you accept the defaults as they appear on the New SQL warehouse page. But you have many options that you can configure to meet your specific needs.

Among those options, be aware of Channels, which let you choose whether to use the current SQL warehouse compute version or the preview version. Preview versions let you try out functionality before it becomes the Databricks SQL standard. Take advantage of preview versions to test your queries and dashboards against upcoming changes. Typically, preview versions are promoted to the current version about two weeks after the initial preview release, but some previews last longer. You can learn about the features in the latest preview version by reviewing the release notes. Databricks does not recommend using preview versions for production workloads.

Requirements

  • To create a SQL warehouse you must have cluster creation permission in Databricks Data Science & Engineering.

  • To manage a SQL warehouse you must have Can Manage permission in Databricks SQL.

View SQL warehouses

Click SQL Warehouses in the sidebar.

By default, warehouses are sorted by state (running warehouses first), then in alphabetical order. You can reorder the list by clicking the column headings.

To help you get started quickly, such as when you follow the Databricks SQL quickstart, Databricks creates a SQL warehouse called Starter Warehouse automatically. This SQL warehouse is sized Small. You can edit or delete this SQL warehouse.

To filter the list of warehouses, enter text in the search box.

To see more about a warehouse’s configuration, including connection details, monitoring, and advanced settings, click the name of the warehouse.

Create a SQL warehouse

You can create a SQL warehouse using the New SQL Warehouse page in the web UI or using the SQL Warehouse API.

By accepting all the defaults on the New SQL Warehouse page, you can create an efficient and high-performing SQL warehouse quickly and easily. You can override those defaults if your workload or environment requires it.

To create a SQL warehouse using the web UI:

  1. Click SQL Warehouses in the sidebar.

  2. Click Create SQL Warehouse to open the New SQL Warehouse dialog.

  3. Enter a name for the warehouse.

  4. Accept the default warehouse properties or edit them.

    • Cluster Size represents the number of cluster workers and size of compute resources available to run your queries and dashboards. The default is X-Large. To reduce query latency, increase the size. For details, see Cluster size.

    • Auto Stop determines whether the warehouse stops if it’s idle for the specified number of minutes. The default is 10 minutes. 120 minutes is recommended.

      Idle SQL warehouses continue to accumulate DBU and cloud instance charges until they are stopped.

    • Scaling sets the minimum and maximum number of clusters over which queries sent to the warehouse are distributed.

      The default is a minimum of one and maximum of one cluster.

      To handle more concurrent queries, increase the cluster count. Databricks recommends a cluster for every ten concurrent queries. To learn how Databricks adds clusters to and removes clusters from a warehouse, see Queueing and autoscaling.

  5. (Optional) Configure advanced options.

    If you want to do any of the following, expand Advanced options:

    • Add tags to help monitor SQL warehouse usage

    • Configure the spot instance policy for the SQL warehouse

    • Use the Preview channel to test upcoming features

    For details, see Advanced options.

  6. Click Create.

    The warehouse is created and started.
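The sizing guidance in the steps above, one cluster for roughly every ten concurrent queries, can be sketched as a small helper. The function name and the `max(1, ...)` floor are ours, not part of the product:

```python
import math

# Databricks' guidance: one cluster for every ten concurrent queries.
QUERIES_PER_CLUSTER = 10

def recommended_max_clusters(peak_concurrent_queries: int) -> int:
    """Suggest a maximum cluster count for the Scaling setting.

    Rounds up so that, for example, 25 concurrent queries yields 3 clusters,
    and never recommends fewer than one cluster.
    """
    return max(1, math.ceil(peak_concurrent_queries / QUERIES_PER_CLUSTER))
```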

You can also create a SQL warehouse with the Databricks Terraform provider and databricks_sql_endpoint.

Advanced options

You can configure the following advanced options by expanding the Advanced options area when you create a SQL warehouse using the New SQL warehouse dialog or edit an existing SQL warehouse using the edit dialog. You can also configure these options using the SQL Warehouses APIs 2.0.

Configure advanced options

Add tags for usage monitoring

Tags allow you to easily monitor the cost of cloud resources used by users and groups in your organization. When you create or edit a SQL warehouse, expand the Advanced options area to specify tags as key-value pairs. Databricks applies these tags to cloud resources.

Configure spot instance policy

The spot instance policy determines whether workers use only on-demand instances or a combination of on-demand and spot instances. Cost Optimized (the default) uses mostly spot instances and one on-demand instance. Reliability Optimized uses only on-demand instances.

Use the preview channel

Channels let you choose whether to use the Current SQL warehouse compute version or the Preview version. A preview version lets you try out functionality before it becomes the Databricks SQL standard. Take advantage of the preview channel to test your queries and dashboards against upcoming changes. You can learn about what’s in the latest preview version in the release notes.

Current is the default. To switch a SQL warehouse to the preview channel, expand Advanced options and select Preview.

Important

Databricks recommends against using a preview version for production workloads. Because only administrators can view a warehouse’s properties, including its channel, consider indicating in the warehouse’s name that a SQL warehouse uses a preview version, so that users do not inadvertently use it for production workloads.
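The advanced options above can also be set through the SQL Warehouses API. The sketch below builds the relevant portion of a create/edit payload; the field names (`custom_tags`, `spot_instance_policy`, `channel`) reflect our reading of the API and should be verified against the API reference before use:

```python
def advanced_options_payload(tags: dict, spot_policy: str, preview: bool) -> dict:
    """Build the advanced-options portion of a warehouse create/edit payload.

    Field names are illustrative; confirm them in the SQL Warehouses API
    reference for your workspace.
    """
    return {
        # Tags are applied to cloud resources for usage monitoring.
        "tags": {"custom_tags": [{"key": k, "value": v} for k, v in tags.items()]},
        # COST_OPTIMIZED (default) or RELIABILITY_OPTIMIZED.
        "spot_instance_policy": spot_policy,
        # Switch between the Current and Preview channels.
        "channel": {"name": "CHANNEL_NAME_PREVIEW" if preview else "CHANNEL_NAME_CURRENT"},
    }
```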

Start, stop, or delete a SQL warehouse

  1. Click SQL Warehouses in the sidebar.

  2. To stop a running warehouse, click Stop.

  3. To start a stopped warehouse, click Start.

  4. To delete a warehouse, click the vertical ellipsis in the Actions column, then click Delete.

Edit a SQL warehouse

You can choose to edit a SQL warehouse using the web UI or the SQL Warehouse API.

To edit a SQL warehouse using the web UI:

  1. Click SQL Warehouses in the sidebar.

  2. In the Actions column, click the vertical ellipsis and click Edit.

  3. Edit the warehouse properties.

    For information about each editable property, see Create a SQL warehouse and Advanced options.

  4. Click Save or Save and restart.

Configure SQL warehouse permissions

To configure permissions for a SQL warehouse:

  1. Click SQL Warehouses in the sidebar.

  2. Click a warehouse.

  3. Click the Permissions button.

    The SQL Warehouse Permissions dialog appears. The warehouse creator and Databricks admins have Can Manage permission by default.

  4. Select a user or group and a permission.

  5. Click Add.

  6. Click Save.

To learn about permission levels, see SQL warehouse access control.

Monitor a SQL warehouse

You can view the number of queries handled by the warehouse and the number of clusters allocated to the warehouse.

  1. Click SQL Warehouses in the sidebar.

  2. Click a warehouse.

  3. Click Monitoring.

    A chart displays, showing the number of queries handled by the warehouse and the number of clusters allocated to the warehouse over the last 6 hours.

    Click a timescale button at the top right of the chart to change the displayed period. For example, the following screenshot shows these statistics over 7 days:

    Monitor warehouse

    Note

    The Cluster Count can be greater than one only if scaling is enabled and configured.

Cluster size

The table in this section maps SQL warehouse cluster sizes to Databricks cluster driver size and worker counts. The driver size only applies to Classic SQL warehouses.

| Cluster size | Instance type for driver | Worker count | Total vCPU | Total Persistent Disk SSD (TB) | Total Local SSD (TB) |
|---|---|---|---|---|---|
| 2X-Small | n2-highmem-8 | 1 | 16 | 1 | 3 |
| X-Small | n2-highmem-8 | 2 | 24 | 1.5 | 4.5 |
| Small | n2-highmem-16 | 4 | 48 | 2.5 | 7.5 |
| Medium | n2-highmem-32 | 8 | 96 | 4.5 | 15 |
| Large | n2-highmem-32 | 16 | 160 | 8.5 | 27 |
| X-Large | n2-highmem-64 | 32 | 320 | 16.5 | 54 |
| 2X-Large | n2-highmem-64 | 64 | 576 | 32.5 | 102 |
| 3X-Large | n2-highmem-64 | 128 | 1088 | 64.5 | 198 |
| 4X-Large | n2-highmem-64 | 256 | 2112 | 128.5 | 390 |

The instance size of all workers is n2-highmem-8.

Compute Engine API quota requirements

The relevant Compute Engine API quota fields are:

  • N2 CPUs

  • Persistent Disk SSD (GB)

  • Local SSD (GB)

For more information about quota requirements, see Compute Engine API.

WARNING: SQL warehouses won’t start if you do not provision the required amount of CPU and storage resources. See Compute Engine API. If needed, you can increase the resource quotas to support your use of SQL warehouses. See Review and increase quotas. For information about workspace cost, see cost per workspace.
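To estimate whether your quotas are sufficient, the per-warehouse totals from the cluster size table above can be turned into a rough quota calculator. Two assumptions are ours, not the docs': TB is converted to GB at 1 TB = 1000 GB, and scaling to multiple clusters multiplies the per-cluster totals linearly:

```python
# Per-size resource totals from the cluster size table above:
# (total vCPU, total Persistent Disk SSD in GB, total Local SSD in GB)
WAREHOUSE_RESOURCES = {
    "2X-Small": (16, 1_000, 3_000),
    "X-Small": (24, 1_500, 4_500),
    "Small": (48, 2_500, 7_500),
    "Medium": (96, 4_500, 15_000),
    "Large": (160, 8_500, 27_000),
    "X-Large": (320, 16_500, 54_000),
    "2X-Large": (576, 32_500, 102_000),
    "3X-Large": (1088, 64_500, 198_000),
    "4X-Large": (2112, 128_500, 390_000),
}

def required_quota(size: str, max_clusters: int = 1) -> dict:
    """Rough Compute Engine quota needed by one warehouse of the given size,
    scaled to its maximum cluster count (linear scaling is an assumption)."""
    cpus, pd_ssd_gb, local_ssd_gb = WAREHOUSE_RESOURCES[size]
    return {
        "N2 CPUs": cpus * max_clusters,
        "Persistent Disk SSD (GB)": pd_ssd_gb * max_clusters,
        "Local SSD (GB)": local_ssd_gb * max_clusters,
    }
```

Compare the returned numbers against your project's quotas before starting the warehouse.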

Queueing and autoscaling

Databricks limits the number of queries on a cluster assigned to a SQL warehouse based on the cost to compute their results. Upscaling of clusters per warehouse is based on query throughput, the rate of incoming queries, and the queue size. Databricks adds clusters based on the time it would take to process all currently running queries, all queued queries, and the incoming queries expected in the next two minutes as follows:

  • Less than 2 minutes, don’t upscale.

  • 2 to 6 minutes, add 1 cluster.

  • 6 to 12 minutes, add 2 clusters.

  • 12 to 22 minutes, add 3 clusters.

Otherwise, Databricks adds 3 clusters plus 1 cluster for every additional 15 minutes of expected query load.

In addition, a warehouse is always upscaled if a query waits for 5 minutes in the queue.

If the load is low for 15 minutes, Databricks downscales the SQL warehouse. It keeps enough clusters to handle the peak load over the last 15 minutes. For example, if the peak load was 25 concurrent queries, Databricks keeps 3 clusters.
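The upscaling thresholds and the downscaling rule above can be sketched as follows. Exact boundary handling inside Databricks may differ, and the ten-queries-per-cluster figure in the downscaling helper is taken from the sizing guidance earlier in this article (it matches the example: a peak of 25 concurrent queries keeps 3 clusters):

```python
import math

def clusters_to_add(expected_minutes: float) -> int:
    """Clusters to add, given the expected time (in minutes) to process all
    running, queued, and incoming queries. Thresholds follow the list above."""
    if expected_minutes < 2:
        return 0
    if expected_minutes < 6:
        return 1
    if expected_minutes < 12:
        return 2
    if expected_minutes < 22:
        return 3
    # Beyond that: 3 clusters plus 1 per additional 15 minutes of load.
    return 3 + math.ceil((expected_minutes - 22) / 15)

def clusters_to_keep(peak_concurrent_queries: int) -> int:
    """Clusters retained on downscale: enough for the last 15 minutes' peak,
    assuming roughly ten concurrent queries per cluster."""
    return max(1, math.ceil(peak_concurrent_queries / 10))
```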

Query queuing

Databricks queues queries when all clusters assigned to the warehouse are executing queries at full capacity or when the warehouse is in the STARTING state.

Metadata queries (for example, DESCRIBE <table>) and state-modifying queries (for example, SET) are never queued, unless the warehouse is in the STARTING state.

Query routing

Databricks routes queries as follows:

  • New session: to the cluster with the least load.

  • Existing session: to the cluster that ran the previous query for that session. If that cluster does not have available capacity, the query is routed to the cluster with the least load.
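These routing rules can be sketched as a small function. The data structures (a session-to-cluster map and a per-cluster load count) and the capacity check are illustrative, not Databricks internals:

```python
def route_query(session_id: str, session_cluster: dict, loads: dict, capacity: int) -> str:
    """Pick a cluster for a query following the rules above.

    New sessions go to the least-loaded cluster; existing sessions stick to
    their previous cluster unless it is at capacity, in which case the query
    falls back to the least-loaded cluster.
    """
    least_loaded = min(loads, key=loads.get)
    cluster = session_cluster.get(session_id)
    if cluster is None or loads[cluster] >= capacity:
        cluster = least_loaded
    session_cluster[session_id] = cluster
    loads[cluster] += 1  # account for the query just routed
    return cluster
```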