Configure pools

This article explains the configuration options available when you create and edit a pool.

Pool size and auto termination

When you create a pool, you control its size with two settings: minimum idle instances and maximum capacity. Idle instance auto termination also affects how long unused instances stay in the pool, but its value is fixed.

Minimum Idle Instances

The minimum number of instances the pool keeps idle. These instances do not terminate, regardless of the auto termination time described in Idle Instance Auto Termination. If a cluster consumes idle instances from the pool, Databricks provisions additional instances to maintain the minimum.
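The replenishment rule can be sketched as a small calculation. This is an illustrative model only, not Databricks code; the function name `instances_to_provision` and its parameters are hypothetical.

```python
def instances_to_provision(min_idle: int, idle_now: int) -> int:
    """Return how many instances must be added to restore the idle minimum."""
    if min_idle < 0 or idle_now < 0:
        raise ValueError("counts must be non-negative")
    # Nothing to add when the pool already holds at least the minimum.
    return max(0, min_idle - idle_now)


# A pool with a minimum of 3 idle instances; a cluster takes 2 of them.
idle_after_cluster_start = 3 - 2
print(instances_to_provision(min_idle=3, idle_now=idle_after_cluster_start))  # prints 2
```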

Maximum Capacity

The maximum number of instances that the pool will provision. If set, this value limits the total number of instances in the pool, both idle and in use. If a cluster using the pool requests more instances than this number during autoscaling, the request fails with an INSTANCE_POOL_MAX_CAPACITY_FAILURE error.
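The capacity check described above can be modeled roughly as follows. This is a simplified sketch under stated assumptions: the `Pool` class, its `acquire` method, and the `PoolCapacityError` exception are hypothetical names, and the model ignores replenishment of idle instances. Only the error code string comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


class PoolCapacityError(Exception):
    """Hypothetical exception carrying the error code named in the text."""

    code = "INSTANCE_POOL_MAX_CAPACITY_FAILURE"


@dataclass
class Pool:
    idle: int
    used: int
    max_capacity: Optional[int] = None  # None means no limit is set

    def acquire(self, requested: int) -> None:
        """Give `requested` instances to a cluster, provisioning new ones if needed."""
        from_idle = min(requested, self.idle)
        to_provision = requested - from_idle
        # The limit applies to all instances: idle plus in use.
        total_after = self.idle + self.used + to_provision
        if self.max_capacity is not None and total_after > self.max_capacity:
            raise PoolCapacityError(PoolCapacityError.code)
        self.idle -= from_idle
        self.used += requested


pool = Pool(idle=2, used=6, max_capacity=10)
pool.acquire(3)  # 2 come from idle, 1 is provisioned; the total becomes 9
try:
    pool.acquire(2)  # would bring the total to 11, above the limit of 10
except PoolCapacityError as err:
    print(err)  # prints INSTANCE_POOL_MAX_CAPACITY_FAILURE
```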

This configuration is optional. Databricks recommends setting a value only in the following circumstances:

  • You have an instance quota you must stay under.

  • You want to protect one set of work from impacting another set of work. For example, suppose your instance quota is 100 and teams A and B both need to run jobs. You can create pool A with a maximum capacity of 50 and pool B with a maximum capacity of 50 so that the two teams split the quota of 100 evenly.

  • You need to cap cost.
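As a quick sanity check for the quota scenario above, you can confirm that the maximum capacities of all pools add up to no more than the instance quota. The helper below is a hypothetical sketch; the pool names and the function `fits_quota` are illustrative.

```python
from typing import Mapping


def fits_quota(pool_max_capacities: Mapping[str, int], instance_quota: int) -> bool:
    """Return True when the pools together cannot exceed the instance quota."""
    return sum(pool_max_capacities.values()) <= instance_quota


# Teams A and B split a quota of 100 evenly.
print(fits_quota({"pool-a": 50, "pool-b": 50}, instance_quota=100))  # prints True
# Raising pool B to 60 would allow the two pools to exceed the quota.
print(fits_quota({"pool-a": 50, "pool-b": 60}, instance_quota=100))  # prints False
```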

Idle Instance Auto Termination

Pools have a fixed auto termination time of two minutes. Auto termination for pools is not configurable.

Instance types

A pool consists of both idle instances kept ready for new clusters and instances in use by running clusters. All of these instances are of the same instance provider type, selected when creating a pool.

A pool’s instance type cannot be edited. Clusters attached to a pool use the same instance type for the driver and worker nodes. Different families of instance types fit different use cases, such as memory-intensive or compute-intensive workloads.

Databricks always provides one year’s deprecation notice before ceasing support for an instance type.

Preload Databricks Runtime version

You can speed up cluster launches by selecting a Databricks Runtime version to be loaded on idle instances in the pool. If a user selects that runtime when they create a cluster backed by the pool, that cluster will launch even more quickly than a pool-backed cluster that doesn’t use a preloaded Databricks Runtime version.

Setting this option to None slows down cluster launches, as it causes the Databricks Runtime version to download on demand to idle instances in the pool. When the cluster releases the instances in the pool, the Databricks Runtime version remains cached on those instances. The next cluster creation operation that uses the same Databricks Runtime version might benefit from this caching behavior, but it is not guaranteed.
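If you create pools programmatically rather than in the UI, the same settings can be collected into a request body. The sketch below only builds and prints a dictionary. The field names are modeled on the Instance Pools REST API and should be checked against the API reference for your workspace. The helper `build_pool_spec`, the pool name, the node type, and the runtime version string are all placeholders chosen for illustration.

```python
import json
from typing import Optional


def build_pool_spec(
    name: str,
    node_type_id: str,
    min_idle_instances: int,
    max_capacity: Optional[int] = None,
    preloaded_runtime: Optional[str] = None,
) -> dict:
    """Assemble a pool definition; optional settings are omitted when unset."""
    spec = {
        "instance_pool_name": name,
        "node_type_id": node_type_id,
        "min_idle_instances": min_idle_instances,
    }
    if max_capacity is not None:
        # Local sanity check: the limit covers idle instances as well as used ones.
        if max_capacity < min_idle_instances:
            raise ValueError("max_capacity must be at least min_idle_instances")
        spec["max_capacity"] = max_capacity
    if preloaded_runtime is not None:
        # Leaving this out corresponds to choosing None in the UI.
        spec["preloaded_spark_versions"] = [preloaded_runtime]
    return spec


spec = build_pool_spec(
    name="team-a-pool",               # placeholder name
    node_type_id="n2-highmem-4",      # placeholder instance type
    min_idle_instances=2,
    max_capacity=50,
    preloaded_runtime="13.3.x-scala2.12",  # placeholder runtime version
)
print(json.dumps(spec, indent=2))
```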

Pool tags

Pool tags allow you to monitor the cost of cloud resources used by various groups in your organization.

The Databricks billable usage graphs in the account console can aggregate usage by individual tags. The billable usage CSV reports downloaded from the same page also include default and custom tags. Tags also propagate to GKE and GCE labels.

For convenience, Databricks applies three default tags to each pool: Vendor, DatabricksInstancePoolId, and DatabricksInstancePoolCreatorId. You can also add up to 43 custom tags when you create a pool.
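Before submitting a set of custom tags, it can help to check them locally against the limit above. The following is a minimal sketch: `validate_custom_tags` is a hypothetical helper, and rejecting keys that match a default tag name is an assumption made for illustration, not documented behavior.

```python
DEFAULT_TAG_KEYS = ("Vendor", "DatabricksInstancePoolId", "DatabricksInstancePoolCreatorId")
MAX_CUSTOM_TAGS = 43  # limit stated in the text


def validate_custom_tags(custom_tags: dict) -> None:
    """Raise ValueError if the custom tags break the limit or reuse a default key."""
    if len(custom_tags) > MAX_CUSTOM_TAGS:
        raise ValueError(
            f"{len(custom_tags)} custom tags given; at most {MAX_CUSTOM_TAGS} are allowed"
        )
    # Assumption: treat a clash with a default tag key as an error.
    clashes = sorted(set(custom_tags) & set(DEFAULT_TAG_KEYS))
    if clashes:
        raise ValueError(f"custom tags reuse default tag keys: {clashes}")


validate_custom_tags({"team": "data-eng", "cost-center": "1234"})  # passes silently

too_many = {f"key-{i}": "value" for i in range(44)}
try:
    validate_custom_tags(too_many)
except ValueError as err:
    print(err)  # prints 44 custom tags given; at most 43 are allowed
```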

Custom tag inheritance

Pool-backed clusters inherit default and custom tags from the pool configuration. For detailed information about how pool tags and cluster tags work together, see Monitor usage using cluster and pool tags.

Configure custom pool tags

  1. At the bottom of the pool configuration page, select the Tags tab.

  2. Specify a key-value pair for the custom tag.

  3. Click Add.