Pool configuration reference

This article describes the available settings when creating a pool using the UI. To learn how to use the Databricks CLI to create a pool, see Databricks CLI commands. To learn how to use the REST API to create a pool, see the Instance Pools API.

Pool size

When you create a pool, you can control its size by setting the minimum idle instances and the maximum capacity. Auto termination for idle instances in pools is not supported.

Minimum Idle Instances

The minimum number of instances the pool keeps idle. These instances do not terminate, regardless of the auto termination settings. If a cluster consumes idle instances from the pool, Databricks provisions additional instances to maintain the minimum.
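As a rough illustration of these size settings outside the UI, the following Python sketch builds a request body that could be sent to the Instance Pools API and computes how many instances are needed to restore the idle minimum. The field names (instance_pool_name, node_type_id, min_idle_instances, max_capacity) reflect the Instance Pools API as commonly documented and should be checked against the API reference. The helper names, the pool name, the node type, and the check that the maximum capacity is not below the idle minimum are assumptions made for this example.

```python
import json


def build_pool_request(name, node_type_id, min_idle_instances, max_capacity):
    """Build a request body for creating a pool (illustrative sketch)."""
    if min_idle_instances < 0:
        raise ValueError("min_idle_instances must not be negative")
    # Assumption of this sketch: the idle floor cannot exceed the overall cap.
    if max_capacity < min_idle_instances:
        raise ValueError("max_capacity must be >= min_idle_instances")
    return {
        "instance_pool_name": name,
        "node_type_id": node_type_id,
        "min_idle_instances": min_idle_instances,
        "max_capacity": max_capacity,
    }


def instances_to_provision(min_idle_instances, idle_now):
    """Return how many instances must be added to restore the idle minimum."""
    return max(0, min_idle_instances - idle_now)


if __name__ == "__main__":
    # "example-pool" and "n2-highmem-4" are placeholder values.
    body = build_pool_request("example-pool", "n2-highmem-4", 2, 10)
    print(json.dumps(body, indent=2))
    # A cluster has just consumed both idle instances from the pool.
    print(instances_to_provision(min_idle_instances=2, idle_now=0))
```

The second helper only models the replenishment rule described above. The actual provisioning is performed by Databricks, not by client code.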

Instance types

A pool consists of both idle instances kept ready for new clusters and instances in use by running clusters. All of these instances are of the same instance provider type, selected when creating a pool.

A pool’s instance type cannot be edited. Clusters attached to a pool use the same instance type for the driver and worker nodes. Different families of instance types fit different use cases, such as memory-intensive or compute-intensive workloads.

Databricks always provides one year’s deprecation notice before ceasing support for an instance type.

Preloaded Databricks Runtime version

You can speed up cluster launches by selecting a Databricks Runtime version to be loaded on idle instances in the pool. If a user selects that runtime when they create a cluster backed by the pool, that cluster will launch even more quickly than a pool-backed cluster that doesn’t use a preloaded Databricks Runtime version.

Setting this option to None slows down cluster launches, as it causes the Databricks Runtime version to download on demand to idle instances in the pool. When the cluster releases the instances in the pool, the Databricks Runtime version remains cached on those instances. The next cluster creation operation that uses the same Databricks Runtime version might benefit from this caching behavior, but it is not guaranteed.
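For pools created through the API rather than the UI, the preloaded runtime is typically expressed as a list field on the request body. The sketch below assumes that field is named preloaded_spark_versions, as in the Instance Pools API reference. The helper names and the runtime version string are placeholders chosen for this example.

```python
def preload_runtime(pool_body, runtime_version=None):
    """Return a copy of pool_body with a preloaded runtime, if one is given."""
    updated = dict(pool_body)
    if runtime_version is None:
        # Equivalent to choosing None in the UI: nothing is preloaded.
        updated.pop("preloaded_spark_versions", None)
    else:
        updated["preloaded_spark_versions"] = [runtime_version]
    return updated


def uses_preloaded_runtime(pool_body, cluster_runtime_version):
    """True when a cluster's runtime matches one preloaded on the pool."""
    return cluster_runtime_version in pool_body.get("preloaded_spark_versions", [])


if __name__ == "__main__":
    # "13.3.x-scala2.12" is a placeholder version string.
    pool = preload_runtime({"instance_pool_name": "example-pool"}, "13.3.x-scala2.12")
    print(uses_preloaded_runtime(pool, "13.3.x-scala2.12"))
    print(uses_preloaded_runtime(pool, "12.2.x-scala2.12"))
```

A cluster whose runtime does not match can still use the pool. It only misses the faster launch that the preloaded image provides.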

Pool tags

Pool tags allow you to monitor the cost of cloud resources used by various groups in your organization.

The Databricks billable usage graphs in the account console can aggregate usage by individual tags. The billable usage CSV reports downloaded from the same page also include default and custom tags. Tags also propagate to GKE and GCE labels.

For convenience, Databricks applies three default tags to each pool: Vendor, DatabricksInstancePoolId, and DatabricksInstancePoolCreatorId. You can also add custom tags when you create a pool. You can add up to 43 custom tags.

Custom tags

To add tags to the pool, navigate to the Tags tab at the bottom of the Create Pool page. Click the + Add button, then enter the key-value pair.

Pool-backed clusters inherit default and custom tags from the pool configuration. For detailed information about how pool tags and cluster tags work together, see Monitor usage using tags.
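The inheritance rule can be pictured with a small Python sketch that merges a pool's default and custom tags into the set a pool-backed cluster would carry. It enforces the limit of 43 custom tags stated above. Rejecting custom keys that collide with default keys is an assumption of this example, not documented behavior, and the tag values shown are placeholders.

```python
MAX_CUSTOM_TAGS = 43  # limit stated in this article


def inherited_tags(default_tags, custom_tags):
    """Return the tags a pool-backed cluster inherits from its pool."""
    if len(custom_tags) > MAX_CUSTOM_TAGS:
        raise ValueError(f"at most {MAX_CUSTOM_TAGS} custom tags are allowed")
    # Assumption of this sketch: custom tags may not reuse a default tag key.
    clash = set(custom_tags) & set(default_tags)
    if clash:
        raise ValueError(f"custom tags reuse default keys: {sorted(clash)}")
    merged = dict(default_tags)
    merged.update(custom_tags)
    return merged


if __name__ == "__main__":
    # Placeholder values for the three default tags named above.
    defaults = {
        "Vendor": "Databricks",
        "DatabricksInstancePoolId": "<pool-id>",
        "DatabricksInstancePoolCreatorId": "<creator-id>",
    }
    print(inherited_tags(defaults, {"team": "data-eng"}))
```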

Configure the availability zone

You can configure the pool’s availability zone when you create the pool using the Instance Pools API. This is an optional field. If not specified, the pool uses a default zone.

You cannot update a pool’s availability zone after the pool is launched. To use a different availability zone, you must create a new pool.

To set the availability zone, add a zone_id attribute to the gcp_attributes object. For example:

"gcp_attributes": {
    "zone_id": "us-central1-a"
}

Note

The provided availability zone must be in the same region as your Databricks workspace.
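A client-side check for this constraint might look like the following Python sketch. It relies on the Google Cloud naming convention in which a zone name is the region name followed by a hyphen and a zone suffix (for example, us-central1-a belongs to us-central1), and it only handles explicit zone names of that form. The function names are hypothetical.

```python
def region_of_zone(zone_id):
    """Derive the region from an explicit zone name such as 'us-central1-a'."""
    region, sep, suffix = zone_id.rpartition("-")
    if not sep or not region or not suffix:
        raise ValueError(f"not an explicit zone name: {zone_id!r}")
    return region


def gcp_attributes_for(zone_id, workspace_region):
    """Return the gcp_attributes fragment, or raise if the regions differ."""
    zone_region = region_of_zone(zone_id)
    if zone_region != workspace_region:
        raise ValueError(
            f"zone {zone_id!r} is in {zone_region!r}, "
            f"but the workspace is in {workspace_region!r}"
        )
    return {"gcp_attributes": {"zone_id": zone_id}}


if __name__ == "__main__":
    print(gcp_attributes_for("us-central1-a", "us-central1"))
```

Checking the region before sending the request gives a clearer error than waiting for the API to reject the pool.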

Conflicting zonal configurations

If the compute resource you attach to an instance pool is configured to use a different availability zone than the instance pool, the compute resource’s zonal configuration is ignored and the compute resource inherits the instance pool’s zonal configuration.

The compute resource’s driver inherits the zonal or multi-zonal preference from the driver instance pool, and any executors inherit the zonal or multi-zonal preference from the executor instance pool.
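The precedence rule can be summarized in a few lines of Python. This is only a model of the behavior described above, with a hypothetical function name and parameters. It does not call any Databricks API.

```python
def effective_zones(driver_pool_zone, executor_pool_zone, compute_zone=None):
    """Return the zonal preference each node role ends up with.

    compute_zone is accepted only to show that it is ignored: the pools'
    settings always win.
    """
    return {"driver": driver_pool_zone, "executors": executor_pool_zone}


if __name__ == "__main__":
    # The compute resource asks for us-central1-c, but the pools decide.
    print(effective_zones("us-central1-a", "us-central1-b", "us-central1-c"))
```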