Single Node clusters
A Single Node cluster consists of an Apache Spark driver and no Spark workers. It supports Spark jobs and all Spark data sources, including Delta Lake. By contrast, a Standard cluster requires at least one Spark worker to run Spark jobs.
Single Node clusters are helpful for:
Single-node machine learning workloads that use Spark to load and save data (a sketch follows this list)
Lightweight exploratory data analysis
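For the machine learning case, a common pattern is to use Spark only for reading and writing Delta tables while training runs locally in plain Python. The following is a minimal sketch, assuming a Databricks notebook where spark is predefined; the table and column names are hypothetical:

# `spark` is the SparkSession Databricks provides in notebooks.
# Table and column names are hypothetical placeholders.
from sklearn.linear_model import LogisticRegression

pdf = spark.read.table("events").toPandas()  # Spark loads the data
model = LogisticRegression().fit(pdf[["feature_a", "feature_b"]], pdf["label"])

pdf["prediction"] = model.predict(pdf[["feature_a", "feature_b"]])
spark.createDataFrame(pdf).write.format("delta").mode("overwrite").saveAsTable("predictions")  # Spark saves the result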
Create a Single Node cluster
To create a Single Node cluster, select the Single Node button when you configure a cluster.
Single Node cluster properties
A Single Node cluster has the following properties:
Runs Spark locally.
The driver acts as both master and worker, with no worker nodes.
Spawns one executor thread per logical core in the cluster, minus one core reserved for the driver (see the snippet after this list).
All stderr, stdout, and log4j log output is saved in the driver log.
A Single Node cluster can’t be converted to a Multi Node cluster.
Limitations
Large-scale data processing will exhaust the resources on a Single Node cluster. For these workloads, Databricks recommends using a Multi Node cluster.
Single Node clusters are not designed to be shared. To avoid resource conflicts, Databricks recommends using a Multi Node cluster when the cluster must be shared.
A Multi Node cluster can’t be scaled to 0 workers. Use a Single Node cluster instead.
Single Node clusters are not compatible with process isolation.
GPU scheduling is not enabled on Single Node clusters.
On Single Node clusters, Spark cannot read Parquet files with a UDT column. The following error message results:
The Spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached.
To work around this problem, disable the native Parquet reader:
# Fall back to the non-native Parquet reader for this session
spark.conf.set("spark.databricks.io.parquet.nativeReader.enabled", False)
REST API
You can use the Clusters API to create a Single Node cluster.
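A minimal sketch using the Python requests library, following the pattern of setting num_workers to 0 together with the singleNode cluster profile; the workspace URL, token, Databricks Runtime version, and node type below are placeholders:

import requests

# Placeholders: substitute your workspace URL, a valid personal access token,
# a Databricks Runtime version, and a node type available in your workspace.
resp = requests.post(
    "https://<databricks-instance>/api/2.0/clusters/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={
        "cluster_name": "single-node-cluster",
        "spark_version": "7.3.x-scala2.12",
        "node_type_id": "n1-standard-4",
        "num_workers": 0,  # no Spark workers
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
    },
)
resp.raise_for_status()
print(resp.json()["cluster_id"])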
Single Node cluster policy
Note
Cluster policies are unavailable on Databricks on Google Cloud.
Cluster policies simplify cluster configuration for Single Node clusters.
Consider the example of a data science team whose members do not have permission to create clusters. Using a pool and a cluster policy, an admin can authorize team members to create a maximum number of Single Node clusters:
Create a pool:
Set Max capacity to 10.
In Autopilot options, enable autoscaling for local storage.
Set Instance type to Single Node cluster.
Select a Databricks version. Databricks recommends using the latest version if possible.
Click Create.
The pool’s properties page appears. Make a note of the pool ID and instance type ID for the newly created pool.
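If you prefer to script this step, the Instance Pools API accepts an equivalent payload. A hedged sketch; the workspace URL, token, pool name, and node type are illustrative:

import requests

resp = requests.post(
    "https://<databricks-instance>/api/2.0/instance-pools/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={
        "instance_pool_name": "single-node-pool",
        "node_type_id": "n1-standard-4",
        "max_capacity": 10,
        "enable_elastic_disk": True,  # autoscaling local storage
    },
)
resp.raise_for_status()
print(resp.json()["instance_pool_id"])  # note this ID for the cluster policy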
Create a cluster policy:
Set the pool ID and instance type ID from the pool’s properties page.
Specify constraints as needed.
Grant the cluster policy to the team members. You can use Manage users, service principals, and groups to simplify user management.
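As an illustration, a Single Node policy definition can pin the cluster profile, the pool, and zero workers, and then be registered through the Cluster Policies API. A sketch under those assumptions; the IDs, names, and token are placeholders:

import json
import requests

# Placeholders: the pool ID comes from the pool's properties page.
definition = {
    "spark_conf.spark.databricks.cluster.profile": {"type": "fixed", "value": "singleNode", "hidden": True},
    "instance_pool_id": {"type": "fixed", "value": "<pool-id>", "hidden": True},
    "num_workers": {"type": "fixed", "value": 0, "hidden": True},
}

resp = requests.post(
    "https://<databricks-instance>/api/2.0/policies/clusters/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={"name": "single-node-team-policy", "definition": json.dumps(definition)},
)
resp.raise_for_status()
print(resp.json()["policy_id"])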