Single Node clusters

A Single Node cluster consists of an Apache Spark driver and no Spark workers. A Single Node cluster supports Spark jobs and all Spark data sources, including Delta Lake. By contrast, a Standard cluster requires at least one Spark worker to run Spark jobs.

Single Node clusters are helpful for:

  • Single-node machine learning workloads that use Spark to load and save data
  • Lightweight exploratory data analysis

Create a Single Node cluster

To create a Single Node cluster, set Cluster Mode to Single Node when you configure a cluster.


Single Node cluster properties

A Single Node cluster has the following properties:

  • Runs Spark locally.
  • The driver acts as both master and worker, with no worker nodes.
  • Spawns one executor thread per logical core in the cluster, minus one core reserved for the driver.
  • All stderr, stdout, and log4j log output is saved in the driver log.
  • A Single Node cluster can’t be converted to a Standard cluster. To use a Standard cluster, create a new Standard cluster and attach your notebook to it.
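As a minimal sketch of the executor sizing rule above (this is illustrative only, assuming the standard library's os.cpu_count reflects the driver machine's logical cores):

```python
import os

# Logical cores on the driver machine (os.cpu_count can return None).
logical_cores = os.cpu_count() or 1

# A Single Node cluster spawns one executor thread per logical core,
# reserving one core for the driver, so the thread count is cores - 1
# (with a floor of 1 on a single-core machine).
executor_threads = max(logical_cores - 1, 1)
print(executor_threads)
```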

Limitations

  • Large-scale data processing will exhaust the resources on a Single Node cluster. For these workloads, Databricks recommends using a Standard mode cluster.

  • Single Node clusters are not designed to be shared. To avoid resource conflicts, Databricks recommends using a Standard mode cluster when the cluster must be shared.

  • A Standard mode cluster can’t be scaled to 0 workers. Use a Single Node cluster instead.

  • Single Node clusters are not compatible with process isolation.

  • GPU scheduling is not enabled on Single Node clusters.

  • On Single Node clusters, Spark cannot read Parquet files with a UDT column. The following error message results:

    The Spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached.
    

    To work around this problem, disable the native Parquet reader:

    spark.conf.set("spark.databricks.io.parquet.nativeReader.enabled", False)
    

REST API

You can use the Clusters API to create a Single Node cluster.
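A request body for such a call might look like the following sketch, shown as a Python dict. The cluster name is a placeholder, and the spark_version and node_type_id values must be filled in for your workspace; the Single Node settings are zero workers, the singleNode cluster profile, a local Spark master, and the SingleNode resource class tag:

```python
import json

# Sketch of a Clusters API create request body for a Single Node cluster.
# "single-node-example" is a placeholder; fill in spark_version and
# node_type_id with values valid for your workspace.
create_request = {
    "cluster_name": "single-node-example",
    "spark_version": "<runtime-version>",
    "node_type_id": "<instance-type>",
    "num_workers": 0,  # no workers: Single Node
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}

print(json.dumps(create_request, indent=2))
```

Send this body as a POST to the Clusters API create endpoint using your usual authentication.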

Single Node cluster policy

Note

Cluster policies are unavailable on Databricks on Google Cloud.

Cluster policies simplify cluster configuration for Single Node clusters.

Consider the example of a data science team whose members do not have permission to create clusters. Using pools and cluster policies, an admin can authorize team members to create up to a maximum number of Single Node clusters:

  1. Create a pool:

    1. Set Max capacity to 10.
    2. In Autopilot options, enable autoscaling for local storage.
    3. Set Instance type to Single Node cluster.
    4. Select a Databricks version. Databricks recommends using the latest version if possible.
    5. Click Create.

    The pool’s properties page appears. Make a note of the pool ID and instance type ID for the newly created pool.

  2. Create a cluster policy:

    • Set the pool ID and instance type ID using the values noted from the pool’s properties page.
    • Specify constraints as needed.
  3. Grant the cluster policy to the team members. You can use Manage users and groups to simplify user management.
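The resulting policy definition might look like the following sketch, shown as a Python dict mirroring the policy JSON. The placeholders <pool-id> and <instance-type-id> stand in for the IDs noted from the pool’s properties page, and the fixed Single Node settings are hidden from users:

```python
import json

# Sketch of a Single Node cluster policy definition. The keys follow the
# cluster policy JSON format; <pool-id> and <instance-type-id> are
# placeholders for the values noted when the pool was created.
policy = {
    "instance_pool_id": {"type": "fixed", "value": "<pool-id>", "hidden": True},
    "node_type_id": {"type": "fixed", "value": "<instance-type-id>", "hidden": True},
    "num_workers": {"type": "fixed", "value": 0, "hidden": True},
    "spark_conf.spark.databricks.cluster.profile": {
        "type": "fixed", "value": "singleNode", "hidden": True,
    },
    "spark_conf.spark.master": {
        "type": "fixed", "value": "local[*]", "hidden": True,
    },
    "custom_tags.ResourceClass": {
        "type": "fixed", "value": "SingleNode", "hidden": True,
    },
}

print(json.dumps(policy, indent=2))
```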

Single Node job cluster policy

Note

Cluster policies are unavailable on Databricks on Google Cloud.

To set up a similar cluster policy for job clusters, set cluster_type.type to fixed and cluster_type.value to job, and remove all references to auto_termination_minutes.
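Following that description, a job-cluster variant might be sketched as below, again as a Python dict mirroring the policy JSON. Only the cluster_type constraint comes from the text above; the remaining entries mirror the Single Node settings, with <pool-id> as a placeholder:

```python
import json

# Sketch of a Single Node job cluster policy. cluster_type is fixed to
# "job" as described above, and auto_termination_minutes is intentionally
# absent. <pool-id> is a placeholder for the pool noted earlier.
job_policy = {
    "cluster_type": {"type": "fixed", "value": "job"},
    "instance_pool_id": {"type": "fixed", "value": "<pool-id>", "hidden": True},
    "num_workers": {"type": "fixed", "value": 0, "hidden": True},
    "spark_conf.spark.databricks.cluster.profile": {
        "type": "fixed", "value": "singleNode", "hidden": True,
    },
    "spark_conf.spark.master": {
        "type": "fixed", "value": "local[*]", "hidden": True,
    },
    "custom_tags.ResourceClass": {
        "type": "fixed", "value": "SingleNode", "hidden": True,
    },
}

print(json.dumps(job_policy, indent=2))
```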