What are init scripts?

An init script (initialization script) is a shell script that runs during startup of each cluster node before the Apache Spark driver or executor JVM starts. This article provides recommendations for init scripts.

Recommendations for init scripts

Databricks recommends using built-in platform features whenever possible. Widespread use of init scripts can slow migration to new Databricks Runtime versions and prevent adoption of some Databricks optimizations.

Important

If you need to migrate from init scripts on DBFS, see Migrate init scripts from DBFS.

For example, Databricks recommends using compute policies to set system properties, environment variables, and Spark configuration parameters. See Compute policy reference.
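As a rough illustration of the policy-based approach, the sketch below builds a policy definition that pins one Spark configuration parameter and one environment variable. It assumes the attribute-path convention described in the Compute policy reference (`spark_conf.<key>`, `spark_env_vars.<name>`) and the `fixed` policy type; the specific keys and values (`spark.sql.shuffle.partitions`, `APP_ENV`) are hypothetical and chosen only for the example.

```python
import json

# Hypothetical policy definition. The attribute paths are assumed to follow
# the compute policy convention (spark_conf.<key>, spark_env_vars.<name>);
# the keys and values themselves are placeholders for illustration.
policy_definition = {
    # Pin a Spark configuration parameter for clusters that use this policy.
    "spark_conf.spark.sql.shuffle.partitions": {"type": "fixed", "value": "200"},
    # Pin an environment variable instead of exporting it from an init script.
    "spark_env_vars.APP_ENV": {"type": "fixed", "value": "production"},
}

# Serialize to JSON, the form in which a policy definition is typically supplied.
print(json.dumps(policy_definition, indent=2))
```

Settings expressed this way live in the policy rather than in a script, so there is no shell code to maintain when moving to a new Databricks Runtime version.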

Databricks recommends managing all init scripts as cluster-scoped init scripts.

Databricks recommends using new Databricks Runtime versions and Unity Catalog. The following table provides recommendations for init script use organized by Databricks Runtime version and Unity Catalog enablement.

| Environment | Recommendation |
| --- | --- |
| Databricks Runtime 13.3 LTS and above with Unity Catalog. | Store init scripts in Unity Catalog volumes. |
| Workloads without Unity Catalog where init scripts don’t reference other files. | Store init scripts as workspace files. (File size limit is 500 MB). |
| Workloads without Unity Catalog where init scripts reference other files such as libraries, configuration files, or shell scripts. | Store init scripts using cloud object storage. |

Note

No isolation shared access mode does not support Unity Catalog volumes.
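The recommendations in the table can be read as a small decision procedure. The following is a minimal sketch of that reading, not an official API: the function name, its parameters, and the returned labels are all hypothetical. It does not model the no isolation shared restriction from the note, and it raises an error for combinations the table does not cover.

```python
def recommended_init_script_storage(
    runtime_version: tuple[int, int],
    unity_catalog: bool,
    references_other_files: bool,
) -> str:
    """Return the storage location suggested by the recommendations table.

    All names here are illustrative; nothing in this function calls Databricks.
    """
    if unity_catalog and runtime_version >= (13, 3):
        return "Unity Catalog volume"
    if not unity_catalog and references_other_files:
        return "cloud object storage"
    if not unity_catalog:
        return "workspace file"
    # Unity Catalog on a runtime below 13.3 LTS is not covered by the table.
    raise ValueError("combination not covered by the recommendations table")


print(recommended_init_script_storage((14, 3), True, False))
print(recommended_init_script_storage((12, 2), False, True))
```

Runtime versions are compared as `(major, minor)` tuples so that, for example, `(13, 3)` and `(14, 0)` both satisfy the "13.3 LTS and above" condition.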

Init scripts are not supported on all cluster configurations and not all files can be referenced from init scripts. See Compute compatibility with libraries and init scripts and What files can I reference in an init script?.

What types of init scripts does Databricks support?

Databricks supports two kinds of init scripts: cluster-scoped and global. Databricks recommends using only cluster-scoped init scripts. You can get behavior similar to global init scripts by using cluster policies or cluster-scoped init scripts.

  • Cluster-scoped: run on every cluster configured with the script. This is the recommended way to run an init script. See Use cluster-scoped init scripts.

  • Global: run on all clusters in the workspace configured with single user access mode or no isolation shared access mode. They do not run on clusters with shared access mode. Global init scripts can help you enforce consistent cluster configurations across your workspace. Use them carefully because they can have unanticipated effects, such as library conflicts. Only workspace admin users can create global init scripts. See Use global init scripts.

Whenever you change any type of init script, you must restart all clusters affected by the script.
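To make the cluster-scoped case more concrete, the sketch below shows roughly what attaching a script to a single cluster's configuration might look like. It assumes the `init_scripts` field of a cluster specification takes a list of entries keyed by storage type (here `volumes`) with a `destination` path, as in the Databricks Clusters API; the cluster name and path are placeholders. See Use cluster-scoped init scripts for the authoritative steps.

```python
import json

# Hypothetical cluster specification fragment. The field names are assumed to
# follow the Databricks Clusters API; the cluster name and path are placeholders.
cluster_spec = {
    "cluster_name": "example-cluster",
    "init_scripts": [
        # One entry per script; the key names the storage type.
        {"volumes": {"destination": "/Volumes/main/default/scripts/install-libs.sh"}},
    ],
}

print(json.dumps(cluster_spec, indent=2))
```

Because the script is part of the cluster's own configuration, a change to it takes effect only for that cluster, and only after a restart.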

Init script execution order

The order of execution of init scripts is:

  1. Global

  2. Cluster-scoped
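The ordering rule can be sketched as a sort by scope. This is an illustration only: the `InitScript` class, the script names, and the `shared_access_mode` flag are hypothetical, and preserving list order within a scope is an assumption of this sketch rather than something stated above. The flag reflects the earlier statement that global init scripts do not run on clusters with shared access mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InitScript:
    name: str   # hypothetical label used only for this illustration
    scope: str  # "global" or "cluster"


# Global scripts run before cluster-scoped ones.
SCOPE_ORDER = {"global": 0, "cluster": 1}


def execution_order(
    scripts: list[InitScript], shared_access_mode: bool = False
) -> list[InitScript]:
    """Return the scripts that would run, global first, then cluster-scoped.

    sorted() is stable, so scripts in the same scope keep their listed order
    (an assumption of this sketch).
    """
    if shared_access_mode:
        # Global init scripts are skipped on shared access mode clusters.
        scripts = [s for s in scripts if s.scope != "global"]
    return sorted(scripts, key=lambda s: SCOPE_ORDER[s.scope])


scripts = [
    InitScript("install-libs.sh", "cluster"),
    InitScript("set-proxy.sh", "global"),
    InitScript("tune-os.sh", "cluster"),
]
print([s.name for s in execution_order(scripts)])
# prints ['set-proxy.sh', 'install-libs.sh', 'tune-os.sh']
```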

Migrate init scripts from DBFS

Users who need to migrate init scripts from DBFS can use the following guides. Before you start, make sure you’ve identified the correct target for your configuration. See Recommendations for init scripts.