When you run a Databricks job, the tasks configured as part of the job run on Databricks compute, either a cluster or a SQL warehouse, depending on the task type. Selecting the compute type and configuration options is important when you operationalize a job. This article provides a guide to using Databricks compute resources to run your jobs.
Secrets are not redacted from a cluster's Spark driver log stdout and stderr streams. To protect sensitive data, by default, Spark driver logs are viewable only by users with CAN MANAGE permission on job, single user access mode, and shared access mode clusters. To allow users with CAN ATTACH TO or CAN RESTART permission to view the logs on these clusters, set the following Spark configuration property in the cluster configuration:
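```
spark.databricks.acl.needAdminPermissionToViewLogs false
```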
On No Isolation Shared access mode clusters, the Spark driver logs can be viewed by users with CAN ATTACH TO or CAN MANAGE permission. To limit who can read the logs to only users with the CAN MANAGE permission, set the following Spark configuration property in the cluster configuration:
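```
spark.databricks.acl.needAdminPermissionToViewLogs true
```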
See Spark configuration to learn how to add Spark properties to a cluster configuration.
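As a quick illustration, here is one way to attach a Spark property such as the one above when creating a cluster with the Databricks SDK for Python. The SDK itself, the node type, and the Spark version are assumptions of this sketch; the same `spark_conf` mapping also works in a job's new-cluster specification.

```python
from databricks.sdk import WorkspaceClient

# Assumes the databricks-sdk package is installed and authentication
# is already configured (for example, via environment variables).
w = WorkspaceClient()

cluster = w.clusters.create_and_wait(
    cluster_name="logs-acl-example",       # placeholder name
    spark_version="13.3.x-scala2.12",      # placeholder Spark version
    node_type_id="i3.xlarge",              # placeholder node type
    num_workers=2,
    # Spark properties are passed as a plain string-to-string mapping.
    spark_conf={
        "spark.databricks.acl.needAdminPermissionToViewLogs": "false",
    },
)
```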
New Job Clusters are dedicated clusters for a job or task run. A shared job cluster is created and started when the first task using the cluster starts and terminates after the last task using the cluster completes. The cluster does not terminate when idle; it terminates only after all tasks using it have completed. If a shared job cluster fails or is terminated before all tasks have finished, a new cluster is created. A cluster scoped to a single task is created and started when the task starts and terminates when the task completes. In production, Databricks recommends using new shared or task-scoped clusters so that each job or task runs in a fully isolated environment.
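As a sketch of the shared pattern, the following example uses the Databricks SDK for Python (an assumption of this example, not a required tool) to define one shared job cluster that two tasks reuse. The job name, cluster size, node type, and notebook paths are placeholders.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()

# One cluster definition shared by every task that references its key.
shared = jobs.JobCluster(
    job_cluster_key="shared-etl-cluster",
    new_cluster=compute.ClusterSpec(
        spark_version="13.3.x-scala2.12",  # placeholder Spark version
        node_type_id="i3.xlarge",          # placeholder node type
        num_workers=4,
    ),
)

job = w.jobs.create(
    name="etl-shared-cluster",
    job_clusters=[shared],
    tasks=[
        jobs.Task(
            task_key="extract",
            job_cluster_key="shared-etl-cluster",
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/etl/extract"),
        ),
        jobs.Task(
            task_key="load",
            job_cluster_key="shared-etl-cluster",
            depends_on=[jobs.TaskDependency(task_key="extract")],
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/etl/load"),
        ),
    ],
)
```

Because both tasks reference the same `job_cluster_key`, the cluster starts when `extract` begins and terminates after `load`, the last task using it, completes.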
When you run a task on a new job cluster, the task is treated as a data engineering (task) workload, subject to task workload pricing. When you run a task on an existing all-purpose cluster, the task is treated as a data analytics (all-purpose) workload, subject to all-purpose workload pricing.
If you select a terminated existing cluster and the job owner has CAN RESTART permission, Databricks starts the cluster when the job is scheduled to run.
Existing all-purpose clusters work best for tasks such as updating dashboards at regular intervals.
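For example, a dashboard-refresh task can be pinned to an existing all-purpose cluster by its cluster ID. This sketch again assumes the Databricks SDK for Python; the cluster ID, job name, and notebook path are placeholders.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

job = w.jobs.create(
    name="refresh-dashboard",
    tasks=[
        jobs.Task(
            task_key="refresh",
            # Reuse an already-provisioned all-purpose cluster by ID.
            # If the cluster is terminated and the job owner has CAN
            # RESTART permission, Databricks starts it for the run.
            existing_cluster_id="0123-456789-abcdefgh",  # placeholder ID
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/dash/refresh"),
        )
    ],
)
```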
To decrease new job cluster start time, create a pool and configure the job’s cluster to use the pool.
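The following sketch shows one way to wire this up with the Databricks SDK for Python: create a pool that keeps idle instances warm, then point the job cluster at it through `instance_pool_id`. Names, node types, and sizes are placeholders.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()

# Keep a small number of idle instances warm for fast cluster startup.
pool = w.instance_pools.create(
    instance_pool_name="jobs-pool",   # placeholder name
    node_type_id="i3.xlarge",         # placeholder node type
    min_idle_instances=2,
)

job = w.jobs.create(
    name="etl-with-pool",
    tasks=[
        jobs.Task(
            task_key="etl",
            new_cluster=compute.ClusterSpec(
                spark_version="13.3.x-scala2.12",  # placeholder Spark version
                # Draw nodes from the pool; node_type_id is omitted
                # because the pool already fixes the instance type.
                instance_pool_id=pool.instance_pool_id,
                num_workers=2,
            ),
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/etl/main"),
        )
    ],
)
```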