What is Photon?

Applies to: Databricks SQL, Databricks Runtime 9.1 and above, Databricks Runtime 15.2 ML and above

Learn about the advantages of running your workloads on Photon, the features it supports, and how to enable or disable Photon. Photon is turned on by default in Databricks SQL warehouses and is compatible with Apache Spark APIs, so it works with your existing code.

What is Photon used for?

Photon is a high-performance Databricks-native vectorized query engine that runs your SQL workloads and DataFrame API calls faster to reduce your total cost per workload.

The following are key features and advantages of using Photon.

  • Support for SQL and equivalent DataFrame operations with Delta and Parquet tables.

  • Accelerated queries that process data faster, including queries with aggregations and joins.

  • Faster performance when data is accessed repeatedly from the disk cache.

  • Robust scan performance on tables with many columns and many small files.

  • Faster Delta and Parquet writing using UPDATE, DELETE, MERGE INTO, INSERT, and CREATE TABLE AS SELECT, including wide tables that contain thousands of columns.

  • Replacement of sort-merge joins with hash joins.

  • For AI and ML workloads, Photon improves performance for applications using Spark SQL, Spark DataFrames, feature engineering, GraphFrames, and xgboost4j.

Get started with Photon

Photon is available on clusters running Databricks Runtime 9.1 LTS and above, and on clusters running Databricks Runtime 15.2 for Machine Learning and above.

To enable Photon on your cluster, select the Use Photon Acceleration checkbox when you create or edit the cluster.

If you create a cluster using the Clusters API, set runtime_engine to PHOTON.
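As a rough illustration of the Clusters API setting above, the following sketch builds a cluster-creation request body with runtime_engine set to PHOTON. Only runtime_engine and its PHOTON value come from this page. The other field names follow the usual Clusters API cluster spec as an assumption, the helper name build_photon_cluster_spec is hypothetical, and the cluster name, runtime version, node type, and worker count are placeholders to replace with values valid in your workspace. The sketch only prints the JSON and does not call any endpoint.

```python
import json


def build_photon_cluster_spec(cluster_name, spark_version, node_type_id, num_workers):
    """Return a cluster spec dict with Photon enabled (hypothetical helper)."""
    return {
        # Assumed field names from the Clusters API cluster spec.
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # The setting this page describes: run the cluster on Photon.
        "runtime_engine": "PHOTON",
    }


spec = build_photon_cluster_spec(
    cluster_name="photon-example",      # placeholder name
    spark_version="15.4.x-scala2.12",   # placeholder runtime version
    node_type_id="i3.xlarge",           # placeholder instance type
    num_workers=2,                      # placeholder cluster size
)

# Serialize the spec as the JSON body you would send when creating the cluster.
request_body = json.dumps(spec, indent=2)
print(request_body)
```

Keeping the spec in a small helper makes it harder to forget the runtime_engine field when several cluster definitions are generated from the same code.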

Instance types

Photon supports a number of instance types on the driver and worker nodes. Photon instance types consume DBUs at a different rate than the same instance type running the non-Photon runtime.

For more information about Photon instances and DBU consumption, see the Databricks pricing page.

Operators, expressions, and data types

The following are the operators, expressions, and data types that Photon covers.

Operators

  • Scan, Filter, Project

  • Hash Aggregate/Join/Shuffle

  • Nested-Loop Join

  • Null-Aware Anti Join

  • Union, Expand, ScalarSubquery

  • Delta/Parquet Write Sink

  • Sort

  • Window Function

Expressions

  • Comparison / Logic

  • Arithmetic / Math (most)

  • Conditional (IF, CASE, etc.)

  • String (common ones)

  • Casts

  • Aggregates (most common ones)

  • Date/Timestamp

Data types

  • Byte/Short/Int/Long

  • Boolean

  • String/Binary

  • Decimal

  • Float/Double

  • Date/Timestamp

  • Struct

  • Array

  • Map

Features that require Photon

The following are features that require Photon.

Limitations

  • Structured Streaming: Photon currently supports stateless streaming with Delta, Parquet, CSV, and JSON. Stateless Kafka and Kinesis streaming is supported when writing to a Delta or Parquet sink.

  • Photon does not support UDFs or RDD APIs.

  • Photon is not expected to improve the performance of queries that normally run in under two seconds.

Features not supported by Photon run the same way they would with Databricks Runtime.