Apache Spark Structured Streaming is a near-real-time processing engine that offers end-to-end fault tolerance with exactly-once processing guarantees using familiar Spark APIs. Structured Streaming lets you express a computation on streaming data in the same way you express a batch computation on static data. The Structured Streaming engine performs the computation incrementally and continuously updates the result as streaming data arrives. For an overview of Structured Streaming, see the Apache Spark Structured Streaming Programming Guide.
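The incremental model described above can be illustrated without Spark at all. The following is a minimal plain-Python sketch, not Spark code: the names `batch_word_count`, `IncrementalWordCount`, and the sample input are hypothetical and exist only for illustration. It shows a running result being updated one micro-batch at a time and ending up equal to a batch computation over the same data.

```python
from collections import Counter


def batch_word_count(lines):
    """Batch computation: count words over a complete, static input."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


class IncrementalWordCount:
    """Toy stand-in for a streaming query: keeps a running result table."""

    def __init__(self):
        self.result = Counter()

    def process(self, micro_batch):
        # Only the newly arrived lines are processed; earlier input is
        # never rescanned. The running result is updated in place.
        self.result.update(batch_word_count(micro_batch))
        return self.result


# Hypothetical input arriving as three micro-batches.
micro_batches = [
    ["spark streaming"],
    ["delta lake", "spark delta"],
    ["streaming"],
]

stream = IncrementalWordCount()
for batch in micro_batches:
    stream.process(batch)

# The same logic applied once to all of the data as a static batch.
all_lines = [line for batch in micro_batches for line in batch]
incremental_matches_batch = stream.result == batch_word_count(all_lines)
```

The real engine additionally handles fault tolerance, state management, and output modes, none of which this sketch attempts to model.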
Structured Streaming integrates tightly with Delta Lake to offer enhanced functionality for incremental data processing at scale in the Databricks Lakehouse. Structured Streaming is the core technology behind both Databricks Auto Loader and Delta Live Tables.
Databricks recommends using Auto Loader to ingest supported file types from cloud object storage into Delta Lake. For ETL pipelines, Databricks recommends using Delta Live Tables (which uses Delta tables and Structured Streaming). You can also configure incremental ETL workloads by streaming to and from Delta Lake tables.
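As a rough illustration of the Auto Loader recommendation, the sketch below reads files incrementally with the `cloudFiles` source and writes them to a table. It assumes an existing `SparkSession` on Databricks passed in as `spark`, JSON input files, and a runtime where Delta is the default table format. The function name and every path and table argument are hypothetical placeholders.

```python
def start_auto_loader_ingest(spark, source_path, schema_path,
                             checkpoint_path, target_table):
    """Incrementally ingest JSON files from cloud storage into a table.

    Assumptions: `spark` is an existing SparkSession on Databricks (the
    `cloudFiles` source is Databricks-specific), and all path and table
    arguments are placeholders supplied by the caller.
    """
    raw = (
        spark.readStream.format("cloudFiles")
        # Format of the incoming files; JSON is assumed here.
        .option("cloudFiles.format", "json")
        # Where Auto Loader stores the inferred schema.
        .option("cloudFiles.schemaLocation", schema_path)
        .load(source_path)
    )
    return (
        raw.writeStream
        # The checkpoint tracks which files have already been processed.
        .option("checkpointLocation", checkpoint_path)
        # Process everything currently available, then stop.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

Dropping the `availableNow` trigger would leave the query running continuously instead of stopping once the backlog is processed; which is preferable depends on latency and cost requirements.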
In addition to Delta Lake and Auto Loader, Structured Streaming can connect to messaging services such as Apache Kafka.
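For example, a Kafka topic can be read as a streaming DataFrame roughly as follows. This is a sketch under stated assumptions: `spark` is an existing `SparkSession` with the Kafka connector available, the function name is hypothetical, and the broker address and topic name are placeholders supplied by the caller.

```python
def read_kafka_topic(spark, bootstrap_servers, topic):
    """Return a streaming DataFrame over a Kafka topic.

    Assumptions: `spark` is an existing SparkSession with the Kafka
    source available; `bootstrap_servers` and `topic` are placeholders.
    """
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        # Start from new records only; "earliest" would replay the topic.
        .option("startingOffsets", "latest")
        .load()
        # Kafka delivers key and value as binary; cast them for readability.
        .selectExpr(
            "CAST(key AS STRING) AS key",
            "CAST(value AS STRING) AS value",
            "timestamp",
        )
    )
```

The returned DataFrame can then be transformed and written out with `writeStream`, in the same way as any other streaming source.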
Databricks supports a number of edge features not found in Apache Spark to help customers get the best performance out of Structured Streaming. To learn more about these features and other recommendations, see Production considerations for Structured Streaming.
For introductory notebooks and notebooks demonstrating example use cases, see Structured Streaming patterns on Databricks.