Configure streaming data sources

Databricks can integrate with stream messaging services for near real-time data ingestion into the Databricks lakehouse. Databricks can also sync enriched and transformed data from the lakehouse to other streaming systems.

Structured Streaming provides native streaming access to file formats supported by Apache Spark, but Databricks recommends Auto Loader for most Structured Streaming operations that read data from cloud object storage. See What is Auto Loader?.
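
As a rough illustration of the recommendation above, the following sketch reads files from cloud object storage with Auto Loader and writes them to a table. The function names, paths, and table name are hypothetical, and the sketch assumes it runs on Databricks, where the `cloudFiles` source is available and `spark` is an existing SparkSession. Reusing the checkpoint path as the schema location is a simplifying assumption, not a requirement.

```python
def auto_loader_options(file_format, schema_location):
    """Build the option map for an Auto Loader (cloudFiles) source.

    Hypothetical helper: it only assembles two documented options.
    """
    return {
        # Format of the incoming files, for example "json", "csv", or "parquet".
        "cloudFiles.format": file_format,
        # Location where Auto Loader tracks the inferred schema.
        "cloudFiles.schemaLocation": schema_location,
    }


def ingest_with_auto_loader(spark, source_path, target_table, checkpoint_path,
                            file_format="json"):
    """Start a stream that loads new files from source_path into target_table.

    Assumes `spark` is a SparkSession on Databricks.
    """
    # Assumption: the checkpoint path doubles as the schema location.
    options = auto_loader_options(file_format, checkpoint_path)
    stream = (
        spark.readStream
        .format("cloudFiles")
        .options(**options)
        .load(source_path)
    )
    return (
        stream.writeStream
        .option("checkpointLocation", checkpoint_path)
        # Process all files available now, then stop.
        .trigger(availableNow=True)
        .toTable(target_table)
    )
```

A call might look like `ingest_with_auto_loader(spark, "/path/to/landing", "raw_events", "/path/to/checkpoint")`, where every argument is a placeholder to replace with your own locations.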

Ingesting streaming messages into Delta Lake lets you retain them indefinitely, so you can replay data streams without losing data to the retention thresholds of the source messaging system.
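
The retention pattern described above might be sketched as follows, using Apache Kafka as an example message source. This is a minimal sketch under stated assumptions: the function names, broker address, topic, table name, and checkpoint path are all hypothetical, and `spark` is assumed to be an existing SparkSession with the Kafka connector available.

```python
def kafka_source_options(bootstrap_servers, topic, starting_offsets="earliest"):
    """Build the option map for a Structured Streaming Kafka source.

    Hypothetical helper: it only assembles standard Kafka source options.
    """
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # "earliest" reads everything the broker still retains on first start.
        "startingOffsets": starting_offsets,
    }


def archive_topic_to_delta(spark, bootstrap_servers, topic, target_table,
                           checkpoint_path):
    """Continuously append raw Kafka records to a Delta table."""
    raw = (
        spark.readStream
        .format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )
    return (
        raw.writeStream
        .format("delta")
        .option("checkpointLocation", checkpoint_path)
        .toTable(target_table)
    )


def replay_from_delta(spark, source_table):
    """Return a streaming DataFrame that re-reads the archived records.

    The table keeps messages after the broker's retention window has
    expired, so a new consumer can start from the table instead.
    """
    return spark.readStream.table(source_table)
```

Once the archive stream is running, a downstream job can call `replay_from_delta(spark, "raw_kafka_events")` (a placeholder table name) to process the full history rather than only what the broker still holds.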

To learn more about specific configurations for streaming from message queues, see: