This section describes the Apache Spark data sources you can use in Databricks. Many include a notebook that demonstrates how to use the data source to read and write data.
The following data sources are either directly supported in Databricks Runtime or require only simple shell commands to enable access.
In addition, Databricks supports Delta Lake and makes it easy to create Delta tables from multiple data formats.
To learn how to access metadata for file-based data sources, see File metadata column.
The following storage data sources require you to configure the connection to storage. Some also require that you create a Databricks library and install it on a cluster:
- Amazon Redshift
- Amazon S3
- Amazon S3 Select
- Azure Data Lake Storage Gen2 and Blob Storage
- Azure Data Lake Storage Gen1
- Azure Cosmos DB
- Azure Synapse Analytics
- Google BigQuery
- Google Cloud Storage
- SQL databases using JDBC
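For the JDBC case, a connection is configured entirely through reader options. The sketch below uses Spark's standard `spark.read.jdbc` API; the host, database, table name, and credentials are placeholders, and a matching JDBC driver (here PostgreSQL, as an assumed example) must be installed on the cluster.

```python
# Placeholder connection details -- replace with values for your database.
jdbc_url = "jdbc:postgresql://db.example.com:5432/analytics"
connection_properties = {
    "user": "REPLACE_WITH_USER",
    "password": "REPLACE_WITH_PASSWORD",
    "driver": "org.postgresql.Driver",  # driver class must be on the cluster
}


def read_events(spark):
    # Returns a DataFrame backed by the remote `events` table (a made-up
    # name); Spark pushes supported filters down to the database.
    return spark.read.jdbc(
        url=jdbc_url,
        table="events",
        properties=connection_properties,
    )
```

In practice you would call `read_events(spark)` from a notebook attached to a cluster that has the driver installed, then filter and transform the result like any other DataFrame.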