Azure Cosmos DB

Important

This documentation has been retired and might not be updated. See the official Cosmos DB Spark connector GitHub repository.

Azure Cosmos DB is Microsoft’s globally distributed, multi-model database. Azure Cosmos DB enables you to elastically and independently scale throughput and storage across any number of Azure’s geographic regions. It offers throughput, latency, availability, and consistency guarantees with comprehensive service level agreements (SLAs). Azure Cosmos DB provides APIs for the following data models, with SDKs available in multiple languages:

  • SQL API

  • MongoDB API

  • Cassandra API

  • Graph (Gremlin) API

  • Table API

This article explains how to read data from and write data to Azure Cosmos DB using Databricks. For the most up-to-date details about Azure Cosmos DB, see Accelerate big data analytics by using the Apache Spark to Azure Cosmos DB connector.

Important

This connector supports the core (SQL) API of Azure Cosmos DB. For the Cosmos DB for MongoDB API, use the MongoDB Spark connector. For the Cosmos DB Cassandra API, use the Cassandra Spark connector.

Create and attach required libraries

  1. Download the latest azure-cosmosdb-spark library for the version of Apache Spark you are running.

  2. Upload the downloaded JAR files to Databricks. See Libraries.

  3. Install the uploaded libraries into your Databricks cluster.
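Once the library is installed on the cluster, a read from a notebook might look like the following minimal sketch. It assumes the connector's data source name `com.microsoft.azure.cosmosdb.spark` and its `Endpoint`, `Masterkey`, `Database`, `Collection`, and `query_custom` options; check these against the connector version you installed. The helper names `build_read_config` and `read_collection`, and every value in the usage comment, are hypothetical placeholders.

```python
from typing import Dict, Optional

# Data source name assumed for the azure-cosmosdb-spark library.
COSMOSDB_FORMAT = "com.microsoft.azure.cosmosdb.spark"


def build_read_config(
    endpoint: str,
    master_key: str,
    database: str,
    collection: str,
    query: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble the option map passed to the connector for a read."""
    config = {
        "Endpoint": endpoint,
        "Masterkey": master_key,
        "Database": database,
        "Collection": collection,
    }
    # Optionally push a SQL query down to Cosmos DB instead of
    # reading the whole collection.
    if query is not None:
        config["query_custom"] = query
    return config


def read_collection(spark, config: Dict[str, str]):
    """Load a Cosmos DB collection as a Spark DataFrame.

    `spark` is the SparkSession that Databricks notebooks provide.
    """
    return spark.read.format(COSMOSDB_FORMAT).options(**config).load()


# Example usage in a notebook (placeholder values, not real resources):
#
#   config = build_read_config(
#       endpoint="https://<account>.documents.azure.com:443/",
#       master_key="<master-key>",
#       database="<database>",
#       collection="<collection>",
#       query="SELECT * FROM c",
#   )
#   df = read_collection(spark, config)
#   df.show(5)
```

Keeping the configuration in a plain dictionary makes it easy to check before any Spark job runs. In practice, consider loading the master key from a secret store rather than hard-coding it in the notebook.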