Connect to Azure Blob Storage with WASB (legacy)

Microsoft has deprecated the Windows Azure Storage Blob driver (WASB) for Azure Blob Storage in favor of the Azure Blob Filesystem driver (ABFS); see Connect to Azure Data Lake Storage Gen2 and Blob Storage. ABFS has numerous benefits over WASB; see Azure documentation on ABFS.

This article provides documentation for maintaining code that uses the WASB driver. Databricks recommends using ABFS for all connections to Azure Blob Storage.

Configure WASB credentials in Databricks

The WASB driver allows you to use either a storage account access key or a Shared Access Signature (SAS). If you are reading data from a public storage account, you do not need to configure credentials.

Databricks recommends using secrets whenever you need to pass credentials in Databricks. Secrets are available to all users with access to the containing secret scope.

You can pass credentials:

Scoped to the cluster in the Spark config
Scoped to the notebook

Databricks recommends upgrading all your connections to use ABFS to access Azure Blob Storage. ABFS provides access patterns similar to those of WASB. Use ABFS for the best security and performance when interacting with Azure Blob Storage.

To configure cluster credentials, set Spark configuration properties when you create the cluster. Credentials set at the cluster level are available to all users with access to that cluster.
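As a rough sketch of what a cluster-level entry might look like, the helper below renders a single line for the cluster's Spark config that points at a secret instead of containing the key itself. The function name `cluster_conf_line` and its parameters are hypothetical. The `{{secrets/<scope-name>/<secret-name>}}` reference syntax is assumed to be supported in your workspace's cluster Spark config; confirm it against the Databricks secrets documentation before relying on it.

```python
def cluster_conf_line(storage_account: str, scope: str, secret_name: str) -> str:
    """Render one 'key value' line for the cluster Spark config (hypothetical helper)."""
    # Property name taken from the account key pattern used in this article.
    conf_key = f"fs.azure.account.key.{storage_account}.blob.core.windows.net"
    # Assumption: the cluster config resolves {{secrets/<scope>/<name>}} at startup.
    secret_ref = "{{secrets/" + scope + "/" + secret_name + "}}"
    return f"{conf_key} {secret_ref}"


if __name__ == "__main__":
    print(cluster_conf_line("<storage-account-name>", "<scope-name>", "<secret-name>"))
```

Referencing a secret this way keeps the access key out of the cluster definition, although the credential is still available to every user with access to the cluster.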

To configure notebook-scoped credentials, use spark.conf.set(). Credentials passed at the notebook level are available to all users with access to that notebook.

Setting Azure Blob Storage credentials with a storage account access key

A storage account access key grants full access to all containers within a storage account. While this pattern is useful for prototyping, avoid using it in production to reduce risks associated with granting unrestricted access to production data.

spark.conf.set(
  "fs.azure.account.key.<storage-account-name>.blob.core.windows.net",
  "<storage-account-access-key>"
)
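Because Databricks recommends secrets for passing credentials, the same property can be set from a secret scope rather than a literal string. The following is a minimal sketch, not a definitive implementation: `set_account_key_from_secret` is a hypothetical helper, and it takes `spark` and `dbutils` as arguments so that it can be defined outside a notebook. It assumes the access key has already been stored in a secret scope.

```python
def set_account_key_from_secret(spark, dbutils, storage_account, scope, secret_name):
    """Set a notebook-scoped WASB account key read from a secret scope.

    Hypothetical helper: `spark` and `dbutils` are the objects that a
    Databricks notebook provides; they are passed in explicitly here.
    """
    conf_key = f"fs.azure.account.key.{storage_account}.blob.core.windows.net"
    # Read the access key from the secret scope instead of hard-coding it.
    access_key = dbutils.secrets.get(scope=scope, key=secret_name)
    spark.conf.set(conf_key, access_key)
    return conf_key
```

In a notebook, this could be called as `set_account_key_from_secret(spark, dbutils, "<storage-account-name>", "<scope-name>", "<secret-name>")`.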

You can upgrade account key URIs to use ABFS. For more information, see Connect to Azure Data Lake Storage Gen2 and Blob Storage.

Setting Azure Blob Storage credentials with a Shared Access Signature (SAS)

You can use SAS tokens to grant limited access to a single container in a storage account. The access expires at a specific time.

spark.conf.set(
  "fs.azure.sas.<container-name>.<storage-account-name>.blob.core.windows.net",
  "<sas-token-for-container>"
)
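The SAS property name includes the container name, so each container needs its own entry. As an illustrative sketch, the hypothetical helpers below build that property name and set one token per container; `spark` is passed in explicitly, and the `tokens_by_container` mapping is an assumed input shape rather than anything defined by the driver.

```python
def sas_conf_key(container: str, storage_account: str) -> str:
    """Build the per-container SAS property name (hypothetical helper)."""
    return f"fs.azure.sas.{container}.{storage_account}.blob.core.windows.net"


def set_container_sas_tokens(spark, storage_account, tokens_by_container):
    """Set one SAS token per container and return the property names used."""
    keys = []
    for container, token in tokens_by_container.items():
        key = sas_conf_key(container, storage_account)
        spark.conf.set(key, token)  # notebook-scoped, like any spark.conf.set() call
        keys.append(key)
    return keys
```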

Access Azure Blob Storage using the DataFrame API

The Apache Spark DataFrame API can use credentials configured at either the notebook or cluster level. All WASB driver URIs specify the container and storage account names. The directory name is optional, and can specify multiple nested directories relative to the container.

wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>
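To make the URI layout concrete, here is a small sketch that assembles a WASB URI from its parts. The function name `wasbs_uri` and its handling of leading and trailing slashes are illustrative choices, not part of the driver.

```python
def wasbs_uri(container: str, storage_account: str, directory: str = "") -> str:
    """Assemble a wasbs:// URI; `directory` is optional and may be nested."""
    base = f"wasbs://{container}@{storage_account}.blob.core.windows.net"
    # Normalize so that "raw/2024", "/raw/2024", and "raw/2024/" behave the same.
    path = directory.strip("/")
    return f"{base}/{path}" if path else f"{base}/"


if __name__ == "__main__":
    print(wasbs_uri("<container-name>", "<storage-account-name>", "<directory-name>"))
```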

The following code examples show how you can use the DataFrame API and Databricks Utilities (dbutils) to interact with a named directory within a container.

df = spark.read.format("parquet").load("wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>")

dbutils.fs.ls("wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>")

To use ABFS instead of WASB, update your URIs. For more information, see Access Azure storage.
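The URI change can be sketched as a string rewrite. The example below assumes the usual ABFS form, `abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<directory-name>`; the helper name `wasb_to_abfs` is hypothetical. It only rewrites URIs: credential properties configured for the `blob.core.windows.net` endpoint are not migrated and need to be updated separately.

```python
_SCHEME_MAP = {"wasb": "abfs", "wasbs": "abfss"}
_WASB_SUFFIX = ".blob.core.windows.net"
_ABFS_SUFFIX = ".dfs.core.windows.net"


def wasb_to_abfs(uri: str) -> str:
    """Rewrite a WASB URI into the assumed equivalent ABFS URI."""
    scheme, sep, rest = uri.partition("://")
    if not sep or scheme not in _SCHEME_MAP:
        raise ValueError(f"not a WASB URI: {uri!r}")
    # Split "<container>@<account>.blob.core.windows.net" from the optional path.
    authority, slash, path = rest.partition("/")
    container, at, host = authority.partition("@")
    if not at or not host.endswith(_WASB_SUFFIX):
        raise ValueError(f"unexpected WASB authority: {authority!r}")
    account = host[: -len(_WASB_SUFFIX)]
    return f"{_SCHEME_MAP[scheme]}://{container}@{account}{_ABFS_SUFFIX}{slash}{path}"
```

Raising on anything that does not look like a WASB URI keeps the helper from silently passing through paths that were never meant to be converted.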

Access Azure Blob Storage with SQL

Credentials set in a notebook’s session configuration are not accessible to notebooks running Spark SQL.

After an account access key or a SAS is set up in your cluster configuration, you can use standard Spark SQL queries with Azure Blob Storage:

-- SQL
CREATE DATABASE <db-name>
LOCATION "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/";

To use ABFS instead of WASB, update your URIs; see Access Azure storage.