Microsoft has deprecated the Windows Azure Storage Blob driver (WASB) for Azure Blob Storage in favor of the Azure Blob Filesystem driver (ABFS); see Connect to Azure Data Lake Storage Gen2 and Blob Storage. ABFS has numerous benefits over WASB; see Azure documentation on ABFS.
This article provides documentation for maintaining code that uses the WASB driver. Databricks recommends using ABFS for all connections to Azure Blob Storage.
The WASB driver allows you to use either a storage account access key or a Shared Access Signature (SAS). (If you are reading data from a public storage account, you do not need to configure credentials).
You can pass credentials:
Scoped to the cluster in the Spark config
Scoped to the notebook
Databricks recommends upgrading all your connections to use ABFS to access Azure Blob Storage, which provides similar access patterns as WASB. Use ABFS for the best security and performance when interacting with Azure Blob Storage.
To configure cluster credentials, set Spark configuration properties when you create the cluster. Credentials set at the cluster level are available to all users with access to that cluster.
To configure notebook-scoped credentials, use
spark.conf.set(). Credentials passed at the notebook level are available to all users with access to that notebook.
A storage account access key grants full access to all containers within a storage account. While this pattern is useful for prototyping, avoid using it in production to reduce risks associated with granting unrestricted access to production data.
spark.conf.set( "fs.azure.account.key.<storage-account-name>.blob.core.windows.net", "<storage-account-access-key>" )
You can upgrade account key URIs to use ABFS. For more information, see Connect to Azure Data Lake Storage Gen2 and Blob Storage.
The Apache Spark DataFrame API can use credentials configured at either the notebook or cluster level. All WASB driver URIs specify the container and storage account names. The directory name is optional, and can specify multiple nested directories relative to the container.
The following code examples show how you can use the DataFrames API and Databricks Utilities (dbutils) reference to interact with a named directory within a container.
df = spark.read.format("parquet").load("wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>") dbutils.fs.ls("wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>")
To update ABFS instead of WASB, update your URIs. For more information, see Access Azure storage
Credentials set in a notebook’s session configuration are not accessible to notebooks running Spark SQL.
After an account access key or a SAS is set up in your cluster configuration, you can use standard Spark SQL queries with Azure Blob Storage:
-- SQL CREATE DATABASE <db-name> LOCATION "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/";
To update ABFS instead of WASB, update your URIs; see Access Azure storage