What are deletion vectors?

Deletion vectors are a storage optimization feature that can be enabled on Delta Lake tables. By default, when a single row in a data file is deleted, the entire Parquet file containing the record must be rewritten. With deletion vectors enabled for the table, DELETE, UPDATE, and MERGE operations use deletion vectors to mark existing rows as removed or changed without rewriting the Parquet file. Subsequent reads on the table resolve current table state by applying the deletions noted by deletion vectors to the most recent table version.

Databricks recommends using Databricks Runtime 14.3 LTS and above to write tables with deletion vectors to leverage all optimizations. You can read tables with deletion vectors enabled in Databricks Runtime 12.2 LTS and above.

In Databricks Runtime 14.2 and above, tables with deletion vectors support row-level concurrency. See Write conflicts with row-level concurrency.

Note

Photon leverages deletion vectors for predictive I/O updates, accelerating DELETE, MERGE, and UPDATE operations. All clients that support reading deletion vectors can read updates that produced deletion vectors, regardless of whether these updates were produced by predictive I/O. See Use predictive I/O to accelerate updates.

Enable deletion vectors

Important

A workspace admin setting controls whether deletion vectors are auto-enabled for new Delta tables. See Auto-enable deletion vectors.

You enable support for deletion vectors on a Delta Lake table by setting a Delta Lake table property. You can set the property when you create a table or by altering an existing table, as in the following examples:

CREATE TABLE <table-name> [options] TBLPROPERTIES ('delta.enableDeletionVectors' = true);

ALTER TABLE <table-name> SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);
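As a minimal end-to-end sketch of the property in use, the following statements create a small table with deletion vectors enabled, confirm the property, and delete one row. The table name dv_demo and its columns are hypothetical and chosen only for illustration.

```sql
-- Hypothetical demo table; the name and schema are illustrative assumptions.
CREATE TABLE dv_demo (id INT, status STRING)
USING DELTA
TBLPROPERTIES ('delta.enableDeletionVectors' = true);

-- Confirm that the table property is set.
SHOW TBLPROPERTIES dv_demo ('delta.enableDeletionVectors');

INSERT INTO dv_demo VALUES (1, 'active'), (2, 'inactive'), (3, 'active');

-- With deletion vectors enabled, this delete can mark the row as removed
-- instead of rewriting the Parquet file that contains it.
DELETE FROM dv_demo WHERE id = 2;

-- Reads apply the deletion vector, so the deleted row is not returned.
SELECT * FROM dv_demo ORDER BY id;
```

Whether a given operation actually writes a deletion vector depends on the runtime and client version, as described under Compatibility with Delta clients.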

Warning

When you enable deletion vectors, the table protocol is upgraded. After the upgrade, the table cannot be read by Delta Lake clients that do not support deletion vectors. See How does Databricks manage Delta Lake feature compatibility?

In Databricks Runtime 14.1 and above, you can drop the deletion vectors table feature to enable compatibility with other Delta clients. See Drop Delta table features.
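To check the effect of the protocol upgrade, and to reverse it where supported, a sketch such as the following might be used. It assumes a hypothetical table named dv_demo and Databricks Runtime 14.1 or above. The complete procedure, including any additional steps it requires, is covered in Drop Delta table features, so treat this as an outline rather than a full recipe.

```sql
-- Inspect table details, including the minimum reader and writer
-- protocol versions that clients must support.
DESCRIBE DETAIL dv_demo;

-- Drop the deletion vectors table feature (dv_demo is a hypothetical table name).
ALTER TABLE dv_demo DROP FEATURE deletionVectors;
```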

Apply changes to Parquet data files

Deletion vectors indicate changes to rows as soft-deletes that logically modify existing Parquet data files in the Delta Lake table. These changes are applied physically when data files are rewritten, as triggered by one of the following events:

  • An OPTIMIZE command is run on the table.

  • Auto-compaction triggers a rewrite of a data file with a deletion vector.

  • REORG TABLE ... APPLY (PURGE) is run against the table.

File compaction events do not strictly guarantee that changes recorded in deletion vectors are resolved. Some of these changes might not be applied if the target data files would not otherwise be candidates for file compaction. In contrast, REORG TABLE ... APPLY (PURGE) rewrites all data files containing records with modifications recorded using deletion vectors. See REORG TABLE.
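The difference between compaction and an explicit purge can be sketched as follows, assuming a hypothetical table named dv_demo:

```sql
-- Compaction: applies deletion vectors only for the data files
-- that it selects for rewriting.
OPTIMIZE dv_demo;

-- Purge: rewrites every data file that has changes recorded
-- in deletion vectors.
REORG TABLE dv_demo APPLY (PURGE);
```

If the goal is to guarantee that soft-deleted rows are physically removed from the current data files, the second statement is the one to rely on.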

Note

Modified data might still exist in the old files. You can run VACUUM to physically delete the old files. REORG TABLE ... APPLY (PURGE) creates a new version of the table at the time it completes, which is the timestamp you must consider for the retention threshold for your VACUUM operation to fully remove deleted files. See Remove unused data files with vacuum.
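One possible sequence for removing the old files after a purge is sketched below. The table name dv_demo is hypothetical, and the 168-hour retention value is only an illustrative choice. Use the retention threshold that applies to your table.

```sql
-- After REORG TABLE ... APPLY (PURGE) completes, look up the timestamp
-- of the most recent table version. The retention window is measured
-- from this point.
DESCRIBE HISTORY dv_demo LIMIT 1;

-- Preview which files VACUUM would delete, without deleting them.
VACUUM dv_demo DRY RUN;

-- Once the retention threshold has passed since the REORG completed,
-- physically delete the old files.
VACUUM dv_demo RETAIN 168 HOURS;
```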

Compatibility with Delta clients

Databricks leverages deletion vectors to power predictive I/O for updates on Photon-enabled compute. See Use predictive I/O to accelerate updates.

Support for leveraging deletion vectors for reads and writes varies by client.

The following table denotes required client versions for reading and writing Delta tables with deletion vectors enabled and specifies which write operations leverage deletion vectors:

| Client | Write deletion vectors | Read deletion vectors |
| --- | --- | --- |
| Databricks Runtime with Photon | Supports MERGE, UPDATE, and DELETE using Databricks Runtime 12.2 LTS and above. | Requires Databricks Runtime 12.2 LTS or above. |
| Databricks Runtime without Photon | Supports DELETE using Databricks Runtime 12.2 LTS and above. Supports UPDATE using Databricks Runtime 14.1 and above. Supports MERGE using Databricks Runtime 14.3 LTS and above. | Requires Databricks Runtime 12.2 LTS or above. |
| OSS Apache Spark with OSS Delta Lake | Supports DELETE using OSS Delta 2.4.0 and above. Supports UPDATE using OSS Delta 3.0.0 and above. | Requires OSS Delta 2.3.0 or above. |
| Delta Sharing recipients | Writes are not supported on Delta Sharing tables. | Databricks: Requires Databricks Runtime 14.1 or above. Open-source Apache Spark: Requires delta-sharing-spark 3.1 or above. |

Note

For support in other Delta clients, see the OSS Delta Lake integrations documentation.

Limitations

  • UniForm does not support deletion vectors.

  • You can enable deletion vectors for materialized views, but to disable deletion vectors for a materialized view, you must drop the materialized view and recreate it.

  • You cannot generate a manifest file for a table with deletion vectors present. To generate a manifest, run REORG TABLE ... APPLY (PURGE) and ensure that no concurrent write operations are running.

  • You cannot incrementally generate manifest files for a table with deletion vectors enabled.
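The manifest workaround described in the limitations above might look like the following sketch. It assumes a hypothetical table named dv_demo and that no concurrent writes are running against it while the statements execute.

```sql
-- Rewrite all data files that have deletion vectors so that
-- none remain present in the current table version.
REORG TABLE dv_demo APPLY (PURGE);

-- Generate the manifest from the purged table state.
GENERATE symlink_format_manifest FOR TABLE dv_demo;
```

Because later DELETE, UPDATE, or MERGE operations can create new deletion vectors, this sequence would need to be repeated before each manifest generation.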