Universal Format (UniForm) for Iceberg compatibility with Delta tables

Delta Universal Format (UniForm) allows you to read Delta tables with Iceberg reader clients. This feature requires Databricks Runtime 14.3 LTS or above.

Important

For documentation on the legacy UniForm IcebergCompatV1 table feature, see Legacy UniForm IcebergCompatV1.

UniForm takes advantage of the fact that both Delta Lake and Iceberg consist of Parquet data files and a metadata layer. UniForm automatically generates Iceberg metadata asynchronously, without rewriting data, so that Iceberg clients can read Delta tables as if they were Iceberg tables. A single copy of the data files serves both formats.

You can configure an external connection to have Unity Catalog act as an Iceberg catalog. See Read using the Unity Catalog Iceberg catalog endpoint.

UniForm uses zstd instead of snappy as the compression codec for underlying Parquet data files.

Note

UniForm metadata generation runs asynchronously on the compute used to write data to Delta tables, which might increase the driver resource usage.

Requirements

To enable UniForm, you must fulfill the following requirements:

Note

You cannot enable deletion vectors on a table with UniForm enabled. When enabling UniForm on an existing table with deletion vectors enabled, UniForm disables and purges deletion vectors and rewrites data files as necessary.

Enable Delta UniForm

Important

Enabling Delta UniForm sets the Delta table feature IcebergCompatV2, a write protocol feature. Only clients that support this table feature can write to UniForm-enabled tables. You must use Databricks Runtime 14.3 LTS or above to write to Delta tables with this feature enabled.

You can turn off UniForm by unsetting the delta.universalFormat.enabledFormats table property. You cannot turn off column mapping after it has been enabled, and upgrades to Delta Lake reader and writer protocol versions cannot be undone.

You must set the following table properties to enable UniForm support for Iceberg:

'delta.enableIcebergCompatV2' = 'true'
'delta.universalFormat.enabledFormats' = 'iceberg'

You must also enable column mapping to use UniForm. This is enabled automatically if you enable UniForm during table creation, as in the following example:

CREATE TABLE T(c1 INT) TBLPROPERTIES(
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg');
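When auditing several tables, it can help to check the two required properties programmatically. The following is a minimal Python sketch, assuming the table properties have already been fetched into a dictionary of strings (for example, from SHOW TBLPROPERTIES output). The function name has_uniform_iceberg is hypothetical, and treating enabledFormats as a comma-separated list is an assumption.

```python
REQUIRED_COMPAT = "delta.enableIcebergCompatV2"
ENABLED_FORMATS = "delta.universalFormat.enabledFormats"


def has_uniform_iceberg(properties: dict[str, str]) -> bool:
    """Return True if both UniForm Iceberg table properties are set."""
    compat = properties.get(REQUIRED_COMPAT, "").strip().lower() == "true"
    # Assumption: enabledFormats may hold a comma-separated list of formats.
    raw_formats = properties.get(ENABLED_FORMATS, "")
    formats = {item.strip().lower() for item in raw_formats.split(",")}
    return compat and "iceberg" in formats
```

This check only inspects property values. It does not confirm that column mapping is enabled or that Iceberg metadata has been generated.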

You can enable UniForm on an existing table using the following syntax:

REORG TABLE table_name APPLY (UPGRADE UNIFORM(ICEBERG_COMPAT_VERSION=2));

Note

This syntax also works to upgrade from the Public Preview version of UniForm, which used the table feature IcebergCompatV1.

This syntax automatically disables and purges deletion vectors from the table. Existing files are rewritten as necessary to make them Iceberg compatible.

When you first enable UniForm, asynchronous metadata generation begins. This task must complete before external clients can query the table using Iceberg. See Check Iceberg metadata generation status.

Note

If you plan to use BigQuery as your Iceberg reader client, you must set spark.databricks.delta.write.dataFilesToSubdir to true on Databricks to accommodate a BigQuery requirement for data layout.

See Limitations.

When does UniForm generate Iceberg metadata?

Databricks triggers Iceberg metadata generation asynchronously after a Delta Lake write transaction completes, using the same compute that completed the Delta transaction. You can also manually trigger Iceberg metadata generation. See Manually trigger Iceberg metadata conversion.

To avoid write latencies associated with Iceberg metadata generation, Delta tables with frequent commits might bundle multiple Delta commits into a single Iceberg commit.

Delta Lake ensures that only one Iceberg metadata generation process is in progress at any time. Commits that would trigger a second concurrent Iceberg metadata generation process will successfully commit to Delta, but they won’t trigger asynchronous Iceberg metadata generation. This prevents cascading latency for metadata generation for workloads with frequent commits (seconds to minutes between commits).
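The bundling behavior described above can be illustrated with a toy model. This Python sketch is not the Databricks implementation. It assumes a fixed generation duration, assumes that each triggered generation covers every Delta version not yet converted, and uses a hypothetical function name simulate_bundling.

```python
def simulate_bundling(commit_times, generation_seconds):
    """Toy model: return the Delta versions covered by each Iceberg commit.

    commit_times: ascending commit times in seconds; list index = Delta version.
    generation_seconds: assumed fixed duration of one metadata generation.
    """
    busy_until = float("-inf")  # time at which the running generation finishes
    last_converted = -1         # highest Delta version already converted
    iceberg_commits = []
    for version, commit_time in enumerate(commit_times):
        if commit_time < busy_until:
            # A generation is still running: the Delta commit succeeds,
            # but it does not trigger another generation.
            continue
        # Assumption: a new generation covers every unconverted version.
        iceberg_commits.append(list(range(last_converted + 1, version + 1)))
        last_converted = version
        busy_until = commit_time + generation_seconds
    return iceberg_commits


print(simulate_bundling([0, 1, 2, 10, 11, 30], generation_seconds=5))
# [[0], [1, 2, 3], [4, 5]]
```

In this run, six Delta commits produce three Iceberg commits. Versions 1 and 2 arrive while the first generation is still running, so they appear in Iceberg only when version 3 triggers the next generation.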

See Delta and Iceberg table versions.

Check Iceberg metadata generation status

UniForm adds the following fields to Unity Catalog and Iceberg table metadata to track metadata generation status:

Metadata field | Description
converted_delta_version | The latest version of the Delta table for which Iceberg metadata was successfully generated.
converted_delta_timestamp | The timestamp of the latest Delta commit for which Iceberg metadata was successfully generated.

On Databricks, you can review these metadata fields by doing one of the following:

  • Reviewing the Delta Uniform Iceberg section returned by DESCRIBE EXTENDED table_name.

  • Reviewing table metadata with Catalog Explorer.

  • Using the REST API to get a table.

See the documentation for your Iceberg reader client to learn how to review table properties outside Databricks. For OSS Apache Spark, you can see these properties using the following syntax:

SHOW TBLPROPERTIES <table-name>;
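To judge how far Iceberg metadata lags behind the Delta table, you can compare converted_delta_version with the latest Delta version. The following Python sketch assumes that the properties are available as a dictionary of strings and that the latest Delta version was obtained separately (for example, from DESCRIBE HISTORY). The function name conversion_lag is hypothetical.

```python
def conversion_lag(latest_delta_version: int, properties: dict[str, str]) -> int:
    """Number of Delta versions not yet reflected in Iceberg metadata."""
    converted = properties.get("converted_delta_version")
    if converted is None:
        # No Iceberg metadata has been generated yet, so every version
        # from 0 through latest_delta_version is unconverted.
        return latest_delta_version + 1
    return latest_delta_version - int(converted)
```

Because generation is asynchronous, a small lag right after a write is expected. A lag that persists is a more useful signal that generation was interrupted.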

Manually trigger Iceberg metadata conversion

You can manually trigger Iceberg metadata generation for the latest version of the Delta table. This operation runs synchronously, meaning that when it completes, the table contents available in Iceberg reflect the latest version of the Delta table available when the conversion process started.

This operation should not be necessary under normal conditions, but can help if you encounter the following:

  • A cluster terminates before automatic metadata generation succeeds.

  • An error or job failure interrupts metadata generation.

  • A client that does not support UniForm Iceberg metadata generation writes to the Delta table.

Use the following syntax to manually trigger Iceberg metadata generation:

MSCK REPAIR TABLE <table-name> SYNC METADATA

See REPAIR TABLE.

Read using a metadata JSON path

Some Iceberg clients require that you provide a path to versioned metadata files to register external Iceberg tables. Each time UniForm converts a new version of the Delta table to Iceberg, it creates a new metadata JSON file.

Clients that use metadata JSON paths for configuring Iceberg include BigQuery. Refer to documentation for the Iceberg reader client for configuration details.

Delta Lake stores Iceberg metadata under the table directory, using the following pattern:

<table-path>/metadata/<version-number>-<uuid>.metadata.json
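If you need the version number from a metadata JSON path, for example to compare the path a client has registered against a newer one, a small parser is enough. This Python sketch assumes the file name follows the pattern above and also tolerates a leading v before the version number, as seen in the REST API response format. The function name parse_metadata_path is hypothetical.

```python
import re

# Accepts both "<version-number>-<uuid>" and "v<version-number>-<uuid>" names.
_METADATA_FILE = re.compile(r"/metadata/v?(\d+)-([^/]+)\.metadata\.json$")


def parse_metadata_path(path: str) -> tuple[int, str]:
    """Split a metadata JSON path into (version number, uuid)."""
    match = _METADATA_FILE.search(path)
    if match is None:
        raise ValueError(f"not a metadata JSON path: {path!r}")
    return int(match.group(1)), match.group(2)
```

The version number is returned as an integer so that two paths for the same table can be compared numerically rather than as strings.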

On Databricks, you can review this metadata location by doing one of the following:

  • Reviewing the Delta Uniform Iceberg section returned by DESCRIBE EXTENDED table_name.

  • Reviewing table metadata with Catalog Explorer.

  • Using the following command with the REST API:

GET api/2.1/unity-catalog/tables/<catalog-name>.<schema-name>.<table-name>

The response includes the following information:

{
    ...
    "delta_uniform_iceberg": {
        "metadata_location": "<cloud-storage-uri>/metadata/v<version-number>-<uuid>.metadata.json"
    }
}
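The request and response above can be handled with the Python standard library. This is a sketch under stated assumptions: it authenticates with a personal access token sent as a bearer token, the workspace URL and table name are placeholders you supply, and the function names build_get_table_request and metadata_location are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def build_get_table_request(api_root: str, full_name: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a Unity Catalog table.

    full_name is <catalog-name>.<schema-name>.<table-name>.
    """
    url = f"{api_root.rstrip('/')}/api/2.1/unity-catalog/tables/{full_name}"
    # Assumption: a personal access token passed as a bearer token.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def metadata_location(response_body: str) -> Optional[str]:
    """Return delta_uniform_iceberg.metadata_location, or None if absent."""
    table = json.loads(response_body)
    return table.get("delta_uniform_iceberg", {}).get("metadata_location")
```

To send the request, pass it to urllib.request.urlopen and give the decoded response body to metadata_location. A None result means the response contained no UniForm Iceberg section.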

Important

Path-based Iceberg reader clients might require you to manually update and refresh metadata JSON paths to read current table versions. Queries against out-of-date Iceberg table versions might fail after VACUUM removes the Parquet data files that those versions reference from the Delta table.

Read using the Unity Catalog Iceberg catalog endpoint

Some Iceberg clients can connect to an Iceberg REST catalog. Unity Catalog provides a read-only implementation of the Iceberg REST catalog API for Delta tables with UniForm enabled using the endpoint /api/2.1/unity-catalog/iceberg. See the Iceberg REST API spec for details on using this REST API.

Clients known to support the Iceberg catalog API include Apache Spark, Flink, and Trino. You must configure access to the underlying cloud object storage containing the Delta table with UniForm enabled. Refer to documentation for the Iceberg reader client for configuration details.

You must generate and configure a Databricks personal access token to allow other services to connect to Unity Catalog. See Authentication for Databricks automation - overview.

The following is an example of the settings to configure OSS Apache Spark to read UniForm as Iceberg:

"spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
"spark.sql.catalog.unity": "org.apache.iceberg.spark.SparkCatalog",
"spark.sql.catalog.unity.catalog-impl": "org.apache.iceberg.rest.RESTCatalog",
"spark.sql.catalog.unity.uri": "<api-root>/api/2.1/unity-catalog/iceberg",
"spark.sql.catalog.unity.token": "<your_personal_access_token>",
"spark.sql.catalog.unity.io-impl": "org.apache.iceberg.aws.s3.S3FileIO"

Substitute the full URL of the workspace in which you generated the personal access token for <api-root>.
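If you generate these settings from a script, the substitution can be done in one place. The following Python sketch builds the same six settings and renders them as spark-submit arguments. The function names uniform_spark_conf and as_submit_args are hypothetical, and the S3FileIO value is copied from the example settings, so it assumes the table data is stored in S3.

```python
def uniform_spark_conf(api_root: str, token: str, catalog: str = "unity") -> dict[str, str]:
    """Build the Spark settings for reading UniForm tables as Iceberg."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.rest.RESTCatalog",
        f"{prefix}.uri": f"{api_root.rstrip('/')}/api/2.1/unity-catalog/iceberg",
        f"{prefix}.token": token,
        # Copied from the example settings; assumes S3 storage.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }


def as_submit_args(conf: dict[str, str]) -> list[str]:
    """Render settings as spark-submit arguments: --conf key=value."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

Avoid writing the resulting arguments to logs, because they include the personal access token.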

Note

When querying tables in Unity Catalog using this method, object identifiers use the following pattern:

unity.<catalog-name>.<schema-name>.<table-name>

This pattern uses the same three-tier namespacing present in Unity Catalog, but adds the prefix unity, which is the catalog name registered in the Spark settings.
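As a small illustration, the identifier can be assembled from its parts. This Python sketch uses a hypothetical function name, iceberg_identifier, and assumes that none of the names contain dots or need quoting.

```python
def iceberg_identifier(catalog: str, schema: str, table: str, prefix: str = "unity") -> str:
    """Join the Spark catalog prefix and the Unity Catalog three-part name."""
    parts = [prefix, catalog, schema, table]
    for part in parts:
        # Assumption: plain names only; names with dots would need quoting.
        if not part or "." in part:
            raise ValueError(f"invalid identifier part: {part!r}")
    return ".".join(parts)
```

For example, iceberg_identifier("main", "sales", "orders") yields unity.main.sales.orders, where main, sales, and orders are placeholder names.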

Delta and Iceberg table versions

Both Delta Lake and Iceberg allow time travel queries using table versions or timestamps stored in table metadata.

In general, Iceberg and Delta table versions do not align by either the commit timestamp or the version ID. If you wish to verify which version of a Delta table a given version of an Iceberg table corresponds to, you can use the corresponding table properties set on the Iceberg table. See Check Iceberg metadata generation status.

Limitations

The following limitations exist:

  • UniForm does not work on tables with deletion vectors enabled. See What are deletion vectors?.

  • Delta tables with UniForm enabled do not support VOID types.

  • Iceberg clients can only read from UniForm. Writes are not supported.

  • Iceberg reader clients might have individual limitations, regardless of UniForm. See documentation for your chosen client.

  • The recipients of Delta Sharing can only read the table as Delta, even when UniForm is enabled.

Change Data Feed works for Delta clients when UniForm is enabled, but is not supported for Iceberg clients.

Some Delta Lake table features used by UniForm are not supported by some Delta Sharing reader clients. See Share data and AI assets securely using Delta Sharing.