Legacy UniForm IcebergCompatV1

Important

This documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported. See Universal Format (UniForm) for Iceberg compatibility with Delta tables.

Preview

This feature is in Public Preview in Databricks Runtime 13.2 and above.

Delta Universal Format (UniForm) allows you to read Delta tables with Iceberg reader clients.

UniForm takes advantage of the fact that both Delta Lake and Iceberg consist of Parquet data files and a metadata layer. UniForm automatically generates Iceberg metadata asynchronously, without rewriting data, so that Iceberg clients can read Delta tables as if they were Iceberg tables. A single copy of the data files serves both formats.

You can configure an external connection to have Unity Catalog act as an Iceberg catalog. See Read using the Unity Catalog Iceberg catalog endpoint.

Note

UniForm metadata generation runs asynchronously on the compute used to write data to Delta tables, which might increase the driver resource usage.

Requirements

To enable UniForm, you must fulfill the following requirements:

Enable Delta UniForm

Important

Enabling Delta UniForm sets the Delta table feature IcebergCompatV1, a write protocol feature. Only clients that support this table feature can write to UniForm-enabled tables. You must use Databricks Runtime 13.2 or above to write to Delta tables with this feature enabled.

You can turn off UniForm by unsetting the delta.universalFormat.enabledFormats table property. You cannot turn off column mapping after it has been enabled, and upgrades to Delta Lake reader and writer protocol versions cannot be undone.

The following table property enables UniForm support for Iceberg. iceberg is the only valid value.

'delta.universalFormat.enabledFormats' = 'iceberg'

You must also enable column mapping and IcebergCompatV1 to use UniForm. These are set automatically if you enable UniForm during table creation, as in the following example:

CREATE TABLE T(c1 INT) TBLPROPERTIES(
  'delta.universalFormat.enabledFormats' = 'iceberg');

If you create a new table with a CTAS statement, you must manually specify column mapping, as in the following example:

CREATE TABLE T
TBLPROPERTIES(
  'delta.columnMapping.mode' = 'name',
  'delta.universalFormat.enabledFormats' = 'iceberg')
AS
  SELECT * FROM source_table;

If you are altering an existing table, you must specify all of these properties, as in the following example:

ALTER TABLE T SET TBLPROPERTIES(
  'delta.columnMapping.mode' = 'name',
  'delta.enableIcebergCompatV1' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg');
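
When UniForm has to be enabled on several existing tables, the statement above can be generated rather than typed by hand. The following is a minimal sketch, not an official utility: the helper names uniform_properties and alter_table_statement are hypothetical, and only the property names and values come from the examples on this page. It assumes the table name is trusted input and does no quoting.

```python
# Hypothetical helpers; only the property names and values come from this page.

def uniform_properties(existing_table: bool) -> dict[str, str]:
    """Return the table properties used to enable UniForm for Iceberg."""
    props: dict[str, str] = {}
    if existing_table:
        # Existing tables need column mapping and IcebergCompatV1 set explicitly.
        props["delta.columnMapping.mode"] = "name"
        props["delta.enableIcebergCompatV1"] = "true"
    props["delta.universalFormat.enabledFormats"] = "iceberg"
    return props


def alter_table_statement(table_name: str) -> str:
    """Render an ALTER TABLE statement that enables UniForm on an existing table.

    Assumption: table_name is trusted and is inserted without quoting.
    """
    props = uniform_properties(existing_table=True)
    body = ",\n".join(f"  '{key}' = '{value}'" for key, value in props.items())
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES(\n{body});"


if __name__ == "__main__":
    print(alter_table_statement("T"))
```

Run as a script, this prints the same ALTER TABLE statement shown above for table T.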

When you first enable UniForm, asynchronous metadata generation begins. This task must complete before external clients can query the table using Iceberg. See Check Iceberg metadata generation status.

Note

If you plan to use BigQuery as your Iceberg reader client, you must set spark.databricks.delta.write.dataFilesToSubdir to true on Databricks to accommodate a BigQuery requirement for data layout.

See Limitations.

When does UniForm generate Iceberg metadata?

Databricks triggers Iceberg metadata generation asynchronously after a Delta Lake write transaction completes, using the same compute that completed the Delta transaction. You can also manually trigger Iceberg metadata generation. See Manually trigger Iceberg metadata conversion.

To avoid write latencies associated with Iceberg metadata generation, Delta tables with frequent commits might bundle multiple Delta commits into a single Iceberg commit.

Delta Lake ensures that only one Iceberg metadata generation process is in progress at any time. Commits that would trigger a second concurrent Iceberg metadata generation process will successfully commit to Delta, but they won’t trigger asynchronous Iceberg metadata generation. This prevents cascading latency for metadata generation for workloads with frequent commits (seconds to minutes between commits).

See Delta and Iceberg table versions.

Check Iceberg metadata generation status

UniForm adds the following fields to Unity Catalog and Iceberg table metadata to track metadata generation status:

| Metadata field | Description |
| --- | --- |
| converted_delta_version | The latest version of the Delta table for which Iceberg metadata was successfully generated. |
| converted_delta_timestamp | The timestamp of the latest Delta commit for which Iceberg metadata was successfully generated. |

On Databricks, you can review these metadata fields using Catalog Explorer. These fields and values are also returned when using the REST API to get a table.

See documentation for your Iceberg reader client for how to review table properties outside Databricks. For OSS Apache Spark, you can see these properties using the following syntax:

SHOW TBLPROPERTIES <table-name>;
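
Once the properties have been fetched, the status fields can be compared against the Delta version you expect readers to see. The sketch below assumes the properties arrive as (key, value) string pairs and that the keys match the field names in the table above; both are assumptions, as are the names ConversionStatus, parse_conversion_status, and is_caught_up. The timestamp is kept as raw text because its format is not specified here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class ConversionStatus:
    delta_version: Optional[int]    # converted_delta_version, if present
    delta_timestamp: Optional[str]  # converted_delta_timestamp, kept as raw text


def parse_conversion_status(rows: Iterable[Tuple[str, str]]) -> ConversionStatus:
    """Extract the UniForm status fields from (key, value) property rows.

    Assumption: values are strings and the keys match the field names above.
    """
    props = dict(rows)
    raw_version = props.get("converted_delta_version")
    return ConversionStatus(
        delta_version=int(raw_version) if raw_version is not None else None,
        delta_timestamp=props.get("converted_delta_timestamp"),
    )


def is_caught_up(status: ConversionStatus, delta_version: int) -> bool:
    """True when Iceberg metadata exists for at least the given Delta version."""
    return status.delta_version is not None and status.delta_version >= delta_version


if __name__ == "__main__":
    # Illustrative rows only; real values come from your own table.
    rows = [
        ("converted_delta_version", "12"),
        ("delta.universalFormat.enabledFormats", "iceberg"),
    ]
    status = parse_conversion_status(rows)
    print(status.delta_version, is_caught_up(status, 12))
```

A missing converted_delta_version is treated as "not yet converted", which matches the state of a table right after UniForm is first enabled.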

Manually trigger Iceberg metadata conversion

You can manually trigger Iceberg metadata generation for the latest version of the Delta table. This operation runs synchronously, meaning that when it completes, the table contents available in Iceberg reflect the latest version of the Delta table available when the conversion process started.

This operation should not be necessary under normal conditions, but can help if you encounter the following:

  • A cluster terminates before automatic metadata generation succeeds.

  • An error or job failure interrupts metadata generation.

  • A client that does not support UniForm Iceberg metadata generation writes to the Delta table.

Use the following syntax to manually trigger Iceberg metadata generation:

MSCK REPAIR TABLE <table-name> SYNC METADATA

See REPAIR TABLE.
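
Because a manual trigger is only useful when Iceberg metadata lags behind the Delta table, a job can check the lag first and issue the statement only when needed. This is a minimal sketch under stated assumptions: manual_sync_statement is a hypothetical helper, the caller already knows the converted and latest Delta versions, and the table name is trusted input that is inserted without quoting.

```python
from typing import Optional


def manual_sync_statement(
    table_name: str,
    converted_version: Optional[int],
    latest_delta_version: int,
) -> Optional[str]:
    """Return the repair statement if Iceberg metadata lags, otherwise None.

    converted_version is None when no Iceberg metadata has been generated yet.
    Assumption: table_name is trusted and is inserted without quoting.
    """
    if converted_version is not None and converted_version >= latest_delta_version:
        return None  # Iceberg metadata already reflects the latest Delta version.
    return f"MSCK REPAIR TABLE {table_name} SYNC METADATA"


if __name__ == "__main__":
    print(manual_sync_statement("main.sales.orders", 7, 9))
```

The returned string can be passed to whatever SQL execution mechanism the job already uses.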

Read using a metadata JSON path

Some Iceberg clients require that you provide a path to versioned metadata files to register external Iceberg tables. Each time UniForm converts a new version of the Delta table to Iceberg, it creates a new metadata JSON file.

Clients that use metadata JSON paths for configuring Iceberg include BigQuery. Refer to documentation for the Iceberg reader client for configuration details.

Delta Lake stores Iceberg metadata under the table directory, using the following pattern:

<table-path>/metadata/<version-number>-<uuid>.metadata.json

You can find the path of this file using Catalog Explorer. For tables with UniForm enabled, the details for the Delta table include a field for the Iceberg metadata location.

You can also use the REST API to get all details for a table, including the metadata location. Use the following command:

GET api/2.1/unity-catalog/tables/<catalog-name>.<schema-name>.<table-name>

The response includes the following information:

{
    ...
    "delta_uniform_iceberg": {
        "metadata_location": "<cloud-storage-uri>/metadata/v<version-number>-<uuid>.metadata.json"
    }
}
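
The request and response above can be handled with the Python standard library. The sketch below builds the request without sending it, then extracts the metadata location and the version number from a parsed response. Several details are assumptions rather than documented behavior: the workspace URL is a placeholder, the personal access token is sent as a Bearer token in the Authorization header, the version number in the file name is made of decimal digits, and a leading v may or may not be present. All function names are hypothetical.

```python
import re
import urllib.request
from typing import Optional


def build_get_table_request(
    workspace_url: str, full_name: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) the GET request for a table's details.

    Assumption: the personal access token is sent as a Bearer token.
    """
    url = f"{workspace_url.rstrip('/')}/api/2.1/unity-catalog/tables/{full_name}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


# Matches .../metadata/<version-number>-<uuid>.metadata.json, with an optional "v".
# Assumption: the version number consists of decimal digits.
_METADATA_FILE = re.compile(
    r"/metadata/v?(?P<version>\d+)-(?P<uuid>[^/]+)\.metadata\.json$"
)


def iceberg_metadata_location(table_info: dict) -> Optional[str]:
    """Return metadata_location from a parsed response, or None if absent."""
    return table_info.get("delta_uniform_iceberg", {}).get("metadata_location")


def metadata_version(location: str) -> Optional[int]:
    """Return the version number embedded in a metadata JSON path, if any."""
    match = _METADATA_FILE.search(location)
    return int(match.group("version")) if match else None


if __name__ == "__main__":
    # Placeholder values for illustration only.
    request = build_get_table_request(
        "https://workspace.example.com", "main.sales.orders", "<token>"
    )
    print(request.full_url)
```

Sending the request with urllib.request.urlopen and decoding the body with json.loads yields the dictionary that iceberg_metadata_location expects.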

Important

Path-based Iceberg reader clients might require manually updating and refreshing metadata JSON paths to read current table versions. Users might encounter errors when querying Iceberg tables using out-of-date versions, because VACUUM removes Parquet data files from the Delta table.

Read using the Unity Catalog Iceberg catalog endpoint

Some Iceberg clients can connect to an Iceberg REST catalog. Unity Catalog provides a read-only implementation of the Iceberg REST catalog API for Delta tables with UniForm enabled using the endpoint /api/2.1/unity-catalog/iceberg. See the Iceberg REST API spec for details on using this REST API.

Clients known to support the Iceberg catalog API include Apache Spark, Flink, and Trino. You must configure access to the underlying cloud object storage containing the Delta table with UniForm enabled. Refer to documentation for the Iceberg reader client for configuration details.

You must generate and configure a Databricks personal access token to allow other services to connect to Unity Catalog. See Authentication for Databricks automation - overview.

The following is an example of the settings to configure OSS Apache Spark to read UniForm as Iceberg:

"spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
"spark.sql.catalog.unity": "org.apache.iceberg.spark.SparkCatalog",
"spark.sql.catalog.unity.catalog-impl": "org.apache.iceberg.rest.RESTCatalog",
"spark.sql.catalog.unity.uri": "<api-root>/api/2.1/unity-catalog/iceberg",
"spark.sql.catalog.unity.token": "<your_personal_access_token>",
"spark.sql.catalog.unity.io-impl": "org.apache.iceberg.aws.s3.S3FileIO"

Substitute the full URL of the workspace in which you generated the personal access token for <api-root>.

Note

When querying tables in Unity Catalog using this method, object identifiers use the following pattern:

unity.<catalog-name>.<schema-name>.<table-name>

This pattern uses the same three-tier namespacing present in Unity Catalog, but adds an additional prefix unity.
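
The settings and the identifier pattern above can be assembled in code, which keeps the catalog prefix consistent between the two. This is a minimal sketch: iceberg_rest_spark_conf and iceberg_identifier are hypothetical helpers, the catalog prefix defaults to unity as in the example, and the S3FileIO implementation is carried over from the example and is specific to AWS.

```python
def iceberg_rest_spark_conf(
    api_root: str, token: str, catalog: str = "unity"
) -> dict[str, str]:
    """Return Spark settings for reading UniForm tables through the Iceberg REST endpoint."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.rest.RESTCatalog",
        f"{prefix}.uri": f"{api_root.rstrip('/')}/api/2.1/unity-catalog/iceberg",
        f"{prefix}.token": token,
        # AWS-specific, as in the example above; other clouds need a different FileIO.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }


def iceberg_identifier(
    catalog_name: str, schema_name: str, table_name: str, catalog: str = "unity"
) -> str:
    """Return the four-part identifier: <prefix>.<catalog>.<schema>.<table>."""
    return ".".join([catalog, catalog_name, schema_name, table_name])


if __name__ == "__main__":
    # Placeholder workspace URL and token for illustration only.
    conf = iceberg_rest_spark_conf("https://workspace.example.com", "<token>")
    for key, value in conf.items():
        print(f"{key}={value}")
    print(iceberg_identifier("main", "sales", "orders"))
```

With PySpark, each pair could be applied through SparkSession.builder.config(key, value) before the session is created.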

Delta and Iceberg table versions

Both Delta Lake and Iceberg allow time travel queries using table versions or timestamps stored in table metadata.

In general, Iceberg and Delta table versions do not align by either the commit timestamp or the version ID. If you wish to verify which version of a Delta table a given version of an Iceberg table corresponds to, you can use the corresponding table properties set on the Iceberg table. See Check Iceberg metadata generation status.

Limitations

The following limitations exist:

  • UniForm does not work on tables with deletion vectors enabled. See What are deletion vectors?.

  • Delta tables with UniForm enabled do not support LIST, MAP, and VOID types.

  • Iceberg clients can only read from UniForm. Writes are not supported.

  • Iceberg reader clients might have individual limitations, regardless of UniForm. See documentation for your chosen client.

  • Iceberg reader clients version 1.2.0 and below do not support the INT96 timestamp type written by Apache Spark. Use the following code in notebooks that write to UniForm tables to avoid this limitation: spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS")

  • The public preview version of the Unity Catalog Iceberg endpoint is not meant for large-scale production workloads. You might experience rate-limiting if you exceed a threshold of 5 queries per second.
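
To stay under the rate limit mentioned in the last item, a client can space out its own requests. The sketch below is one simple approach, assuming a single-threaded caller: it enforces a minimum interval of 1/5 second between calls. MinIntervalThrottle is a hypothetical class, and how the endpoint actually measures the threshold is not documented here, so treat the spacing as a conservative guess rather than a guarantee.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Space out calls so consecutive calls start at least 1/max_per_second apart."""

    def __init__(
        self,
        max_per_second: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next call is allowed and return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        # The following call may start no earlier than one interval from now.
        self._next_allowed = now + self._min_interval
        return delay


if __name__ == "__main__":
    throttle = MinIntervalThrottle(max_per_second=5.0)
    for _ in range(3):
        throttle.wait()
        # Issue one request against the Iceberg endpoint here.
```

The clock and sleep parameters exist only so the throttle can be exercised without real waiting; production callers can leave the defaults.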

The following Delta Lake features work for Delta clients when UniForm is enabled, but do not have support in Iceberg:

  • Change Data Feed

  • Delta Sharing