Use Delta Live Tables pipelines with legacy Hive metastore
This article details configurations and caveats specific to Delta Live Tables pipelines configured to publish data to the legacy Hive metastore. Databricks recommends using Unity Catalog for all new pipelines. See Use Unity Catalog with your Delta Live Tables pipelines.
Publish pipeline datasets to the legacy Hive metastore
Although specifying a target is optional, you should configure one any time you move beyond development and testing of a new pipeline. Publishing a pipeline to a target makes its datasets available for querying elsewhere in your Databricks environment.
You can make the output data of your pipeline discoverable and available to query by publishing datasets to the Hive metastore. To publish datasets to the metastore, enter a schema name in the Target field when you create a pipeline. You can also add a target database to an existing pipeline.
All tables and views created in Delta Live Tables are local to the pipeline by default. You must publish tables to a target schema to query or use Delta Live Tables datasets outside the pipeline in which they are declared.
To publish tables from your pipelines to Unity Catalog, see Use Unity Catalog with your Delta Live Tables pipelines.
How to publish Delta Live Tables datasets to the legacy Hive metastore
You can declare a target schema for all tables in your Delta Live Tables pipeline using the Target schema field in the Pipeline settings and Create pipeline UIs.
You can also specify a schema in a JSON configuration by setting the target value.
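For example, a minimal pipeline settings sketch might look like the following. The pipeline name and schema name here are hypothetical placeholders; substitute your own:

{
  "name": "my-pipeline",
  "target": "my_schema"
}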
You must run an update for the pipeline to publish results to the target schema.
You can use this feature with multiple environment configurations to publish to different schemas based on the environment. For example, you can publish to a dev schema for development and a prod schema for production data.
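As a sketch of this pattern, two otherwise identical pipeline configurations could differ only in the target value. The pipeline names and schema names here are placeholders:

{
  "name": "sales-pipeline-dev",
  "target": "dev"
}

{
  "name": "sales-pipeline-prod",
  "target": "prod"
}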
How to query streaming tables and materialized views in the legacy Hive metastore
After an update is complete, you can view the schema and tables, query the data, or use the data in downstream applications.
Once published, Delta Live Tables tables can be queried from any environment with access to the target schema. This includes Databricks SQL, notebooks, and other Delta Live Tables pipelines.
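For example, a minimal notebook sketch for reading a published table might look like the following. Here, my_schema and my_table are placeholder names; substitute your target schema and a table declared in your pipeline:

# Read a table published by a Delta Live Tables pipeline.
# "my_schema" and "my_table" are placeholders; substitute your target
# schema and table name.
df = spark.read.table("my_schema.my_table")
display(df)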
Important
When you create a target configuration, only tables and associated metadata are published. Views are not published to the metastore.
Specify a storage location
You can specify a storage location for a pipeline that publishes to the Hive metastore. The primary motivation for specifying a location is to control the object storage location for data written by your pipeline.
Because all tables, data, checkpoints, and metadata for Delta Live Tables pipelines are fully managed by Delta Live Tables, most interaction with Delta Live Tables datasets happens through tables registered to the Hive metastore or Unity Catalog.
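For example, a settings sketch that sets both a storage location and a target schema might look like the following. The pipeline name, storage path, and schema name are placeholders:

{
  "name": "my-pipeline",
  "storage": "dbfs:/pipelines/my-pipeline",
  "target": "my_schema"
}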
Cloud storage configuration
To access a bucket in Google Cloud Storage (GCS), you must create a service account with access to that GCS bucket and add that service account to the cluster configurations. For more information about creating a Google Cloud service account, see Connect to Google Cloud Storage. You can add the service account configuration when you create or edit a pipeline with the Delta Live Tables API or in the Delta Live Tables UI:
1. On the Pipeline details page for your pipeline, click the Settings button. The Pipeline settings page appears.
2. Click the JSON button.
3. Enter the service account configuration in the gcp_attributes.google_service_account field in the cluster configuration:
{
  "clusters": [
    {
      "gcp_attributes": {
        "google_service_account": "test-gcs-doc@databricks-dev.iam.gserviceaccount.com"
      }
    }
  ]
}
Example pipeline source code notebooks for workspaces without Unity Catalog
You can import the following notebooks into a Databricks workspace without Unity Catalog enabled and use them to deploy a Delta Live Tables pipeline. Import the notebook of your chosen language and specify the path in the Source code field when configuring a pipeline with the Hive metastore storage option. See Configure a Delta Live Tables pipeline.
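To give a sense of what such pipeline source code contains, the following is a minimal Python sketch, not the full example notebook. The source path is a placeholder; substitute a location your workspace can read:

import dlt
from pyspark.sql.functions import col

# Hypothetical source location; substitute your own data path.
SOURCE_PATH = "/path/to/source/data"

@dlt.table(comment="Raw records ingested from cloud storage.")
def raw_data():
    return spark.read.format("json").load(SOURCE_PATH)

@dlt.table(comment="Records with a non-null id, derived from raw_data.")
def clean_data():
    return dlt.read("raw_data").where(col("id").isNotNull())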