Create, run, and manage Delta Live Tables pipelines

Preview

This feature is in Public Preview.

You can create, run, manage, and monitor a Delta Live Tables pipeline using the UI or the Delta Live Tables API. You can also run your pipeline with an orchestration tool such as Databricks jobs. This article focuses on performing Delta Live Tables tasks using the UI. To use the API, see the API guide. To automate pipeline management, you can also use the Databricks Terraform provider and the databricks_pipeline resource.

To create and run your first pipeline, see the Delta Live Tables quickstart.

Create a pipeline

  1. Do one of the following:

    • Click Workflows in the sidebar, click the Delta Live Tables tab, and click Create Pipeline. The Create Pipeline dialog appears.

    • In the sidebar, click Create and select Pipeline from the menu.

  2. Select the Delta Live Tables product edition for the pipeline from the Product Edition drop-down.

    Choose the product edition that best matches the requirements of your pipeline. See Product editions.

  3. Enter a name for the pipeline in the Pipeline Name field.

  4. Enter the path to a notebook containing your pipeline queries in the Notebook Libraries field, or use the file picker to browse to your notebook.

  5. Optionally, click the Add notebook library button to add additional notebooks to the pipeline.

    You can add notebooks in any order. Delta Live Tables automatically analyzes dataset dependencies to construct the processing graph for your pipeline.

  6. Optionally, click the Add configuration button to add Spark configuration settings to the cluster that will run the pipeline.

  7. Optionally, enter a database name in the Target field to make your tables available for discovery and querying. See Publish datasets.

  8. Optionally, enter a DBFS or cloud storage path in the Storage Location field to specify a storage location for output data from the pipeline. The system uses a default location if you leave Storage Location empty.

  9. Select Triggered or Continuous for Pipeline Mode. See Continuous and triggered pipelines.

  10. Optionally, modify the configuration for pipeline clusters, including enabling or disabling autoscaling and setting the number of worker nodes. See Manage cluster size.

  11. Optionally, select the Use Photon Acceleration checkbox to run this pipeline on the Photon runtime.

  12. Optionally, click the Channel drop-down to change the Delta Live Tables runtime version for this pipeline. See the channel field in the Delta Live Tables settings.

  13. Click Create.

To view and edit the JSON configuration for your pipeline, click the JSON button on the Create Pipeline dialog.
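
For reference, the JSON configuration for a pipeline created with the settings in the preceding steps might look like the following sketch. The notebook path, database name, and storage path are hypothetical placeholders; see Delta Live Tables settings for the full set of fields:

    {
      "name": "Example Pipeline",
      "edition": "ADVANCED",
      "channel": "CURRENT",
      "photon": true,
      "continuous": false,
      "libraries": [
        {
          "notebook": {
            "path": "/Users/user@example.com/dlt-queries"
          }
        }
      ],
      "target": "example_db",
      "storage": "dbfs:/pipelines/example",
      "clusters": [
        {
          "label": "default",
          "autoscale": {
            "min_workers": 1,
            "max_workers": 5
          }
        }
      ]
    }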

Start a pipeline update

To run the pipeline you created, start a pipeline update.

  1. Click Workflows in the sidebar and click the Delta Live Tables tab. The Pipelines list displays.

  2. Do one of the following:

    • To start a pipeline update immediately, click the right arrow button in the Actions column. The system returns a message confirming that your pipeline is starting.

    • To view more options before starting the pipeline, click the pipeline name. The Pipeline details page displays.

The Pipeline details page provides the following options:

To start an update of your pipeline from the Pipeline details page, click the Start button.

You might want to reprocess data that has already been ingested, for example, because you modified your queries based on new requirements or need to fix a bug in a calculation. To reprocess previously ingested data, instruct the Delta Live Tables system to perform a full refresh from the UI: click the down caret next to the Start button and click Full refresh all.

After starting an update or a full refresh, the system returns a message confirming your pipeline is starting.
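
If you start updates with the Delta Live Tables API rather than the UI, the same actions map to a request to the pipeline's updates endpoint described in the API guide. As a minimal sketch, the request body for a full refresh would be:

    {
      "full_refresh": true
    }

An empty request body starts a standard update.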

After successfully starting the update, the Delta Live Tables system:

  1. Starts a cluster using a cluster configuration created by the Delta Live Tables system. You can also specify a custom cluster configuration.

  2. Creates any tables that don’t exist and ensures that the schema is correct for any existing tables.

  3. Updates tables with the latest data available.

  4. Shuts down the cluster when the update is complete.

You can track the progress of the update by viewing the event log at the bottom of the Pipeline details page.

View pipeline event log

To view details for a log entry, click the entry. The Pipeline event log details pop-up appears. To view a JSON document containing the log details, click the JSON tab.
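
As an illustration, the JSON for an event log entry might look like the following abbreviated sketch; the identifiers shown are hypothetical, and the exact fields vary by event type:

    {
      "id": "1167b00e-...",
      "origin": {
        "pipeline_id": "...",
        "update_id": "..."
      },
      "timestamp": "2022-06-01T12:00:00.000Z",
      "message": "Update is RUNNING.",
      "level": "INFO",
      "event_type": "update_progress"
    }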

To learn how to query the event log, for example, to analyze performance or data quality metrics, see Delta Live Tables event log.

When the pipeline update completes, you can also start an update to refresh only selected tables.

Start a pipeline update for selected tables

You may want to reprocess data for only selected tables in your pipeline. For example, during development, you change only a single table and want to reduce testing time, or a pipeline update fails and you want to refresh only the failed tables.

To start an update that refreshes selected tables only, on the Pipeline details page:

  1. Click Select tables for refresh. The Select tables for refresh dialog appears.

    If you do not see the Select tables for refresh button, make sure the Pipeline details page displays the most recent update and that the update is complete. If a DAG is not displayed for the most recent update, for example, because the update failed, the Select tables for refresh button is not displayed.

  2. To select the tables to refresh, click on each table. The selected tables are highlighted and labeled. To remove a table from the update, click on the table again.

  3. Click Refresh selection.

    Note

    The Refresh selection button displays the number of selected tables in parentheses.

To reprocess data that has already been ingested for the selected tables, click the down caret next to the Refresh selection button and click Full Refresh selection.
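
If you automate selective refreshes with the Delta Live Tables API, the table selections are expressed as fields in the request that starts an update. A sketch, assuming the refresh_selection and full_refresh_selection fields described in the API guide and hypothetical table names:

    {
      "refresh_selection": ["sales_orders_cleaned"],
      "full_refresh_selection": ["sales_order_aggregates"]
    }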

Start a pipeline update for failed tables

If a pipeline update fails because of errors in one or more tables in the pipeline graph, you can start an update of only failed tables and any downstream dependencies.

Note

Excluded tables are not refreshed, even if they depend on a failed table.

To update failed tables, on the Pipeline details page, click Refresh failed tables.

To update only selected failed tables:

  1. Click the down caret next to the Refresh failed tables button and click Select tables for refresh. The Select tables for refresh dialog appears.

  2. To select the tables to refresh, click on each table. The selected tables are highlighted and labeled. To remove a table from the update, click on the table again.

  3. Click Refresh selection.

    Note

    The Refresh selection button displays the number of selected tables in parentheses.

To reprocess data that has already been ingested for the selected tables, click the down caret next to the Refresh selection button and click Full Refresh selection.

View pipeline details

Pipeline graph

After the pipeline starts successfully, the pipeline graph displays. You can use your mouse or the buttons in the corner of the graph panel to adjust the view.

To view tooltips for data quality metrics, hover over the data quality values for a dataset in the pipeline graph.

When running an update that refreshes only selected tables, any tables not part of the refresh are labeled Excluded in the pipeline graph.

Pipeline details

The Pipeline details panel displays information about the pipeline and the current or most recent update of the pipeline, including pipeline and update identifiers, update status, update type, and update runtime.

The Pipeline details panel also displays information about the pipeline compute cluster, including the compute cost, product edition, Databricks Runtime version, and the channel configured for the pipeline. To open the Spark UI for the cluster in a new tab, click the Spark UI button. To open the cluster logs in a new tab, click the Logs button. To open the cluster metrics in a new tab, click the Metrics button.

The Run as value displays the user that pipeline updates run as. The Run as user is the pipeline owner, and pipeline updates run with this user’s permissions. To change the run as user, click Permissions and change the pipeline owner.
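
Changing the pipeline owner can also be scripted. The following is a sketch, assuming the Databricks Permissions API is used to assign the IS_OWNER permission level; the user name is a hypothetical placeholder:

    {
      "access_control_list": [
        {
          "user_name": "user@example.com",
          "permission_level": "IS_OWNER"
        }
      ]
    }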

Dataset details

To view details for a dataset, including the dataset schema and data quality metrics, click the dataset in the Graph view. The dataset details panel appears.

To open the pipeline notebook in a new window, click the Path value.

To close the dataset details view and return to the Pipeline details, click the close button.

Stop a pipeline update

To stop a pipeline update, click the Stop button.

Schedule a pipeline

You can start a triggered pipeline manually or run the pipeline on a schedule with a Databricks job. You can create and schedule a job with a single pipeline task directly in the Delta Live Tables UI or add a pipeline task to a multi-task workflow in the jobs UI.

To create a single-task job and a schedule for the job in the Delta Live Tables UI:

  1. Click Schedule > Add a schedule. The Schedule button is updated to show the number of existing schedules if the pipeline is included in one or more scheduled jobs, for example, Schedule (5).

  2. Enter a name for the job in the Job name field.

  3. Set the Schedule to Scheduled.

  4. Specify the period, starting time, and time zone.

  5. Configure one or more email addresses to receive alerts on pipeline start, success, or failure.

  6. Click Create.

To create a multi-task workflow with a Databricks job and add a pipeline task (a JSON sketch of such a job follows these steps):

  1. Create a job in the jobs UI and add your pipeline to the job workflow using a Pipeline task.

  2. Create a schedule for the job in the jobs UI.
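
For reference, a job that runs a pipeline on a schedule might look like the following sketch in the Jobs API JSON; the job name, cron expression, and pipeline ID are placeholders:

    {
      "name": "Daily pipeline run",
      "schedule": {
        "quartz_cron_expression": "0 0 6 * * ?",
        "timezone_id": "UTC",
        "pause_status": "UNPAUSED"
      },
      "tasks": [
        {
          "task_key": "run_pipeline",
          "pipeline_task": {
            "pipeline_id": "<pipeline-id>"
          }
        }
      ]
    }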

After creating the pipeline schedule, you can:

  • View a summary of the schedule in the Delta Live Tables UI, including the schedule name, whether it is paused, the last run time, and the status of the last run. To view the schedule summary, click the Schedule button.

  • Edit the job or the pipeline task.

  • Edit the schedule or pause and resume the schedule. The schedule is also paused if you selected Manual when you created the schedule.

  • Run the job manually and view details on job runs.

View pipelines

Click Workflows in the sidebar and click the Delta Live Tables tab. The Pipelines page appears with a list of all defined pipelines, the status of the most recent pipeline updates, the pipeline identifier, and the pipeline creator.

You can filter pipelines in the list by:

  • Pipeline name.

  • A partial text match on one or more pipeline names.

  • Selecting only the pipelines you own.

  • Selecting all pipelines you have permissions to access.

Click the Name column header to sort pipelines by name in ascending order (A -> Z) or descending order (Z -> A).

Pipeline names render as links in the pipelines list, allowing you to right-click a pipeline name and access context menu options such as opening the pipeline details in a new tab or window.
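
If you list pipelines with the Delta Live Tables API instead, the same information is returned as JSON. An abbreviated, hypothetical sketch of a response, assuming the list endpoint described in the API guide:

    {
      "statuses": [
        {
          "pipeline_id": "a12cd3e4-...",
          "name": "Example Pipeline",
          "state": "IDLE",
          "creator_user_name": "user@example.com"
        }
      ]
    }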

Edit settings

On the Pipeline details page, click the Settings button to view and modify the pipeline settings. You can add, edit, or remove settings. For example, to make pipeline output available for querying after you’ve created a pipeline:

  1. Click the Settings button. The Edit Pipeline Settings dialog appears.

  2. Enter a database name in the Target field.

  3. Click Save.

To view and edit the JSON specification, click the JSON button.
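
For example, an abbreviated sketch of the JSON specification after a hypothetical database name is entered in the Target field:

    {
      "id": "<pipeline-id>",
      "name": "Example Pipeline",
      "target": "example_db"
    }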

See Delta Live Tables settings for more information on configuration settings.

View update history

To view the history and status of pipeline updates, click the Update history drop-down.

To view the graph, details, and events for an update, select the update in the drop-down. To return to the latest update, click Show the latest update.
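
Update history is also available through the Delta Live Tables API. An abbreviated, hypothetical sketch of the JSON returned when listing the updates for a pipeline, assuming the endpoint described in the API guide:

    {
      "updates": [
        {
          "update_id": "e56fg7h8-...",
          "state": "COMPLETED",
          "cause": "USER_ACTION"
        }
      ]
    }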

Publish datasets

When creating or editing a pipeline, you can configure the target setting to publish your table definitions to the Databricks metastore and persist the records to Delta tables.

After your update completes, you can view the database and tables, query the data, or use the data in downstream applications.

See Delta Live Tables data publishing.

Manage cluster size

You can manage the cluster resources used by your pipeline. By default, Delta Live Tables automatically scales your pipeline clusters to optimize performance and cost. Databricks recommends cluster autoscaling, but you can optionally disable autoscaling and configure a fixed number of worker nodes for your pipeline clusters when you create or edit a pipeline:

  • When creating a pipeline, clear the Enable autoscaling checkbox and specify the number of nodes in the Workers field.

  • Modify the settings of an existing pipeline to remove autoscaling. This snippet from the settings for a pipeline shows cluster autoscaling enabled:

    "clusters": [
       {
         "label": "default",
          "autoscale": {
            "min_workers": 1,
            "max_workers": 5
          }
       }
    ]
    

    This snippet from the settings for a pipeline illustrates cluster autoscaling disabled and the number of worker nodes fixed at 5:

    "clusters": [
       {
         "label": "default",
         "num_workers": 5
       }
    ]
    

Delete a pipeline

You can delete a pipeline from the Pipelines list or the Pipeline details page:

  • In the Pipelines list, click the trash icon in the Actions column.

  • On the Pipeline details page for your pipeline, click the Delete button.

Deleting a pipeline removes the pipeline definition from the Delta Live Tables system and cannot be undone.