Delta Live Tables best practices

This article describes best practices for developing, deploying, and managing Delta Live Tables pipelines.

Choosing between tables and views

To ensure your pipelines are efficient and maintainable, choose the best dataset type, either a table or a view, when you implement your pipeline queries.

Consider using a view when:

  • You have a large or complex query that you want to break into easier-to-manage queries.

  • You want to validate intermediate results using expectations, as in the sketch following this list.

  • You want to reduce storage and compute costs and do not require the materialization of query results. Because tables are materialized, they require additional computation and storage resources.
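
For example, the following is a minimal Python sketch using the Delta Live Tables dlt module. It breaks a cleansing step out into an intermediate view and validates it with an expectation; the raw_orders source table and its columns are hypothetical names used for illustration:

  import dlt
  from pyspark.sql import functions as F

  @dlt.view
  @dlt.expect("valid_order_id", "order_id IS NOT NULL")
  def cleaned_orders():
      # Intermediate view: not materialized, so it adds no storage cost,
      # and the expectation above validates rows when the view is computed.
      return dlt.read("raw_orders").where(F.col("status") != "cancelled")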

Consider using a table when:

  • Multiple downstream queries consume the dataset. Because views are computed on demand, a view is re-computed every time it is queried, whereas a table is computed once per pipeline update and then read from storage (see the sketch after this list).

  • The dataset is consumed by other pipelines, jobs, or queries. Because views are not materialized, they can be used only within the pipeline that defines them.

  • You want to view the results of a query during development. Because tables are materialized and can be viewed and queried outside of the pipeline, using tables during development can help validate the correctness of computations. After validating, convert queries that do not require materialization into views.
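
Continuing the hypothetical example above, the following sketch materializes an aggregation as a table so that multiple downstream readers consume the stored result instead of re-running the computation:

  import dlt
  from pyspark.sql import functions as F

  @dlt.table
  def daily_order_totals():
      # Computed once per pipeline update and persisted; downstream
      # pipelines, jobs, and queries read the stored result.
      return (
          dlt.read("cleaned_orders")
          .groupBy("order_date")
          .agg(F.sum("amount").alias("total_amount"))
      )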

Do not override the Spark version in your pipeline configuration

Because Delta Live Tables clusters run on a custom version of Databricks Runtime, you cannot manually set the Spark version in cluster configurations. Manually setting a version may result in pipeline failures.
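
As an illustration, a minimal clusters entry in the pipeline settings JSON might look like the following sketch. It deliberately leaves spark_version unset so the runtime supplies its own version; the worker count is illustrative:

  {
    "clusters": [
      {
        "label": "default",
        "num_workers": 2
      }
    ]
  }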

Choose pipeline boundaries carefully

Consider the following when determining how to break up your pipelines:

Use larger pipelines to:

  • More efficiently use cluster resources.

  • Reduce the number of pipelines in your workspace.

  • Reduce the complexity of workflow orchestration.

Use more than one pipeline to:

  • Split functionality at team boundaries. For example, your data engineering team may maintain pipelines that transform data, while your data analysts maintain pipelines that analyze the transformed data.

  • Split functionality at application-specific boundaries to reduce coupling and facilitate the re-use of common functionality.

Use autoscaling to increase efficiency and reduce resource usage

Use Enhanced Autoscaling to optimize the cluster utilization of your pipelines. Enhanced Autoscaling allocates additional resources only if the system determines those resources will increase pipeline processing speed. Resources are freed as soon as they are no longer needed, and clusters are shut down as soon as all pipeline updates complete.

Use the following guidelines when configuring Enhanced Autoscaling for production pipelines; a settings sketch follows the list:

  • Leave the Min workers setting at the default.

  • Set the Max workers setting to a value based on budget and pipeline priority.

  • Leave instance types unset to allow the Delta Live Tables runtime to pick the best instance types for your workload.
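
For example, a pipeline settings fragment following these guidelines might look like the sketch below: min_workers is shown at its typical default of 1, max_workers is an illustrative value to choose based on budget and priority, the mode field enables Enhanced Autoscaling, and no instance types are specified:

  {
    "clusters": [
      {
        "label": "default",
        "autoscale": {
          "min_workers": 1,
          "max_workers": 5,
          "mode": "ENHANCED"
        }
      }
    ]
  }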