Tutorials: Implement ETL workflows with Delta Live Tables
Delta Live Tables provides a simple declarative approach to build ETL and machine learning pipelines on batch or streaming data, while automating operational complexities such as infrastructure management, task orchestration, error handling and recovery, and performance optimization. You can use the following tutorials to get started with Delta Live Tables, perform common data transformation tasks, and implement more advanced data processing workflows.
Create your first pipeline with Delta Live Tables
To help you learn about the features of the Delta Live Tables framework and how to implement pipelines, this tutorial walks you through creating and running your first pipeline. The tutorial includes an end-to-end example of a pipeline that ingests data, cleans and prepares the data, and performs transformations on the prepared data. See Tutorial: Run your first Delta Live Tables pipeline.
Programmatically create multiple tables with Python
Note
The patterns shown in this article are difficult to implement with SQL alone. Because Python datasets can be defined against any query that returns a DataFrame, you can use spark.sql() as necessary to use SQL syntax within Python functions.
You can use Python user-defined functions (UDFs) in your SQL queries, but you must define these UDFs in Python files in the same pipeline before calling them in SQL source files. See User-defined scalar functions - Python.
Many workflows require multiple data processing flows or dataset definitions that are identical or differ by only a few parameters. This redundancy can result in pipelines that are error-prone and difficult to maintain. To address this redundancy, you can use a metaprogramming pattern with Python. For an example that uses this pattern to call a single function multiple times to create different tables, see Programmatically create multiple tables.
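The shape of the metaprogramming pattern can be sketched in plain Python. This is a minimal illustration, not the linked article's example. The dlt module is only available inside a pipeline, so the sketch uses a hypothetical table decorator and REGISTRY dictionary as stand-ins for the framework's dataset registration, and a small list of dictionaries (ORDERS) in place of a DataFrame. All of these names and the sample data are assumptions made for illustration.

```python
# Hypothetical stand-in for the pipeline's dataset registry.
# In a real pipeline, the framework's table decorator plays this role.
REGISTRY = {}


def table(name):
    """Hypothetical decorator: register a dataset-defining function under `name`."""
    def decorator(fn):
        REGISTRY[name] = fn
        return fn
    return decorator


# Assumed sample rows standing in for a source DataFrame.
ORDERS = [
    {"region": "emea", "amount": 10},
    {"region": "apac", "amount": 25},
    {"region": "emea", "amount": 5},
]


def define_region_table(region):
    """Define one table per call. `region` is bound when this function is called,
    so each generated table keeps its own value."""
    @table(name=f"orders_{region}")
    def _orders_for_region():
        return [row for row in ORDERS if row["region"] == region]


# One function, invoked once per parameter value, yields one table each.
for region in ["emea", "apac"]:
    define_region_table(region)

# Evaluate every registered definition, as the framework would at run time.
result = {name: fn() for name, fn in REGISTRY.items()}
```

Wrapping the decorated function in define_region_table, rather than decorating directly inside the loop, matters because Python closures look up loop variables when they run, not when they are defined. Without the wrapper, every generated table would filter on the last value of region.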