Forecasting (serverless) with AutoML

Preview

This article shows you how to run a serverless forecasting experiment using the Mosaic AI Model Training UI.

Mosaic AI Model Training - forecasting simplifies forecasting time-series data by automatically selecting the best algorithm and hyperparameters, all while running on fully managed compute resources.

To understand the difference between serverless forecasting and classic compute forecasting, see Serverless forecasting vs. classic compute forecasting.

Requirements

Training data with a time series column, saved as a Unity Catalog table.

Create a forecasting experiment with the UI

Go to your Databricks landing page and click Experiments in the sidebar.

In the Forecasting tile, select Start training.
Select the Training data from a list of Unity Catalog tables that you can access.
- Time column: Select the column containing the time periods for the time series. The column must be of type timestamp or date.
- Forecast frequency: Select the time unit that represents the frequency of your input data, for example minutes, hours, days, or months. This determines the granularity of your time series.
- Forecast horizon: Specify how many units of the selected frequency to forecast into the future. Together with the forecast frequency, this defines both the time units and the number of time units to forecast.
Note

To use the Auto-ARIMA algorithm, the time series must have a regular frequency, meaning the interval between any two consecutive points is the same throughout the time series. AutoML handles missing time steps by filling in those values with the previous value.
Select a Prediction target column that you want the model to predict.
Optionally, specify a Unity Catalog table Prediction data path to store the output forecasts.
Select a Model registration Unity Catalog location and name.
Optionally, set Advanced options:
- Experiment name: Provide an MLflow experiment name.
- Time series identifier columns: For multi-series forecasting, select the column(s) that identify the individual time series. Databricks groups the data by these columns as different time series and trains a model for each series independently.
- Primary metric: Choose the primary metric used to evaluate and select the best model.
- Training framework: Choose the frameworks for AutoML to explore.
- Split column: Select the column that contains a custom data split. Values must be “train”, “validate”, or “test”.
- Weight column: Specify the column to use for weighting time series. All samples for a given time series must have the same weight. The weight must be in the range [0, 10000].
- Holiday region: Select the holiday region to use as covariates in model training.
- Timeout: Set a maximum duration for the AutoML experiment.
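The note on regular frequency and the forecast frequency and horizon settings in the steps above can be illustrated with a small sketch. It is not AutoML's implementation. It assumes a fixed-length time unit (minutes, hours, or days; calendar months would need different arithmetic), and the function names `fill_missing_steps` and `forecast_timestamps` are hypothetical helpers introduced only for this example.

```python
from datetime import datetime, timedelta


def fill_missing_steps(points, step):
    """Return one (timestamp, value) pair per step, carrying the previous value into gaps.

    points: list of (datetime, value) pairs; step: a fixed-length timedelta.
    """
    ordered = sorted(points, key=lambda point: point[0])
    start, end = ordered[0][0], ordered[-1][0]
    observed = {}
    for ts, value in ordered:
        # Every observation must sit on the regular grid defined by start and step.
        if (ts - start) % step != timedelta(0):
            raise ValueError(f"{ts} is not aligned to a step of {step}")
        observed[ts] = value

    filled = []
    current, last_value = start, ordered[0][1]
    while current <= end:
        # Use the observed value if present, otherwise repeat the previous one.
        last_value = observed.get(current, last_value)
        filled.append((current, last_value))
        current += step
    return filled


def forecast_timestamps(last_observed, step, horizon):
    """Timestamps covered by a forecast of `horizon` steps after the last observation."""
    return [last_observed + step * i for i in range(1, horizon + 1)]


daily = timedelta(days=1)
history = [
    (datetime(2024, 1, 1), 10.0),
    (datetime(2024, 1, 2), 12.0),
    # 2024-01-03 is missing and will be filled with the previous value.
    (datetime(2024, 1, 4), 15.0),
]
regular = fill_missing_steps(history, daily)
future = forecast_timestamps(regular[-1][0], daily, horizon=3)
```

In this example the missing 2024-01-03 step receives the value 12.0 from the previous day, and a daily frequency with a horizon of 3 covers 2024-01-05 through 2024-01-07.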

Run the experiment and monitor the results

To start the AutoML experiment, click Start training. From the experiment training page, you can do the following:

Stop the experiment at any time.
Monitor runs.
Navigate to the run page for any run.

View results or use the best model

After training completes, the prediction results are stored in the specified Delta table and the best model is registered to Unity Catalog.

From the experiments page, you can choose from the following next steps:

Select View predictions to see the forecasting results table.
Select Batch inference notebook to open an auto-generated notebook for batch inference using the best model.
Select Create serving endpoint to deploy the best model to a Model Serving endpoint.
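Once a serving endpoint exists, it can be queried over REST. The following is a minimal sketch of building such a request with the Python standard library. It assumes the endpoint accepts the `dataframe_records` JSON input format. The workspace URL, endpoint name, token, and input columns (`date`, `store`) are placeholders invented for illustration; the columns your model expects depend on your training data.

```python
import json
import urllib.request


def build_invocation_request(workspace_url, endpoint_name, token, records):
    """Build (but do not send) a POST request for a Model Serving endpoint."""
    url = f"{workspace_url.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    body = json.dumps({"dataframe_records": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All values below are placeholders (hypothetical workspace, endpoint, and columns).
request = build_invocation_request(
    workspace_url="https://example.cloud.databricks.com",
    endpoint_name="my-forecast-endpoint",
    token="<personal-access-token>",
    records=[{"date": "2024-01-05", "store": "A"}],
)
# To send it, pass the request to urllib.request.urlopen and parse the JSON response.
```

Building the request separately from sending it makes it easy to inspect the URL and payload before calling a live endpoint.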

Serverless forecasting vs. classic compute forecasting

The following table summarizes the differences between serverless forecasting and forecasting with classic compute:

Feature	Serverless forecasting	Classic compute forecasting
Compute infrastructure	Databricks manages compute configuration and automatically optimizes for cost and performance.	User-configured compute
Governance	Models and artifacts registered to Unity Catalog	User-configured workspace file store
Algorithm selection	Statistical models plus the deep learning neural net algorithm DeepAR	Statistical models
Feature store integration	Not supported	Supported
Auto-generated notebooks	Batch inference notebook	Source code for all trials
One-click model serving deployment	Supported	Not supported
Custom train/validate/test splits	Supported	Not supported
Custom weights for individual time series	Supported	Not supported
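
The custom split and weight columns in the last two table rows are subject to the constraints listed under Advanced options: split values must be “train”, “validate”, or “test”, weights must be in the range [0, 10000], and all samples of a given time series must share the same weight. The following sketch checks those constraints on rows held as plain dictionaries before you start an experiment. The function name `check_split_and_weight` and the column names `store`, `split`, and `weight` are hypothetical.

```python
ALLOWED_SPLITS = {"train", "validate", "test"}
MIN_WEIGHT, MAX_WEIGHT = 0, 10000


def check_split_and_weight(rows, series_col, split_col, weight_col):
    """Return a list of human-readable problems; an empty list means the rows pass."""
    problems = []
    first_weight = {}  # series id -> first weight seen for that series
    for index, row in enumerate(rows):
        split, weight, series = row[split_col], row[weight_col], row[series_col]
        if split not in ALLOWED_SPLITS:
            problems.append(f"row {index}: split value {split!r} is not allowed")
        if not MIN_WEIGHT <= weight <= MAX_WEIGHT:
            problems.append(f"row {index}: weight {weight} is outside [0, 10000]")
        # setdefault stores the first weight per series and returns it afterwards.
        if first_weight.setdefault(series, weight) != weight:
            problems.append(f"row {index}: weight differs within series {series!r}")
    return problems


rows = [
    {"store": "A", "split": "train", "weight": 2.0},
    {"store": "A", "split": "validate", "weight": 2.0},
    {"store": "B", "split": "holdout", "weight": 1.0},
    {"store": "B", "split": "test", "weight": 20000},
]
problems = check_split_and_weight(rows, "store", "split", "weight")
for problem in problems:
    print(problem)
```

Here the two rows for store A pass, while the rows for store B are reported for an unknown split value, an out-of-range weight, and a weight that changes within the series.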