Forecasting (serverless) with AutoML

Preview

This feature is in Public Preview.

This article shows you how to run a serverless forecasting experiment using the Mosaic AI Model Training UI.

Mosaic AI Model Training - forecasting simplifies time-series forecasting by automatically selecting the best algorithm and hyperparameters while running on fully managed compute resources.

To understand the difference between serverless forecasting and classic compute forecasting, see Serverless forecasting vs. classic compute forecasting.

Requirements

  • Training data with a time series column, saved as a Unity Catalog table.
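
If the training data currently lives in a Spark DataFrame, it has to be written to a Unity Catalog table before it appears in the UI. The following is a minimal sketch, assuming a PySpark DataFrame and a three-level `catalog.schema.table` name. The helper `save_training_table` and the table name in the usage comment are hypothetical and not part of the product.

```python
def save_training_table(df, full_name):
    """Write a Spark DataFrame to a Unity Catalog table named catalog.schema.table."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected a three-level name: catalog.schema.table")
    # DataFrameWriter.saveAsTable creates the table (mode "overwrite" replaces an existing one).
    df.write.mode("overwrite").saveAsTable(full_name)


# Usage inside a Databricks notebook (hypothetical table name):
# save_training_table(training_df, "main.forecasting.daily_sales")
```

The name check only guards against a missing catalog or schema; permissions and column types are still validated by Databricks when the table is written and selected.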

Create a forecasting experiment with the UI

Go to your Databricks landing page and click Experiments in the sidebar.

  1. In the Forecasting tile, select Start training.

  2. Select the Training data from a list of Unity Catalog tables that you can access.

    • Time column: Select the column containing the time periods for the time series. The column must be of type timestamp or date.

    • Forecast frequency: Select the time unit that represents your input data’s frequency, for example, minutes, hours, days, or months. This determines the granularity of your time series.

    • Forecast horizon: Specify how many units of the selected frequency to forecast into the future. Together with the forecast frequency, this defines both the time units and the number of time units to forecast.

    Note

    To use the Auto-ARIMA algorithm, the time series must have a regular frequency: the interval between any two consecutive points must be the same throughout the time series. AutoML handles missing time steps by filling in those values with the previous value.

  3. Select a Prediction target column that you want the model to predict.

  4. Optionally, specify a Prediction data path, the Unity Catalog table in which to store the output forecasts.

    Serverless forecasting UI screenshot.

  5. Select a Model registration Unity Catalog location and name.

  6. Optionally, set Advanced options:

    • Experiment name: Provide an MLflow experiment name.

    • Time series identifier columns: For multi-series forecasting, select the column(s) that identify the individual time series. Databricks groups the data by these columns as different time series and trains a model for each series independently.

    • Primary metric: Choose the primary metric used to evaluate and select the best model.

    • Training framework: Choose the frameworks for AutoML to explore.

    • Split column: Select the column containing a custom data split. Values must be “train”, “validate”, or “test”.

    • Weight column: Specify the column to use for weighting time series. All samples for a given time series must have the same weight. The weight must be in the range [0, 10000].

    • Holiday region: Select the region whose holidays are used as covariates in model training.

    • Timeout: Set a maximum duration for the AutoML experiment.
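
The data constraints mentioned in the steps above (missing time steps filled with the previous value, a horizon counted in units of the frequency, the allowed split values, and one weight per series in [0, 10000]) can be illustrated with a small pure-Python sketch. It assumes a daily frequency and toy in-memory data. All function names are hypothetical, and the code is an illustration of the documented rules, not AutoML's actual implementation.

```python
from datetime import date, timedelta

VALID_SPLITS = {"train", "validate", "test"}


def forward_fill_daily(series):
    """Fill missing days in a {date: value} mapping with the previous value."""
    days = sorted(series)
    filled = {}
    current, last_value = days[0], series[days[0]]
    while current <= days[-1]:
        # Use the observed value if present, otherwise carry the last one forward.
        last_value = series.get(current, last_value)
        filled[current] = last_value
        current += timedelta(days=1)
    return filled


def future_dates(last_day, horizon):
    """Dates covered by a forecast of `horizon` daily steps after `last_day`."""
    return [last_day + timedelta(days=step) for step in range(1, horizon + 1)]


def check_split_and_weight(rows):
    """Validate (series_id, split, weight) tuples against the documented limits."""
    weights = {}
    for series_id, split, weight in rows:
        if split not in VALID_SPLITS:
            raise ValueError(f"unexpected split value: {split!r}")
        if not 0 <= weight <= 10000:
            raise ValueError(f"weight out of range [0, 10000]: {weight}")
        # All samples of one series must share a single weight.
        if weights.setdefault(series_id, weight) != weight:
            raise ValueError(f"series {series_id!r} has more than one weight")


# Toy data: January 3 is missing and is filled with the value from January 2.
observed = {date(2024, 1, 1): 10.0, date(2024, 1, 2): 12.0, date(2024, 1, 4): 15.0}
filled = forward_fill_daily(observed)
horizon_dates = future_dates(max(filled), horizon=3)
check_split_and_weight([("store_a", "train", 1.0), ("store_a", "test", 1.0)])
```

Running checks like these before starting an experiment can surface gaps or inconsistent weights early, but the UI remains the source of truth for what a given table supports.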

Run the experiment and monitor the results

To start the AutoML experiment, click Start training. From the experiment training page, you can do the following:

  • Stop the experiment at any time.

  • Monitor runs.

  • Navigate to the run page for any run.

View results or use the best model

After training completes, the prediction results are stored in the specified Delta table and the best model is registered to Unity Catalog.

From the experiments page, you can choose from the following next steps:

  • Select View predictions to see the forecasting results table.

  • Select Batch inference notebook to open an auto-generated notebook for batch inference using the best model.

  • Select Create serving endpoint to deploy the best model to a Model Serving endpoint.
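
Outside the UI, a model registered to Unity Catalog can generally be loaded with MLflow. The following is a minimal sketch, assuming the registered forecasting model can be loaded through the generic `mlflow.pyfunc` flavor. The catalog, schema, model name, and version are hypothetical placeholders, and the auto-generated batch inference notebook remains the reference for the exact input format the model expects.

```python
def uc_model_uri(catalog, schema, model, version):
    """Build an MLflow URI for a model version registered in Unity Catalog."""
    return f"models:/{catalog}.{schema}.{model}/{version}"


def load_best_model(model_uri):
    """Load a registered model as a generic MLflow pyfunc model."""
    import mlflow  # imported lazily so uc_model_uri works without MLflow installed

    mlflow.set_registry_uri("databricks-uc")  # point MLflow at the Unity Catalog registry
    return mlflow.pyfunc.load_model(model_uri)


# Hypothetical names; replace them with the location chosen in step 5.
uri = uc_model_uri("main", "forecasting", "sales_forecast", 1)
# model = load_best_model(uri)  # requires MLflow and access to the workspace
```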

Serverless forecasting vs. classic compute forecasting

The following table summarizes the differences between serverless forecasting and forecasting with classic compute.

| Feature | Serverless forecasting | Classic compute forecasting |
| --- | --- | --- |
| Compute infrastructure | Databricks manages compute configuration and automatically optimizes for cost and performance. | User-configured compute |
| Governance | Models and artifacts registered to Unity Catalog | User-configured workspace file store |
| Algorithm selection | Statistical models plus the deep learning neural net algorithm DeepAR | Statistical models |
| Feature store integration | Not supported | Supported |
| Auto-generated notebooks | Batch inference notebook | Source code for all trials |
| One-click model serving deployment | Supported | Unsupported |
| Custom train/validate/test splits | Supported | Not supported |
| Custom weights for individual time series | Supported | Not supported |