Feature Store concepts

This section describes concepts to help you use Databricks Feature Store and feature tables.

Feature table

Features are organized as feature tables. Each feature table is backed by a Delta table and carries additional metadata.

A feature table must have a primary key. Features in a feature table are typically computed and updated using a common computation function.
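As a minimal sketch, the snippet below registers a feature table keyed on a primary key column. The table name, key column, and `compute_user_features` function are hypothetical, and the `create_table` call assumes a recent version of the databricks-feature-store client:

```python
from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# Hypothetical computation function producing one row per user_id.
features_df = compute_user_features(raw_df)

# Register a feature table backed by a Delta table.
fs.create_table(
    name="recsys.user_features",   # hypothetical database.table name
    primary_keys=["user_id"],
    df=features_df,                # schema and initial contents come from this DataFrame
    description="Aggregated user activity features",
)
```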

Feature table metadata tracks the data sources from which a table was generated and the notebooks and jobs that created or wrote to the table.

Offline store

The offline feature store is used for feature discovery, model training, and batch inference. It contains feature tables materialized as Delta tables.
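For example, a scheduled recomputation job might upsert refreshed feature values into the materialized Delta table. A minimal sketch, reusing the hypothetical table from above:

```python
from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# Upsert recomputed feature values into the offline store.
# mode="merge" updates rows matching the primary key and inserts new ones;
# mode="overwrite" would replace the table contents instead.
fs.write_table(
    name="recsys.user_features",
    df=features_df,
    mode="merge",
)
```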

Streaming

In addition to batch writes, Databricks Feature Store supports streaming. You can write feature values to a feature table from a streaming source, and feature computation code can use Structured Streaming to transform raw data streams into features.
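A minimal sketch of a streaming write, assuming a Delta source at a hypothetical path and a hypothetical stream transformation; `spark` is the ambient SparkSession in a Databricks notebook:

```python
from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# Read a raw event stream (hypothetical source path).
raw_stream = spark.readStream.format("delta").load("/data/raw/user_events")

# Transform the stream into feature rows keyed on user_id
# (compute_user_features_stream is a hypothetical function).
features_stream = compute_user_features_stream(raw_stream)

# When df is a streaming DataFrame, write_table starts a continuous
# merge into the feature table and returns a StreamingQuery.
query = fs.write_table(
    name="recsys.user_features",
    df=features_stream,
    mode="merge",
)
```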

Training set

A training set consists of a list of features and a DataFrame containing raw training data, labels, and primary keys by which to look up features. You create a training set by specifying the features to extract from Feature Store, then provide it as input during model training.
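A minimal sketch, assuming the hypothetical feature table above and a label DataFrame with user_id and rating columns; the feature names are also hypothetical:

```python
from databricks.feature_store import FeatureLookup, FeatureStoreClient

fs = FeatureStoreClient()

# Declare which features to extract and the key to join them on.
feature_lookups = [
    FeatureLookup(
        table_name="recsys.user_features",
        feature_names=["num_purchases", "avg_session_minutes"],
        lookup_key="user_id",
    )
]

# label_df holds the raw training data: primary keys plus a label column.
training_set = fs.create_training_set(
    df=label_df,
    feature_lookups=feature_lookups,
    label="rating",
    exclude_columns=["user_id"],   # drop the key before training
)

# Materialize the joined DataFrame for model training.
training_df = training_set.load_df()
```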

See Create a training dataset for an example of how to create and use a training set.

Model packaging

A machine learning model trained using features from Databricks Feature Store retains references to these features. At inference time, the model can optionally retrieve feature values from Feature Store. The caller only needs to provide the primary key of the features used in the model (for example, user_id), and the model retrieves all required feature values from Feature Store.

In batch inference, feature values are retrieved from the offline store and joined with the new data prior to scoring. Real-time inference requires first publishing the feature tables to a low-latency online store.
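For batch inference, FeatureStoreClient.score_batch performs this lookup and join automatically for a model packaged with feature metadata (see below). A minimal sketch; the registered model URI and input DataFrame are hypothetical:

```python
from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# batch_df needs only the lookup key (user_id); score_batch joins the
# required feature values from the offline store before scoring.
predictions = fs.score_batch(
    "models:/recommender/1",   # hypothetical registered model URI
    batch_df,
)
```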

To package a model with feature metadata, use FeatureStoreClient.log_model().
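A minimal sketch, assuming a fitted scikit-learn model trained on the training set from earlier; the artifact path and registered model name are hypothetical:

```python
import mlflow.sklearn
from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# Log the model together with the feature metadata captured in
# training_set, so inference can look up feature values automatically.
fs.log_model(
    model,                                # hypothetical fitted sklearn model
    artifact_path="model",
    flavor=mlflow.sklearn,
    training_set=training_set,
    registered_model_name="recommender",  # hypothetical name
)
```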