Databricks Feature Store

This page explains what a feature store is, what benefits it provides, and the specific advantages of Databricks Feature Store.

The Databricks Feature Store library is available only on Databricks Runtime for Machine Learning and is accessible through Databricks notebooks and workflows.

What is a feature store?

A feature store is a centralized repository that enables data scientists to find and share features. It also ensures that the same code used to compute feature values is used for both model training and inference.

Machine learning uses existing data to build a model that predicts future outcomes. In almost all cases, the raw data requires preprocessing and transformation before it can be used to build a model. This process is called featurization or feature engineering, and its outputs are called features, the building blocks of the model.

Developing features is complex and time-consuming. An additional complication is that for machine learning, the featurization calculations need to be done for model training, and then again when the model is used to make predictions. These implementations may not be done by the same team or using the same code environment, which can lead to delays and errors. Also, different teams in an organization will often have similar feature needs but may not be aware of work that other teams have done. A feature store is designed to address these problems.
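The consistency problem described above can be illustrated with a minimal sketch in plain Python. It does not use the Feature Store library. The `Customer` record, the `featurize` function, and the feature names are hypothetical, chosen only to show one featurization function being shared by the training path and the inference path so that the two cannot drift apart.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Customer:
    # Hypothetical raw record; field names are assumptions for illustration.
    signup_date: date
    total_spend: float
    num_orders: int


def featurize(customer: Customer, as_of: date) -> dict:
    """Single definition of the feature logic, used for training and inference."""
    tenure_days = (as_of - customer.signup_date).days
    # Guard against division by zero for customers with no orders.
    if customer.num_orders:
        avg_order_value = customer.total_spend / customer.num_orders
    else:
        avg_order_value = 0.0
    return {"tenure_days": tenure_days, "avg_order_value": avg_order_value}


# Training path: compute features from historical records.
history = [
    (Customer(date(2023, 1, 1), 300.0, 3), date(2023, 4, 11)),
    (Customer(date(2023, 3, 1), 0.0, 0), date(2023, 4, 11)),
]
training_rows = [featurize(c, as_of) for c, as_of in history]

# Inference path: the same function is applied to a new record.
serving_row = featurize(Customer(date(2024, 1, 1), 120.0, 4), date(2024, 1, 31))
```

If the serving team reimplemented `avg_order_value` separately (for example, without the zero-order guard), the model would see different inputs at prediction time than it saw during training. A feature store is meant to remove the need for that second implementation.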

Why use Databricks Feature Store?

Databricks Feature Store is fully integrated with other components of Databricks, which provides the following benefits:

  • Discoverability. The Feature Store UI, accessible from the Databricks workspace, lets you browse and search for existing features.

  • Lineage. When you create a feature table with Feature Store, the data sources used to create the feature table are saved and accessible. For each feature in a feature table, you can also access the models, notebooks, jobs, and endpoints that use the feature.

  • Integration with model scoring and serving. When you use features from Feature Store to train a model, the model is packaged with feature metadata. When you use the model for batch scoring or online inference, it automatically retrieves features from Feature Store. The caller does not need to know about the features or include logic to look up or join them to score new data. This makes model deployment and updates much easier.

  • Point-in-time lookups. Feature Store supports time series and event-based use cases that require point-in-time correctness.
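To make the idea of point-in-time correctness in the last item concrete, here is a minimal sketch in plain Python. It is not the Feature Store API. The `feature_history` data, the integer timestamps, and the `point_in_time_lookup` helper are hypothetical. The sketch also assumes that a feature value recorded at exactly the event timestamp counts as available, which is a choice made for this example only.

```python
from bisect import bisect_right

# Hypothetical history of one feature for one key: (timestamp, value),
# sorted by timestamp.
feature_history = [
    (10, 0.2),
    (20, 0.5),
    (30, 0.9),
]


def point_in_time_lookup(history, event_ts):
    """Return the latest value whose timestamp is <= event_ts, or None."""
    timestamps = [ts for ts, _ in history]
    # Index of the first entry strictly later than event_ts.
    idx = bisect_right(timestamps, event_ts)
    if idx == 0:
        return None  # No feature value existed yet at event_ts.
    return history[idx - 1][1]


# Each label event is matched only with the value known at that time,
# never with a value recorded afterwards.
label_events = [5, 20, 25, 40]
training_values = [point_in_time_lookup(feature_history, ts) for ts in label_events]
```

The event at timestamp 25 is matched with the value recorded at 20, not the later value recorded at 30. Joining on the latest available value instead would leak future information into the training data.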

Start using Feature Store

See the following articles to get started with Feature Store:

More information

For more information on best practices for using Feature Store, download The Comprehensive Guide to Feature Stores.