This page explains what a feature store is, what benefits it provides, and the specific advantages of Databricks Feature Store.
The Databricks Feature Store library is available only on Databricks Runtime for Machine Learning and is accessible through Databricks notebooks and workflows.
A feature store is a centralized repository that enables data scientists to find and share features and also ensures that the same code used to compute the feature values is used for model training and inference.
Machine learning uses existing data to build a model that predicts future outcomes. In almost all cases, the raw data requires preprocessing and transformation before it can be used to build a model. This process is called featurization or feature engineering, and its outputs are called features, the building blocks of the model.
Developing features is complex and time-consuming. A further complication is that the featurization calculations must be performed twice: once for model training, and again when the model is used to make predictions. These two implementations may be written by different teams or in different code environments, which can lead to delays and errors. Also, different teams in an organization often have similar feature needs but may not be aware of work that other teams have already done. A feature store is designed to address these problems.
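As a rough illustration of the training and inference consistency problem, the following minimal sketch defines the featurization logic once and reuses it on both paths. It does not use the Feature Store library; the `featurize` function, the record fields, and the sample values are all hypothetical and chosen only for illustration.

```python
from datetime import date


def featurize(raw: dict, as_of: date) -> dict:
    """Turn one raw customer record into features (hypothetical schema)."""
    orders = raw["order_count"]
    return {
        "account_age_days": (as_of - raw["signup_date"]).days,
        # Guard against division by zero for customers with no orders.
        "avg_order_value": raw["total_spend"] / orders if orders else 0.0,
    }


# Training path: featurize historical records.
training_rows = [
    {"signup_date": date(2023, 1, 1), "total_spend": 300.0, "order_count": 3},
    {"signup_date": date(2023, 6, 1), "total_spend": 0.0, "order_count": 0},
]
training_features = [featurize(r, as_of=date(2024, 1, 1)) for r in training_rows]

# Inference path: the same function is applied to a new record,
# so the feature definitions cannot drift apart.
new_customer = {"signup_date": date(2023, 12, 1), "total_spend": 50.0, "order_count": 2}
inference_features = featurize(new_customer, as_of=date(2024, 1, 1))

print(training_features)
print(inference_features)
```

When the two paths are instead implemented separately, a small difference (for example, in how zero-order customers are handled) can produce feature values at inference time that the model never saw during training.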
Databricks Feature Store is fully integrated with other components of Databricks:
Discoverability. The Feature Store UI, accessible from the Databricks workspace, lets you browse and search for existing features.
Lineage. When you create a feature table with Feature Store, the data sources used to create the feature table are saved and accessible. For each feature in a feature table, you can also access the models, notebooks, jobs, and endpoints that use the feature.
Integration with model scoring and serving. When you use features from Feature Store to train a model, the model is packaged with feature metadata. When you use the model for batch scoring or online inference, it automatically retrieves features from Feature Store. The caller does not need to know about them or include logic to look up or join features to score new data. This makes model deployment and updates much easier.
Point-in-time lookups. Feature Store supports time series and event-based use cases that require point-in-time correctness.
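To make the integration with model scoring and serving more concrete, here is a toy sketch of a model packaged with feature metadata. It is not the Feature Store implementation: `FEATURE_TABLE`, `PackagedModel`, and the linear scoring function are hypothetical stand-ins, and the real library stores features in feature tables rather than an in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for a feature table keyed by customer_id.
FEATURE_TABLE: Dict[int, Dict[str, float]] = {
    1: {"avg_order_value": 100.0, "orders_last_30d": 4.0},
    2: {"avg_order_value": 20.0, "orders_last_30d": 1.0},
}


@dataclass
class PackagedModel:
    """A model bundled with the metadata needed to look up its own features."""
    predict_fn: Callable[[List[float]], float]
    lookup_key: str
    feature_names: List[str]

    def score(self, rows: List[dict]) -> List[float]:
        results = []
        for row in rows:
            # Retrieve stored features using only the key supplied by the caller.
            stored = FEATURE_TABLE[row[self.lookup_key]]
            vector = [stored[name] for name in self.feature_names]
            results.append(self.predict_fn(vector))
        return results


model = PackagedModel(
    predict_fn=lambda v: 0.5 * v[0] + 10.0 * v[1],  # toy linear model
    lookup_key="customer_id",
    feature_names=["avg_order_value", "orders_last_30d"],
)

# The caller passes only primary keys; no join or lookup logic is needed.
scores = model.score([{"customer_id": 1}, {"customer_id": 2}])
print(scores)
```

In the Feature Store Python API, this role is played by logging the model through `FeatureStoreClient.log_model` and scoring with `FeatureStoreClient.score_batch`; consult the API reference for the exact parameters.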
See the following articles to get started with Feature Store:
See an example notebook that illustrates the process of creating features, updating them, and using them for model training and batch inference.
See the reference material for the Feature Store Python API.
Learn about training models with Feature Store.
Learn about working with feature tables.
Use time series feature tables and point-in-time lookups to retrieve the latest feature values as of a particular time for training or scoring a model.
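The idea behind a point-in-time lookup can be sketched in a few lines of plain Python. This is a conceptual illustration only, not how Feature Store implements it: the `history` data and the `value_as_of` helper are hypothetical, and the lookup is treated as inclusive of the requested timestamp.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical time series of one feature for one entity, sorted by timestamp.
history: List[Tuple[datetime, float]] = [
    (datetime(2024, 1, 1), 10.0),
    (datetime(2024, 1, 5), 12.5),
    (datetime(2024, 1, 9), 15.0),
]


def value_as_of(series: List[Tuple[datetime, float]], ts: datetime) -> Optional[float]:
    """Return the latest value recorded at or before ts, or None if there is none."""
    timestamps = [t for t, _ in series]
    # bisect_right returns the count of entries with timestamp <= ts.
    idx = bisect_right(timestamps, ts)
    return series[idx - 1][1] if idx > 0 else None


# A training label observed on Jan 6 must only see features known by Jan 6.
print(value_as_of(history, datetime(2024, 1, 6)))    # 12.5
print(value_as_of(history, datetime(2023, 12, 31)))  # None
```

Using the value from Jan 9 for a label observed on Jan 6 would leak future information into training, which is the error that point-in-time correctness is meant to prevent.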
For more information on best practices for using Feature Store, download The Comprehensive Guide to Feature Stores.