Databricks Feature Store

Databricks Feature Store is a centralized repository of features. It enables feature sharing and discovery across your organization and also ensures that the same feature computation code is used for model training and inference.

The Databricks Feature Store library is available only on Databricks Runtime for Machine Learning and is accessible through notebooks and jobs.


Databricks Feature Store requires Databricks Runtime 8.3 ML or above.

Databricks Feature Store

Raw data must be processed and transformed before it can be used for machine learning. This process is called feature engineering. It includes transformations such as aggregating data (for example, the number of purchases by a user in a given time window) and more complex calculations that may themselves be the output of machine learning algorithms, such as word embeddings.
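As a rough illustration of the aggregation example above, the following sketch counts purchases per user in a trailing time window using only the Python standard library. The event data, the `purchases_in_window` helper, and the 7-day window are assumptions made for illustration; they are not part of Databricks Feature Store, where a computation like this would more typically run on Spark DataFrames.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical raw purchase events: (user_id, purchase timestamp).
purchases = [
    ("alice", datetime(2021, 6, 1, 9, 30)),
    ("alice", datetime(2021, 6, 3, 14, 0)),
    ("bob", datetime(2021, 5, 20, 11, 15)),
    ("bob", datetime(2021, 6, 6, 8, 45)),
    ("alice", datetime(2021, 6, 7, 19, 5)),
]


def purchases_in_window(events, window_end, window_days=7):
    """Count purchases per user in the window (window_end - window_days, window_end]."""
    window_start = window_end - timedelta(days=window_days)
    # Keep only events that fall inside the window, then count them per user.
    return Counter(user for user, ts in events if window_start < ts <= window_end)


# Feature: number of purchases per user in the 7 days ending on June 7, 2021.
purchase_counts_7d = purchases_in_window(
    purchases, window_end=datetime(2021, 6, 7, 23, 59, 59)
)
```

Each entry in `purchase_counts_7d` is one feature value keyed by user. Because the result is a `Counter`, users with no purchases in the window yield a count of 0 rather than raising an error.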

Converting raw data into features for model training is time-consuming. Creating and maintaining feature definition pipelines requires significant effort. Teams often want to explore and leverage features created by other data scientists in the organization.

Another challenge is maintaining consistency between training and serving. A feature pipeline might be created by a data scientist as a prototype and then reimplemented by data engineers for production use. If the model is deployed for low-latency online serving, machine learning engineers might rebuild the same feature computation to optimize for serving. This slows the process of moving models to production and can introduce errors or inconsistencies, sometimes called “skew”, between the code used for training the model and the code used for inference.

These problems can be addressed using a feature store—a centralized repository of features that enables feature sharing and discovery across an organization and also ensures that the same feature computation code is used for model training and inference.
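The idea of sharing one feature definition between training and inference can be sketched with a toy, in-memory registry. This is a conceptual illustration only: `ToyFeatureStore`, its methods, and the feature names are hypothetical and do not represent the Databricks Feature Store API.

```python
from typing import Callable, Dict, List


class ToyFeatureStore:
    """Hypothetical in-memory stand-in for a feature store (not the Databricks API)."""

    def __init__(self) -> None:
        self._definitions: Dict[str, Callable[[dict], float]] = {}

    def register(self, name: str, fn: Callable[[dict], float]) -> None:
        """Register a single, shared definition for a named feature."""
        if name in self._definitions:
            raise ValueError(f"feature {name!r} is already registered")
        self._definitions[name] = fn

    def compute(self, names: List[str], raw: dict) -> Dict[str, float]:
        """Compute the requested features from one raw record."""
        return {name: self._definitions[name](raw) for name in names}


store = ToyFeatureStore()
store.register(
    "avg_order_value",
    lambda raw: raw["total_spend"] / max(raw["num_orders"], 1),
)
store.register("is_frequent_buyer", lambda raw: float(raw["num_orders"] >= 5))

FEATURES = ["avg_order_value", "is_frequent_buyer"]

# Training path: features are computed from historical records.
training_rows = [
    {"total_spend": 200.0, "num_orders": 4},
    {"total_spend": 90.0, "num_orders": 6},
]
training_features = [store.compute(FEATURES, row) for row in training_rows]

# Inference path: the same registered definitions are applied to a new request.
request = {"total_spend": 200.0, "num_orders": 4}
serving_features = store.compute(FEATURES, request)
```

Because both paths call the same registered functions, identical raw input produces identical feature values, which is the property that prevents skew. Reimplementing the feature logic separately for serving would lose that guarantee.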

You can use Databricks Feature Store to create new features, explore and re-use existing features, and select features for training and scoring machine learning models.

Databricks Feature Store is fully integrated with other Databricks components. Feature tables are stored as Delta tables. Deployed MLflow models can automatically retrieve features from Feature Store. The Databricks Feature Store UI, accessible from the Databricks workspace, lets you browse and search for existing features and displays information about feature lineage—including data sources used to compute features and the models, notebooks, and jobs that use a specific feature.

In this section: