Build and serve a wide-and-deep model in a recommender system

Building a machine learning pipeline for a recommender system usually involves the following stages:

[Figure: workflow diagram]

This reference solution covers the stages shown in blue. (See Which topics are not covered?)

Databricks tool highlights

The notebook covers several tools provided on Databricks that simplify building a machine learning pipeline:

  1. SparkDatasetConverter
  2. MLflow model registry
  3. MLflow model serving

Data

The data used in this notebook consists of the following Delta tables:

  • user_profile: contains the user_id values and their static profiles
  • item_profile: contains the item_id values and their static profiles
  • user_item_interaction: contains events where a user interacts with an item. This table is randomly split into three Delta tables to build and evaluate the model: train, validation, and test.
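The three-way random split described above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the toy rows, the `label` column, the 80/10/10 ratios, and the helper name `random_split` are all assumptions made for the example. On Spark, `DataFrame.randomSplit` serves a similar purpose for tables of realistic size.

```python
import random


def random_split(rows, weights=(0.8, 0.1, 0.1), seed=42):
    """Randomly assign rows to train, validation, and test sets.

    Hypothetical helper for illustration; the default ratios are an assumption.
    """
    rng = random.Random(seed)  # a fixed seed keeps the split reproducible
    shuffled = list(rows)      # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * weights[0])
    n_val = int(len(shuffled) * weights[1])
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to test
    return train, validation, test


# Toy stand-in for the user_item_interaction table (values are made up).
interactions = [
    {"user_id": u, "item_id": i, "label": (u + i) % 2}
    for u in range(10)
    for i in range(10)
]

train, validation, test = random_split(interactions)
print(len(train), len(validation), len(test))
```

Because every row lands in exactly one of the three sets, the validation and test sets stay unseen during training.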

This data format is common for recommendation problems. Some examples are:

  • For ad recommenders, the items are ads and the user-item interactions are records of users clicking the ads.
  • For online shopping recommenders, the items are products and the user-item interactions are records of users' reviews or order history.

When you adapt this notebook to your dataset, you only need to save your data in the Delta tables and provide the table names and locations. The code for loading data can mostly be reused.

See the dataset generation notebook for details.

Generate and save the dataset notebook

Model

This notebook uses the wide-and-deep model (paper | TensorFlow implementation). This popular model combines a wide linear model with a deep neural network: the linear part memorizes specific feature interactions, and the deep part generalizes to feature combinations that were not seen during training.
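To make the idea of combining the two parts concrete, here is a minimal forward-pass sketch in plain Python. It is not the notebook's TensorFlow code: the function names, layer sizes, feature values, and random weights are assumptions chosen for illustration. It shows only how a wide logit and a deep logit are summed and passed through a sigmoid to produce a probability.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def wide_and_deep_predict(wide_features, deep_features, params):
    # Wide part: a linear model over sparse or crossed features (memorization).
    wide_logit = sum(w * x for w, x in zip(params["wide_w"], wide_features))
    # Deep part: a small feed-forward network over dense features (generalization).
    hidden = relu(dense(deep_features, params["hidden_w"], params["hidden_b"]))
    deep_logit = dense(hidden, params["out_w"], params["out_b"])[0]
    # Both logits are summed with a shared bias, then mapped to a probability.
    return sigmoid(wide_logit + deep_logit + params["bias"])


# Hypothetical sizes: 3 wide features, 4 deep features, 3 hidden units.
rng = random.Random(0)
params = {
    "wide_w": [rng.uniform(-0.5, 0.5) for _ in range(3)],
    "hidden_w": [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(3)],
    "hidden_b": [0.0, 0.0, 0.0],
    "out_w": [[rng.uniform(-0.5, 0.5) for _ in range(3)]],
    "out_b": [0.0],
    "bias": 0.0,
}

wide_x = [1.0, 0.0, 1.0]        # for example, one-hot crossed features
deep_x = [0.2, -0.5, 0.1, 0.9]  # for example, embedded or numeric features
p = wide_and_deep_predict(wide_x, deep_x, params)
print(f"predicted interaction probability: {p:.3f}")
```

In a trained model, both parts are optimized jointly, so the weights above would come from training rather than from a random number generator.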

This model is one example among many deep learning models for recommendation problems, and for machine learning pipelines in general. The focus here is on showing how to build the workflow. You can swap in a different model for your own use case and tune it for better evaluation metrics.

Note

This notebook uses DBFS access to the local filesystem (FUSE mount) and is not supported on Databricks on Google Cloud as of this release.

Build and serve a wide-and-deep model in a recommender system notebook

Which topics are not covered?

To keep the notebook focused on showing how to implement a recommender system, the following stages are not covered. These stages are shown as gray blocks in the workflow diagram.

  1. Data collection and exploratory data analysis. See Data guide.
  2. Feature engineering. Feature engineering is an important part of a recommender system, and much information is available on this topic. This notebook assumes that you have a curated dataset containing user-item interactions. For details about the dataset used in this notebook, see Data. For more information about feature engineering, see the following resources:
  3. Model tuning. Model tuning means revising the existing pipeline (feature engineering, model structure, model hyperparameters, or even the data collection stage) to improve the model's performance. For more information about tools for model tuning on Databricks, see Hyperparameter tuning and automated machine learning.