Preprocess data


The managed MLflow integration with Databricks on Google Cloud requires Databricks Runtime for Machine Learning 8.1 or above.

You can use Databricks Feature Store to create new features, explore and re-use existing features, and select features for training and scoring machine learning models.

On large datasets, you can use Spark SQL and MLlib for feature engineering. Third-party libraries included in Databricks Runtime ML, such as scikit-learn, also provide useful helper methods. For examples, see the following machine learning notebooks for scikit-learn and MLlib:
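
Separately from those notebooks, the following is a minimal sketch of the kind of scikit-learn helper methods mentioned above. It assumes a small in-memory pandas DataFrame; the column names `age` and `plan` and the sample values are hypothetical and chosen only for illustration. It standardizes a numeric column and one-hot encodes a categorical column in a single preprocessing step.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the column names and values are illustrative only.
df = pd.DataFrame(
    {
        "age": [25.0, 35.0, 45.0],
        "plan": ["basic", "pro", "basic"],
    }
)

# Scale the numeric column and one-hot encode the categorical column.
# sparse_threshold=0 makes the combined output a dense NumPy array.
preprocessor = ColumnTransformer(
    transformers=[
        ("scale", StandardScaler(), ["age"]),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
    ],
    sparse_threshold=0,
)

# Rows are samples; columns are [scaled age, plan=basic, plan=pro].
features = preprocessor.fit_transform(df)
print(features.shape)
```

The fitted `preprocessor` can then be applied to new rows with `preprocessor.transform(...)`, so the same scaling and encoding are reused at scoring time. For data that does not fit on a single machine, Spark SQL and MLlib are the more likely fit, as noted above.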

For more complex deep learning feature processing, this example notebook illustrates how to use transfer learning for featurization: