Model training examples

This section includes examples showing how to train machine learning models on Databricks using many popular open-source libraries.

You can also use Mosaic AutoML, which automatically prepares a dataset for model training, performs a set of trials using open-source libraries such as scikit-learn and XGBoost, and creates a Python notebook with the source code for each trial run so you can review, reproduce, and modify the code.

Machine learning examples

| Package | Notebook(s) | Features |
| --- | --- | --- |
| scikit-learn | Machine learning tutorial | Unity Catalog, classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow |
| scikit-learn | End-to-end example | Unity Catalog, classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow, XGBoost |
| MLlib | MLlib examples | Binary classification, decision trees, GBT regression, Structured Streaming, custom transformer |
| xgboost | XGBoost examples | Python, PySpark, and Scala, single node workloads and distributed training |

Hyperparameter tuning examples

For general information about hyperparameter tuning in Databricks, see Hyperparameter tuning.
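
As a rough illustration of what a hyperparameter tuning loop does, the sketch below runs a plain random search using only the Python standard library. It is not the Optuna or Hyperopt API, and random search is a simpler strategy than the samplers those libraries provide. The `objective` function, the parameter names `learning_rate` and `max_depth`, and their ranges are all made up for the example; in a real notebook the objective would train a model and return a validation metric.

```python
import random


def objective(params):
    """Toy stand-in for a validation loss (hypothetical; lower is better)."""
    return (params["learning_rate"] - 0.1) ** 2 + 0.01 * (params["max_depth"] - 6) ** 2


def sample_params(rng):
    """Draw one candidate from an assumed search space."""
    return {
        "learning_rate": 10 ** rng.uniform(-3, 0),  # log-uniform in [0.001, 1]
        "max_depth": rng.randint(2, 10),            # integer, both ends inclusive
    }


def random_search(objective, n_trials, seed=0):
    """Evaluate n_trials random candidates and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = sample_params(rng)
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(objective, n_trials=200)
print(best_params, best_loss)
```

Libraries such as Optuna and Hyperopt replace the uniform sampling step with adaptive strategies and can distribute trials across a cluster; the notebooks listed in this section show how.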

| Package | Notebook | Features |
| --- | --- | --- |
| Optuna | Get started with Optuna | Optuna, distributed Optuna, scikit-learn, MLflow |
| Hyperopt | Distributed hyperopt | Distributed hyperopt, scikit-learn, MLflow |
| Hyperopt | Compare models | Use distributed hyperopt to search hyperparameter space for different model types simultaneously |
| Hyperopt | Distributed training algorithms and hyperopt | Hyperopt, MLlib |
| Hyperopt | Hyperopt best practices | Best practices for datasets of different sizes |
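
The "Compare models" entry searches across several model types at once. The idea can be sketched with the standard library alone: each trial first picks a model type, then samples hyperparameters from that type's own space. This is a hedged illustration, not the notebook's code and not the Hyperopt API. The two tiny models (a k-nearest-neighbors regressor and a 1-D ridge regression), the synthetic data, and every name and range below are assumptions chosen for the example.

```python
import random


def make_data(rng, n):
    """Synthetic 1-D regression data: y = 2x + 1 plus Gaussian noise."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 0.1)) for x in xs]


def fit_knn(train, k):
    """k-nearest-neighbors regressor: average the targets of the k closest points."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict


def fit_ridge(train, alpha):
    """Closed-form 1-D ridge regression with an unpenalized intercept."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    slope = sxy / (sxx + alpha)
    return lambda x: mean_y + slope * (x - mean_x)


def mse(predict, data):
    """Mean squared error of a prediction function on (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


# One sampler and one fitting function per model type (hypothetical ranges).
SEARCH_SPACE = {
    "knn": lambda rng: {"k": rng.randint(1, 15)},
    "ridge": lambda rng: {"alpha": 10 ** rng.uniform(-4, 1)},
}
FITTERS = {
    "knn": lambda train, p: fit_knn(train, p["k"]),
    "ridge": lambda train, p: fit_ridge(train, p["alpha"]),
}


def search(n_trials, seed=0):
    """Random search over model type and that type's hyperparameters."""
    rng = random.Random(seed)
    data = make_data(rng, 80)
    train, valid = data[:60], data[60:]
    best = None
    for _ in range(n_trials):
        model_type = rng.choice(sorted(SEARCH_SPACE))
        params = SEARCH_SPACE[model_type](rng)
        loss = mse(FITTERS[model_type](train, params), valid)
        if best is None or loss < best["loss"]:
            best = {"model_type": model_type, "params": params, "loss": loss}
    return best


best = search(n_trials=50)
print(best)
```

Keeping a separate sampler per model type means each trial only draws the hyperparameters that apply to the chosen model, which is the same shape of conditional search space that tuning libraries let you declare.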