Model training examples
This section includes examples showing how to train machine learning and deep learning models on Databricks using many popular open-source libraries.
You can also use AutoML, which automatically prepares a dataset for model training, performs a set of trials using open-source libraries such as scikit-learn and XGBoost, and creates a Python notebook with the source code for each trial run so you can review, reproduce, and modify the code.
For an example notebook that shows how to train a machine learning model that uses data in Unity Catalog and write predictions back to Unity Catalog, see Train and register machine learning models with Unity Catalog.
Machine learning examples
| Package | Notebook(s) | Features |
|---|---|---|
| scikit-learn | | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow |
| scikit-learn | | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow, XGBoost, Model Registry, Model Serving |
| MLlib | | Binary classification, decision trees, GBT regression, Structured Streaming, custom transformer |
| xgboost | | Python, PySpark, and Scala, single node workloads and distributed training |
Deep learning examples
Also see Best practices for deep learning on Databricks.
| Package | Notebook | Features |
|---|---|---|
| TensorFlow Keras | | TensorFlow Keras, TensorBoard, Hyperopt, MLflow |
| TensorFlow (single node) | | TensorFlow, TensorBoard |
| PyTorch (single node) | | PyTorch |
For distributed deep learning training, see:
| Package | Notebook | Features |
|---|---|---|
| HorovodRunner (TensorFlow Keras) | | TensorFlow Keras single node to distributed training |
| HorovodRunner (PyTorch) | | PyTorch single node to distributed training |
| HorovodRunner | | Horovod timeline |
| | | Distributed training with TensorFlow on Apache Spark clusters |
| TorchDistributor | | Distributed training with PyTorch on Apache Spark clusters |
Hyperparameter tuning examples
For general information about hyperparameter tuning in Databricks, see Hyperparameter tuning.
| Package | Notebook | Features |
|---|---|---|
| Hyperopt | | Distributed hyperopt, scikit-learn, MLflow |
| Hyperopt | | Use distributed hyperopt to search hyperparameter space for different model types simultaneously |
| Hyperopt | | Hyperopt, MLlib |
| Hyperopt | | Best practices for datasets of different sizes |