Model training examples
Note
The managed MLflow integration with Databricks on Google Cloud requires Databricks Runtime for Machine Learning 9.1 LTS or above.
This section includes examples showing how to train machine learning and deep learning models on Databricks using many popular open-source libraries.
You can also use AutoML, which automatically prepares a dataset for model training, performs a set of trials using open-source libraries such as scikit-learn and XGBoost, and creates a Python notebook with the source code for each trial run so you can review, reproduce, and modify the code.
For an example notebook that shows how to train a machine learning model that uses data in Unity Catalog and write predictions back to Unity Catalog, see Python ML model training with Unity Catalog data.
Machine learning examples
| Package | Notebook(s) | Features |
|---|---|---|
| scikit-learn | | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow |
| scikit-learn | | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow, Model Registry |
| scikit-learn | | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow, XGBoost, Model Registry, Model Serving |
| MLlib | | Binary classification, decision trees, GBT regression, Structured Streaming, custom transformer |
| xgboost | | Python, PySpark, and Scala, single node workloads and distributed training |
Deep learning examples
Also see Best practices for deep learning on Databricks.
| Package | Notebook | Features |
|---|---|---|
| TensorFlow Keras | | TensorFlow Keras, TensorBoard, Hyperopt, MLflow |
| TensorFlow (single node) | | TensorFlow, TensorBoard |
| PyTorch (single node) | | PyTorch |
For distributed deep learning training, see:
| Package | Notebook | Features |
|---|---|---|
| HorovodRunner (TensorFlow Keras) | | TensorFlow Keras single node to distributed training |
| HorovodRunner (PyTorch) | | PyTorch single node to distributed training |
| HorovodRunner | | Horovod timeline |
| | | Distributed training with TensorFlow on Apache Spark clusters |
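
The move from single-node to distributed training listed above usually means data parallelism: each worker computes gradients on its own shard of the data, and the gradients are averaged before every update (Horovod does this averaging with an allreduce). The following is a minimal sketch of that idea only. It simulates the workers in one Python process, uses hypothetical toy data, and does not call HorovodRunner or any Databricks API; the function names `gradient` and `train` are illustrative.

```python
# Hypothetical toy data for y = 3x + 1; not taken from the example notebooks.
xs = [i / 10 for i in range(40)]
ys = [3.0 * x + 1.0 for x in xs]


def gradient(w, b, batch):
    """Mean-squared-error gradient of y_hat = w*x + b over one batch."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def train(num_workers, steps=1000, lr=0.05):
    """Gradient descent where each step averages per-worker gradients."""
    data = list(zip(xs, ys))
    # Each simulated worker gets a shard; shards are equal-sized only when
    # num_workers divides the number of examples (40 here).
    shards = [data[i::num_workers] for i in range(num_workers)]
    w, b = 0.0, 0.0
    for _ in range(steps):
        grads = [gradient(w, b, shard) for shard in shards]
        # Stand-in for allreduce: average the gradients from all workers.
        gw = sum(g[0] for g in grads) / num_workers
        gb = sum(g[1] for g in grads) / num_workers
        w, b = w - lr * gw, b - lr * gb
    return w, b


single = train(num_workers=1)
distributed = train(num_workers=4)
print("single node:", single)
print("4 simulated workers:", distributed)
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so both runs follow the same optimization path up to floating-point rounding. Real distributed jobs differ mainly in how the averaging is communicated between machines.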
Hyperparameter tuning examples
For general information about hyperparameter tuning in Databricks, see Hyperparameter tuning.
| Package | Notebook | Features |
|---|---|---|
| Hyperopt | | Distributed hyperopt, scikit-learn, MLflow |
| Hyperopt | | Use distributed hyperopt to search hyperparameter space for different model types simultaneously |
| Hyperopt | | Hyperopt, MLlib |
| Hyperopt | | Best practices for datasets of different sizes |
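
Searching a hyperparameter space for different model types simultaneously relies on a conditional search space: a trial first picks a model type, then samples only the hyperparameters that type uses. The sketch below illustrates that structure using only the Python standard library. Plain random search stands in for Hyperopt's search algorithms, and the two toy models, the data, and all function names are hypothetical. In Hyperopt itself, a nested space of this kind is typically expressed with `hp.choice`.

```python
import random

# Hypothetical 1-D binary classification data: label is 1 when x > 0.6.
data_rng = random.Random(0)
data = [(x, int(x > 0.6)) for x in (data_rng.random() for _ in range(200))]
train_set, valid_set = data[:150], data[150:]


def predict_threshold(x, cut):
    """Model type 1: predict 1 when x exceeds the cut-off."""
    return int(x > cut)


def predict_knn(x, k):
    """Model type 2: majority vote among the k nearest training points."""
    nearest = sorted(train_set, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > k)


def sample_config(rng):
    """Pick a model type, then only the hyperparameters that type uses."""
    if rng.random() < 0.5:
        return {"type": "threshold", "cut": rng.uniform(0.0, 1.0)}
    return {"type": "knn", "k": rng.choice([1, 3, 5, 7, 9])}


def loss(config):
    """Validation error rate for one sampled configuration."""
    if config["type"] == "threshold":
        preds = [predict_threshold(x, config["cut"]) for x, _ in valid_set]
    else:
        preds = [predict_knn(x, config["k"]) for x, _ in valid_set]
    errors = sum(p != y for p, (_, y) in zip(preds, valid_set))
    return errors / len(valid_set)


def random_search(n_trials, seed=42):
    """Evaluate n_trials sampled configurations and return the best one."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        config = sample_config(rng)
        trials.append((loss(config), config))
    best_loss, best_config = min(trials, key=lambda t: t[0])
    return best_loss, best_config, trials


best_loss, best_config, trials = random_search(n_trials=30)
print(best_config["type"], round(best_loss, 3))
```

Because every trial is independent of the others, trials of this kind can be evaluated in parallel, which is the property that distributed hyperparameter search builds on.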