Get started with machine learning in Databricks


The managed MLflow integration with Databricks on Google Cloud requires Databricks Runtime for Machine Learning 9.1 LTS or above.

This notebook provides a quick overview of machine learning model training on Databricks. To train models, you can use libraries like scikit-learn that are preinstalled in Databricks Runtime ML. In addition, you can use MLflow to track the trained models, and Hyperopt with SparkTrials to scale hyperparameter tuning.

In this tutorial, you train a simple classification model, using MLflow to track model development and Hyperopt to improve the model’s performance. For more details on productionizing machine learning on Databricks, including model lifecycle management and model inference, see the ML end-to-end example.

For additional example notebooks to get started quickly on Databricks, see Tutorials: Get started with ML.


This tutorial requires Databricks Runtime 7.5 ML or above.


If you do not have access to Databricks Runtime 7.5 ML or above, try Get started with scikit-learn in Databricks or Tutorial: End-to-end ML models on Databricks.

Example notebook

Machine learning quickstart notebook
