This page provides examples of how you can use the scikit-learn package to train machine learning models in Databricks. scikit-learn is one of the most popular Python libraries for single-node machine learning and is included in Databricks Runtime and Databricks Runtime ML. See Databricks Runtime release notes for the scikit-learn library version included with your cluster’s runtime.
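As a minimal sketch of single-node training with scikit-learn, the following prints the installed scikit-learn version, then trains and evaluates a classifier. The dataset (scikit-learn's bundled breast cancer dataset), the logistic regression model, and the train/test split settings are illustrative assumptions and are not taken from the notebooks on this page.

```python
import sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Show which scikit-learn version the current runtime provides.
print("scikit-learn version:", sklearn.__version__)

# Illustrative data: a small binary classification dataset bundled with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Scale the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Mean accuracy on the held-out test split.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Wrapping the scaler and classifier in one pipeline means the scaler is fitted only on the training data, and the same object can be passed to cross-validation or tuning utilities.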
You can import these notebooks and run them in your Databricks workspace.
For additional example notebooks to get started quickly on Databricks, see Tutorials: Get started with ML.
This notebook provides a quick overview of machine learning model training on Databricks. It uses the scikit-learn package to train a simple classification model. It also illustrates the use of MLflow to track the model development process, and Hyperopt to automate hyperparameter tuning.
This notebook uses scikit-learn to illustrate a complete end-to-end example covering data loading, model training, distributed hyperparameter tuning, and model inference. It also illustrates model lifecycle management, using MLflow to log your model and MLflow Model Registry to register it.