Get started with Databricks as a machine learning engineer
The quickstarts and tutorials listed here are designed to get you started quickly with machine learning on Databricks. Each includes a notebook that you can import and run in your own Databricks workspace. They illustrate how to use Databricks throughout the machine learning lifecycle, including data loading and preparation; model training, tuning, and inference; and model deployment and management. They demonstrate helpful tools such as Hyperopt for automated hyperparameter tuning, MLflow tracking and autologging for model development, and Model Registry for model management.
Note
To import a notebook included in any of these tutorials, copy the notebook's URL using the control above the notebook on the tutorial page. In your Databricks workspace browser, select Import from any folder menu and paste the URL. To run the notebook, you must have a cluster to attach it to. For more information about creating clusters and running notebooks, see Get started with Databricks as a data scientist.
Note
The managed MLflow integration with Databricks on Google Cloud requires Databricks Runtime for Machine Learning 8.1 or above.
For users new to Databricks
If you are new to Databricks Machine Learning, the best way to start is to:
Follow the Get started with Databricks as a data scientist quickstart.
Run the in-product quickstart notebook included in the Databricks Machine Learning environment.
This notebook illustrates many of the benefits of using Databricks for machine learning, including tracking model development with MLflow and parallelizing hyperparameter tuning runs. The notebook walks you through how to load data, train and tune a model, compare and analyze model performance, and use the model for inference.
To run the in-product quickstart notebook:
Log in to your Databricks workspace and go to the Databricks Machine Learning persona-based environment.
To change the persona, click the icon below the Databricks logo, and select Machine Learning.
On the Databricks Machine Learning start page, click Start guide at the upper right.
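The workflow that the quickstart notebook covers (load data, train and tune a model, compare model performance, and run inference) can be sketched in a few lines. This is a minimal illustration, not the notebook's actual code. It assumes scikit-learn is installed, uses the wine dataset bundled with scikit-learn as stand-in data, and replaces Hyperopt with a plain loop over a hand-picked list of hyperparameter candidates. It omits MLflow tracking so that it runs outside Databricks as well. The names `candidates`, `results`, and `best_model` are chosen for this example only.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# 1. Load data. The bundled wine dataset is a stand-in (an assumption);
#    the quickstart notebook may use different data.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 2. Train and tune. A plain loop over hand-picked candidates stands in
#    for Hyperopt; each candidate is scored with 3-fold cross-validation
#    on the training split only.
candidates = [
    {"n_estimators": 10, "max_depth": 2},
    {"n_estimators": 50, "max_depth": 4},
    {"n_estimators": 100, "max_depth": None},
]
results = []
for params in candidates:
    model = RandomForestClassifier(random_state=0, **params)
    cv_accuracy = cross_val_score(model, X_train, y_train, cv=3).mean()
    results.append({"params": params, "cv_accuracy": cv_accuracy})

# 3. Compare model performance across the candidates, best first.
for r in sorted(results, key=lambda r: r["cv_accuracy"], reverse=True):
    print(r["params"], round(r["cv_accuracy"], 3))
best = max(results, key=lambda r: r["cv_accuracy"])

# 4. Refit the best candidate on the full training split, check it on
#    the held-out test split, and use it for inference.
best_model = RandomForestClassifier(random_state=0, **best["params"])
best_model.fit(X_train, y_train)
test_accuracy = best_model.score(X_test, y_test)
predictions = best_model.predict(X_test[:5])
print("test accuracy:", round(test_accuracy, 3))
print("predictions:", predictions)
```

In a Databricks Runtime ML cluster, the loop in step 2 is the part that Hyperopt would automate, and the per-candidate scores printed in step 3 are the kind of information that MLflow tracking records for later comparison.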
scikit-learn tutorials
| Notebook | Requirements | Features |
|---|---|---|
| | Databricks Runtime ML | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow |
| | Databricks Runtime ML | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow, Model Registry |
| | Databricks Runtime ML | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow, XGBoost, Model Registry |
Apache Spark MLlib tutorial
| Notebook | Requirements | Features |
|---|---|---|
| | Databricks Runtime ML | Logistic regression model, Spark pipeline, automated hyperparameter tuning using MLlib API |