Tutorial: End-to-end ML models on Databricks

Machine learning in the real world is messy. Data sources contain missing values, include redundant rows, or do not fit in memory. Feature engineering often requires domain expertise and can be tedious. Modeling too often mixes data science with systems engineering, requiring knowledge not only of algorithms but also of machine architecture and distributed systems.

Databricks simplifies this process. The following 10-minute tutorial notebook shows an end-to-end example of training machine learning models on tabular data.

You can import this notebook and run it yourself, or copy code snippets and ideas for your own use.
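For a rough sense of what an end-to-end workflow on tabular data can look like, here is a minimal sketch. It is not taken from the notebook. The synthetic data, the column names (`feature_a`, `feature_b`, `label`), and the helper functions (`make_messy_table`, `train_and_evaluate`, `log_with_mlflow`) are all assumptions made for illustration. It shows how you might drop redundant rows, impute missing values, train a scikit-learn model, and optionally record the result with MLflow.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


def make_messy_table(n_rows=500, seed=0):
    """Build a synthetic table with missing values and duplicate rows (illustrative only)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "feature_a": rng.normal(size=n_rows),
        "feature_b": rng.normal(size=n_rows),
    })
    df["label"] = (df["feature_a"] + df["feature_b"] > 0).astype(int)
    # Blank out roughly 5% of feature_a to simulate missing values.
    df.loc[rng.random(n_rows) < 0.05, "feature_a"] = np.nan
    # Append copies of the first 20 rows to simulate redundant rows.
    return pd.concat([df, df.head(20)], ignore_index=True)


def train_and_evaluate(df, seed=0):
    """Clean the table, fit a model, and return (model, held-out accuracy)."""
    df = df.drop_duplicates()  # remove redundant rows
    X = df[["feature_a", "feature_b"]]
    y = df["label"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = Pipeline([
        # Fill missing values with the per-column median learned from the training split.
        ("impute", SimpleImputer(strategy="median")),
        ("classify", RandomForestClassifier(n_estimators=50, random_state=seed)),
    ])
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


def log_with_mlflow(model, accuracy):
    """Record the metric and model in an MLflow run.

    MLflow is imported lazily so the rest of the sketch runs without it.
    """
    import mlflow
    import mlflow.sklearn

    with mlflow.start_run():
        mlflow.log_metric("accuracy", accuracy)
        mlflow.sklearn.log_model(model, "model")


if __name__ == "__main__":
    model, accuracy = train_and_evaluate(make_messy_table())
    print(f"held-out accuracy: {accuracy:.3f}")
```

Putting imputation inside the `Pipeline` means the median is computed from the training split only, which avoids leaking information from the test rows. Calling `log_with_mlflow(model, accuracy)` afterwards would record the run wherever MLflow tracking is configured.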

Note

The following notebook may include functionality that is not available in this release of Databricks on Google Cloud.

Notebook

If your workspace is enabled for Unity Catalog, use this version of the notebook:

Use scikit-learn with MLflow integration on Databricks (Unity Catalog)

If your workspace is not enabled for Unity Catalog, use this version of the notebook:

Use scikit-learn with MLflow integration on Databricks