Databricks Runtime for Machine Learning (Databricks Runtime ML) includes Hyperopt, an open source Python library that automates distributed hyperparameter tuning and model selection. With Hyperopt, you can scan a set of Python models while varying algorithms and hyperparameters across spaces that you define. Hyperopt works with distributed ML algorithms such as Apache Spark MLlib and Horovod, and with single-machine ML models such as scikit-learn and TensorFlow.
The basic steps when using Hyperopt are:
1. Define an objective function to minimize. Typically this is the training or validation loss.
2. Define the hyperparameter search space. Hyperopt provides a conditional search space, which lets you compare different ML algorithms in the same run.
3. Specify the search algorithm. Hyperopt uses stochastic tuning algorithms that perform a more efficient search of hyperparameter space than a deterministic grid search.
4. Run the Hyperopt function fmin(). fmin() takes the items you defined in the previous steps and identifies the set of hyperparameters that minimizes the objective function.
To get started quickly using Hyperopt with scikit-learn algorithms, see:
For more details about how Hyperopt works, and for additional examples, see:
MLlib automated MLflow tracking is deprecated on clusters that run Databricks Runtime 10.1 ML and above, and it is disabled by default on clusters running Databricks Runtime 10.2 ML and above. Instead, use MLflow PySpark ML autologging by calling mlflow.pyspark.ml.autolog(), which is enabled by default with Databricks Autologging.
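If Databricks Autologging is turned off, the call can be made explicitly before fitting any pyspark.ml estimator. The helper below is a hypothetical wrapper written for this example, not part of MLflow; it assumes the mlflow and pyspark packages are available, as they are on Databricks Runtime ML.

```python
def enable_pyspark_ml_autologging(log_models: bool = True) -> None:
    """Turn on MLflow autologging for pyspark.ml estimators (illustrative helper)."""
    # Imported inside the function so the helper can be defined even in an
    # environment where MLflow or PySpark is missing.
    import mlflow.pyspark.ml

    # log_models controls whether fitted models are logged as MLflow artifacts.
    mlflow.pyspark.ml.autolog(log_models=log_models)
```

Calling enable_pyspark_ml_autologging() once per notebook or job, before the first fit() call, is enough for later estimator fits to be logged.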
To use the old MLlib automated MLflow tracking in Databricks Runtime 10.2 ML and above, enable it by setting the Spark configurations spark.databricks.mlflow.trackMLlib.enabled true and