Use XGBoost on Databricks
Learn how to train machine learning models using XGBoost in Databricks. Databricks Runtime for Machine Learning includes XGBoost libraries for both Python and Scala.
Train XGBoost models on a single node
You can train models using the Python xgboost package. This package supports only single-node workloads. To train a PySpark ML pipeline and take advantage of distributed training, see Distributed training of XGBoost models.
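As an illustration of single-node training, the following is a minimal sketch rather than an official Databricks example. It assumes the xgboost and numpy packages are available, as they are on Databricks Runtime ML. The helper names (make_toy_data, train_single_node), the synthetic data, and the hyperparameter values are hypothetical and chosen only for demonstration.

```python
import random

# Hypothetical hyperparameters, chosen for illustration only.
DEFAULT_PARAMS = {
    "objective": "binary:logistic",
    "max_depth": 3,
    "eta": 0.1,
    "eval_metric": "logloss",
}


def make_toy_data(n_rows=200, n_features=4, seed=0):
    """Build a small synthetic binary classification dataset (needs n_features >= 2)."""
    rng = random.Random(seed)
    features = [[rng.random() for _ in range(n_features)] for _ in range(n_rows)]
    # The label depends on the first two features only.
    labels = [1 if row[0] + row[1] > 1.0 else 0 for row in features]
    return features, labels


def train_single_node(features, labels, params=None, num_boost_round=20):
    """Train a booster on the driver node with the native xgboost API."""
    # Imported lazily so this module still loads where xgboost is not installed.
    import numpy as np
    import xgboost as xgb

    dtrain = xgb.DMatrix(
        np.asarray(features, dtype=float),
        label=np.asarray(labels, dtype=float),
    )
    return xgb.train(params or DEFAULT_PARAMS, dtrain, num_boost_round=num_boost_round)
```

Calling train_single_node(*make_toy_data()) in a notebook cell should return an xgboost Booster trained entirely on the driver. The data never leaves a single machine, which is why this approach does not scale past the memory of one node.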
Distributed training of XGBoost models
For distributed training of XGBoost models, Databricks includes PySpark estimators based on the xgboost package. Databricks also includes the Scala package xgboost-4j. For details and example notebooks, see the following:
Distributed training of XGBoost models using xgboost.spark (Databricks Runtime 12.0 ML and above)
Distributed training of XGBoost models using sparkdl.xgboost (deprecated starting with Databricks Runtime 12.0 ML)
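For orientation, distributed training with xgboost.spark might look roughly like the sketch below. It is not taken from the linked notebooks. It assumes a cluster where pyspark and xgboost.spark are importable (Databricks Runtime 12.0 ML and above) and a Spark DataFrame with numeric feature columns and a label column. The helper names (estimator_settings, train_distributed), the column names, and the worker count are hypothetical.

```python
def estimator_settings(label_col="label", num_workers=2):
    """Collect keyword arguments for the estimator (illustrative values only)."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return {
        "features_col": "features",  # vector column produced by VectorAssembler
        "label_col": label_col,
        "num_workers": num_workers,  # number of Spark tasks that train in parallel
    }


def train_distributed(train_df, feature_cols, label_col="label", num_workers=2):
    """Fit a PySpark ML pipeline that ends in a distributed XGBoost classifier."""
    # Imported lazily so this module still loads outside a Spark environment.
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from xgboost.spark import SparkXGBClassifier

    # Combine the numeric feature columns into a single vector column.
    assembler = VectorAssembler(inputCols=list(feature_cols), outputCol="features")
    classifier = SparkXGBClassifier(**estimator_settings(label_col, num_workers))
    return Pipeline(stages=[assembler, classifier]).fit(train_df)
```

The returned PipelineModel can score new data with its transform method. A reasonable starting point is to keep num_workers no larger than the number of task slots on the cluster, although the best value depends on the workload.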