Databricks Runtime for Machine Learning (Databricks Runtime ML) automates the creation of a cluster optimized for machine learning. Databricks Runtime ML clusters include the most popular machine learning libraries, such as TensorFlow, PyTorch, Keras, and XGBoost, and also include libraries required for distributed training such as Horovod. Using Databricks Runtime ML speeds up cluster creation and ensures that the installed library versions are compatible.
For complete information about using Databricks for machine learning and deep learning, see the Databricks Machine Learning guide.
For information about the contents of each Databricks Runtime ML version, see the release notes.
Databricks Runtime ML is built on Databricks Runtime. For example, Databricks Runtime 7.3 LTS for Machine Learning is built on Databricks Runtime 7.3 LTS. The libraries included in the base Databricks Runtime are listed in the Databricks Runtime release notes.
This tutorial is designed for new users of Databricks Runtime ML. It takes about 10 minutes to work through and shows a complete end-to-end example of loading tabular data, training a model, running distributed hyperparameter tuning, and performing model inference. It also illustrates how to use the MLflow API and MLflow Model Registry.
The following notebook may include functionality that is not available in this release of Databricks on Google Cloud.
Databricks Runtime ML includes a variety of popular ML libraries. The libraries are updated with each release to include new features and fixes.
Databricks has designated a subset of the supported libraries as top-tier libraries. For these libraries, Databricks provides a faster update cadence, updating to the latest package releases with each runtime release (barring dependency conflicts). Databricks also provides advanced support, testing, and embedded optimizations for top-tier libraries.
For a full list of top-tier and other provided libraries, see the following articles for each available runtime:
- Databricks Runtime 10.2 for Machine Learning (Beta)
- Databricks Runtime 10.1 for Machine Learning
- Databricks Runtime 10.0 for Machine Learning
- Databricks Runtime 9.1 LTS for Machine Learning
- Databricks Runtime 9.0 for Machine Learning
- Databricks Runtime 8.4 for Machine Learning
- Databricks Runtime 8.3 for Machine Learning
- Databricks Runtime 8.2 for Machine Learning (Unsupported)
- Databricks Runtime 8.1 for Machine Learning (Unsupported)
- Databricks Runtime 8.0 for Machine Learning (Unsupported)
In addition to the pre-installed libraries, Databricks Runtime ML differs from Databricks Runtime in the cluster configuration and in how you manage Python packages.
When you create a cluster, select a Databricks Runtime ML version from the Databricks Runtime Version drop-down. Both CPU and GPU-enabled ML runtimes are available.
If you select a GPU-enabled ML runtime, you are prompted to select a compatible Driver Type and Worker Type. Incompatible instance types are grayed out in the drop-downs. GPU-enabled instance types are listed under the GPU-Accelerated label.
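When a cluster is created programmatically rather than through the UI, the same runtime choice appears as a version string in the cluster specification. The following is a minimal sketch of building such a specification in Python. The field names are those used by the Databricks Clusters API, but the version key format (`10.1.x-cpu-ml-scala2.12`), the instance type name, and the helper `build_ml_cluster_spec` are assumptions made for illustration; check them against the values your workspace actually lists.

```python
import json


def build_ml_cluster_spec(name, runtime_key, node_type, num_workers=2):
    """Return a cluster spec dict (hypothetical helper, not part of any SDK)."""
    if "-ml-" not in runtime_key:
        # Assumption: ML runtime keys contain "-ml-"; a plain runtime key would
        # give a cluster without the pre-installed ML libraries.
        raise ValueError(f"{runtime_key!r} does not look like an ML runtime key")
    return {
        "cluster_name": name,
        "spark_version": runtime_key,
        "node_type_id": node_type,
        "driver_node_type_id": node_type,
        "num_workers": num_workers,
    }


spec = build_ml_cluster_spec(
    name="ml-tutorial",
    runtime_key="10.1.x-cpu-ml-scala2.12",  # assumed key for Runtime 10.1 ML (CPU)
    node_type="n1-standard-4",  # assumed instance type; use one your cloud offers
)
print(json.dumps(spec, indent=2))
```

For a GPU-enabled ML runtime, both the runtime key and the instance types would need to be GPU-compatible, mirroring the restriction the UI enforces by graying out incompatible types.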
Libraries in your workspace that automatically install into all clusters can conflict with the libraries included in Databricks Runtime ML. Before you create a cluster with Databricks Runtime ML, clear the Install automatically on all clusters checkbox for conflicting libraries.
In Databricks Runtime 9.0 ML and above, the virtualenv package manager is used to install Python packages. All Python packages are installed inside a single environment:
In Databricks Runtime 8.4 ML and below, the Conda package manager is used to install Python packages. All Python packages are installed inside a single environment: /databricks/python2 on clusters using Python 2 and /databricks/python3 on clusters using Python 3. Switching (or activating) Conda environments is not supported.
For information on managing Python libraries, see Libraries.
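To see which environment a notebook or job is actually running in, and which versions of the bundled libraries it provides, you can inspect the interpreter from Python. The sketch below uses only the standard library; the helper `installed_version` and the list of package names are illustrative choices, not part of Databricks.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Root directory of the environment the current interpreter runs in.
print("environment prefix:", sys.prefix)
print("interpreter:", sys.executable)

# Distribution names for libraries mentioned in this article; adjust as needed.
for name in ("tensorflow", "torch", "keras", "xgboost", "horovod"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Because all packages live in a single environment, the prefix printed here is the same location that library installs on the cluster write to.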
Databricks Runtime ML includes tools to automate the model development process and help you efficiently find the best performing model.
- AutoML automatically creates, tunes, and evaluates a set of models and creates a Python notebook with the source code for each run so you can review, reproduce, and modify the code.
- Managed MLflow manages the end-to-end model lifecycle, including tracking experimental runs, deploying and sharing models, and maintaining a centralized model registry.
- Hyperopt, augmented with the SparkTrials class, automates and distributes ML model parameter tuning.
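As a rough illustration of the last item, the sketch below shows how a Hyperopt search might be distributed with `SparkTrials`. It assumes the `hyperopt` package and an active Spark session are available, as they would be on an ML runtime cluster. The toy objective and the helper name `tune` are hypothetical stand-ins for a real training function and its validation loss.

```python
def objective(x):
    """Toy loss with its minimum at x = 2; stands in for a validation loss."""
    return (x - 2.0) ** 2


def tune(max_evals=32, parallelism=4):
    """Search for the x that minimizes objective, running trials on Spark workers."""
    # Imported here so the objective can be used and tested without a cluster.
    from hyperopt import SparkTrials, fmin, hp, tpe

    # SparkTrials sends each trial to a Spark worker; parallelism caps how
    # many trials are evaluated at the same time.
    trials = SparkTrials(parallelism=parallelism)
    return fmin(
        fn=objective,
        space=hp.uniform("x", -5.0, 5.0),  # search range chosen for illustration
        algo=tpe.suggest,
        max_evals=max_evals,
        trials=trials,
    )
```

Calling `tune()` on a cluster returns a dictionary holding the best value found for `x`. Higher parallelism finishes sooner but gives the adaptive search fewer completed results to learn from before each new batch of trials starts.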