Introduction to Databricks Runtime for Machine Learning
Databricks Runtime for Machine Learning (Databricks Runtime ML) provides pre-built machine learning infrastructure that is integrated with all of the capabilities of the Databricks workspace. Each version of Databricks Runtime ML is built on the corresponding version of Databricks Runtime. For example, Databricks Runtime 11.3 LTS for Machine Learning is built on Databricks Runtime 11.3 LTS.
For details about the capabilities of each version of Databricks Runtime ML, including the full list of included libraries, see the release notes.
Why use Databricks Runtime for Machine Learning?
Databricks Runtime ML automates the creation of a cluster optimized for machine learning. Some of the advantages of using Databricks Runtime ML clusters include:
Built-in popular machine learning libraries, such as TensorFlow, PyTorch, Keras, and XGBoost.
Built-in distributed training libraries, such as Horovod.
Compatible versions of installed libraries.
Pre-configured GPU support including drivers and supporting libraries.
Faster cluster creation.
With Databricks, you can use any library to create the logic to train your model. The preconfigured Databricks Runtime ML makes it possible to easily scale common machine learning and deep learning steps.
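As a rough way to confirm which of the built-in libraries listed above are present on a cluster, the following sketch queries installed package metadata from a notebook cell. The distribution names in LIBRARIES are assumed to be the PyPI names of those libraries; the authoritative list of packages and versions for a given runtime is in its release notes.

```python
from importlib import metadata

# Assumed PyPI distribution names for the libraries named above.
LIBRARIES = ["tensorflow", "torch", "keras", "xgboost", "horovod"]


def installed_versions(names):
    """Return {name: version string, or None if the distribution is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(LIBRARIES).items():
        print(f"{name}: {version or 'not installed'}")
```

On a standard (non-ML) runtime or a local machine, some of these entries would likely be reported as not installed.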
Databricks Runtime ML also includes all of the capabilities of the Databricks workspace, such as:
Data exploration, management, and governance.
Cluster creation and management.
Library and environment management.
Code management with Databricks Repos.
Automation support including Delta Live Tables, Databricks Jobs, and APIs.
Integrated MLflow for model development tracking, model deployment and serving, and real-time inference.
For complete information about using Databricks for machine learning and deep learning, see Introduction to Databricks Machine Learning.
Tutorial: Databricks Runtime for Machine Learning
This tutorial is designed for new users of Databricks Runtime ML. It takes about 10 minutes to work through, and shows a complete end-to-end example of loading tabular data, training a model, distributed hyperparameter tuning, and model inference. It also illustrates how to use the MLflow API and MLflow Model Registry.
The following notebook may include functionality that is not available in this release of Databricks on Google Cloud.
Libraries included in Databricks Runtime ML
Databricks Runtime ML includes a variety of popular ML libraries. The libraries are updated with each release to include new features and fixes.
Databricks has designated a subset of the supported libraries as top-tier libraries. For these libraries, Databricks provides a faster update cadence, updating to the latest package releases with each runtime release (barring dependency conflicts). Databricks also provides advanced support, testing, and embedded optimizations for top-tier libraries.
For a full list of top-tier and other provided libraries, see the release notes for each runtime.
For release notes for unsupported Databricks Runtime ML runtimes, see Unsupported releases.
Create a cluster using Databricks Runtime ML
When you create a cluster, select a Databricks Runtime ML version from the Databricks Runtime Version drop-down. Both CPU and GPU-enabled ML runtimes are available.
If you select a cluster from the drop-down menu in the notebook, the Databricks Runtime version appears at the right of the cluster name.
If you select a GPU-enabled ML runtime, you are prompted to select a compatible Driver Type and Worker Type. Incompatible instance types are grayed out in the drop-downs. GPU-enabled instance types are listed under the GPU-Accelerated label.
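The same CPU or GPU choice can also be expressed when a cluster is defined programmatically. The sketch below only builds a request body; it sends nothing. It assumes that ML runtime keys follow the pattern "<version>.x-<cpu|gpu>-ml-scala<scala version>" and that the body uses the cluster_name, spark_version, node_type_id, and num_workers fields of the Clusters API. Neither is stated on this page, so check the runtime versions and instance types your workspace actually offers. The helper names and the "<gpu-node-type-id>" placeholder are hypothetical.

```python
import json


def ml_runtime_key(version, gpu=False, scala="2.12"):
    """Build an ML runtime key, e.g. '11.3.x-cpu-ml-scala2.12' (assumed naming pattern)."""
    flavor = "gpu" if gpu else "cpu"
    return f"{version}.x-{flavor}-ml-scala{scala}"


def cluster_spec(name, version, node_type_id, num_workers=2, gpu=False):
    """Return a dict shaped like a cluster-creation request body (assumed field names)."""
    return {
        "cluster_name": name,
        "spark_version": ml_runtime_key(version, gpu=gpu),
        "node_type_id": node_type_id,  # must be GPU-capable when gpu=True
        "num_workers": num_workers,
    }


if __name__ == "__main__":
    # "<gpu-node-type-id>" is a placeholder, not a real instance type.
    spec = cluster_spec("ml-gpu-demo", "11.3", "<gpu-node-type-id>", gpu=True)
    print(json.dumps(spec, indent=2))
```

Keeping the runtime key in one helper makes the CPU versus GPU decision explicit in a single place, mirroring the choice made in the drop-down.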
Libraries in your workspace that automatically install into all clusters can conflict with the libraries included in Databricks Runtime ML. Before you create a cluster with Databricks Runtime ML, clear the Install automatically on all clusters checkbox for conflicting libraries. See the release notes for a list of libraries that are included with each version of Databricks Runtime ML.
Manage Python packages
Databricks Runtime ML differs from Databricks Runtime in how you manage Python packages.
In Databricks Runtime 9.0 ML and above, the virtualenv package manager is used to install Python packages. All Python packages are installed inside a single environment.
In Databricks Runtime 8.4 ML and below, the Conda package manager is used to install Python packages. All Python packages are installed inside a single environment: /databricks/python2 on clusters using Python 2 and /databricks/python3 on clusters using Python 3. Switching (or activating) Conda environments is not supported.
For information on managing Python libraries, see Libraries.
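The version rule described above can be written as a small lookup, shown here as a sketch. The function name package_manager is hypothetical, and it assumes version strings of the form "major.minor", optionally followed by a label such as "LTS". Versions between 8.4 and 9.0 are rejected because this page does not describe them. The final lines print the interpreter's environment root, which is one way to see the single environment from a notebook.

```python
import sys


def package_manager(ml_runtime_version):
    """Map a Databricks Runtime ML version such as '11.3 LTS' to its package manager."""
    release = ml_runtime_version.split()[0]  # drop a trailing label such as "LTS"
    major, minor = (int(part) for part in release.split(".")[:2])
    if (major, minor) >= (9, 0):
        return "virtualenv"
    if (major, minor) <= (8, 4):
        return "conda"
    raise ValueError(f"No package manager documented for version {release}")


if __name__ == "__main__":
    print(package_manager("11.3 LTS"))
    # Root directory and interpreter of the environment running this code.
    print(sys.prefix)
    print(sys.executable)
```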
Support for automated machine learning
Databricks Runtime ML includes tools to automate the model development process and help you efficiently find the best performing model.
AutoML automatically creates, tunes, and evaluates a set of models and creates a Python notebook with the source code for each run so you can review, reproduce, and modify the code.
Managed MLflow manages the end-to-end model lifecycle, including tracking experimental runs, deploying and sharing models, and maintaining a centralized model registry.
Hyperopt, augmented with the SparkTrials class, automates and distributes ML model parameter tuning.
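To illustrate the Hyperopt item above, here is a minimal sketch of distributed tuning with SparkTrials. The objective is a toy quadratic standing in for a real validation loss, and the search range, max_evals, and parallelism values are arbitrary assumptions. The hyperopt import is placed inside tune() so that the objective can be exercised on its own; calling tune() requires a cluster with Hyperopt and an active Spark session, such as one running Databricks Runtime ML.

```python
def objective(x):
    """Toy loss with its minimum at x = 3; a stand-in for a model's validation loss."""
    return (x - 3.0) ** 2


def tune(max_evals=32, parallelism=4):
    """Search for the x that minimizes objective(), running trials on Spark workers."""
    # Imported here so objective() can be used without hyperopt or Spark installed.
    from hyperopt import SparkTrials, fmin, hp, tpe

    trials = SparkTrials(parallelism=parallelism)  # evaluates trials as Spark tasks
    return fmin(
        fn=objective,
        space=hp.uniform("x", -10, 10),  # single hyperparameter sampled uniformly
        algo=tpe.suggest,                # Tree-structured Parzen Estimator search
        max_evals=max_evals,
        trials=trials,
    )


if __name__ == "__main__":
    # Sanity check of the objective only; call tune() on a cluster.
    print(objective(5.0))
```

In a real workflow, the objective would train and evaluate a model for the sampled hyperparameters and return its loss.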
Databricks Runtime ML is not supported on:
spark.databricks.pyspark.enableProcessIsolation config set to