Deep learning

This article describes what Databricks offers for training and fine-tuning deep learning models. Databricks Runtime for Machine Learning provides prebuilt deep learning infrastructure and includes common deep learning libraries such as Hugging Face Transformers, PyTorch, TensorFlow, and Keras. It also includes libraries such as Petastorm, Hyperopt, and Horovod that make it easier to scale common machine learning and deep learning steps, as well as preconfigured GPU support, including drivers and libraries, to accelerate model training and inference.

See Best practices for deep learning on Databricks.

Large language models (LLMs)

Databricks makes it simple to access and build on publicly available large language models.

Databricks Runtime for Machine Learning includes libraries such as Hugging Face Transformers that let you integrate existing pretrained models or other open-source libraries into your workflow. From there, you can use Databricks platform capabilities to fine-tune LLMs on your own data for better performance in your domain.

In addition, Databricks offers built-in functionality for SQL users to access and experiment with LLMs from providers such as Azure OpenAI and OpenAI through the ai_generate_text() function, which is in preview.

Hugging Face Transformers

With Hugging Face Transformers on Databricks, you can scale out your natural language processing (NLP) batch applications and fine-tune models for large language model applications.

The Hugging Face Transformers library comes preinstalled on Databricks Runtime 10.4 LTS ML and above. Many popular NLP models work best on GPU hardware, so you are likely to get the best performance on recent GPUs unless you use a model specifically optimized for CPUs.
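As a minimal sketch of the GPU-versus-CPU point, the snippet below builds a Transformers `pipeline` that runs on the first GPU when one is visible and falls back to the CPU otherwise. The helper names `pick_device` and `build_sentiment_classifier` are hypothetical, and the default model (`distilbert-base-uncased-finetuned-sst-2-english`, a public sentiment model on the Hugging Face Hub) is an illustrative choice, not a Databricks recommendation.

```python
import torch
from transformers import pipeline


def pick_device() -> int:
    """Device argument for transformers.pipeline: 0 is the first CUDA GPU, -1 is the CPU."""
    return 0 if torch.cuda.is_available() else -1


def build_sentiment_classifier(
    model_name: str = "distilbert-base-uncased-finetuned-sst-2-english",
):
    # Hypothetical helper. Downloads the model from the Hugging Face Hub on
    # first call, so the cluster needs network access or a pre-populated cache.
    return pipeline("sentiment-analysis", model=model_name, device=pick_device())
```

For batch scoring, you could call `build_sentiment_classifier()(texts, batch_size=8)` on a list of strings; each result is a dictionary with `label` and `score` keys. Passing an integer `device` keeps the example compatible with older Transformers releases that predate `device_map`.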

ai_generate_text()


This feature is in Public Preview.

ai_generate_text() is a built-in SQL function that lets you access large language models (LLMs) from providers such as Azure OpenAI and OpenAI and experiment with them on your company’s data from within your SQL interface.

During the preview, this function is available only on Databricks SQL Pro and Serverless.

PyTorch

PyTorch is included in Databricks Runtime for Machine Learning and provides GPU-accelerated tensor computation and high-level functionality for building deep learning networks. You can perform single-node or distributed training with PyTorch on Databricks. See PyTorch.
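A minimal single-node training loop shows the pieces the paragraph mentions: tensors placed on a GPU when available, a module from `torch.nn`, autograd, and an optimizer. This is a sketch on synthetic data (fitting `y = 3x + 2`); the function name `train_single_node` and all hyperparameters are illustrative assumptions.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_single_node(epochs: int = 100, lr: float = 0.2, seed: int = 0):
    """Fit y = 3x + 2 with mini-batch SGD; returns (weight, bias, last_batch_loss)."""
    torch.manual_seed(seed)
    # Use the GPU when the cluster has one; fall back to the CPU otherwise.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Synthetic data: 256 points on [0, 1) with a little Gaussian noise.
    x = torch.rand(256, 1)
    y = 3 * x + 2 + 0.01 * torch.randn(256, 1)
    loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

    model = nn.Linear(1, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()

    for _ in range(epochs):
        for xb, yb in loader:
            xb, yb = xb.to(device), yb.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()   # autograd computes gradients
            optimizer.step()  # plain SGD parameter update

    return model.weight.item(), model.bias.item(), loss.item()
```

Calling `train_single_node()` should return a weight close to 3 and a bias close to 2. The same loop runs unchanged on CPU-only and GPU clusters because every tensor is moved with `.to(device)`.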

TensorFlow

Databricks Runtime for Machine Learning includes TensorFlow and TensorBoard, so you can use these libraries without installing any packages. TensorFlow supports deep learning and general numerical computation on CPUs, GPUs, and clusters of GPUs. TensorBoard provides visualization tools to help you debug and optimize machine learning and deep learning workflows. See TensorFlow for single-node and distributed training examples.
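To connect TensorFlow training with TensorBoard, a common pattern is to pass the Keras `TensorBoard` callback to `model.fit`, which writes loss and metric summaries to a log directory. The sketch below assumes toy random data and a small two-layer classifier; `train_with_tensorboard` is a hypothetical name, and the temporary log directory stands in for whatever persistent path you would use on a cluster.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def train_with_tensorboard(log_dir=None, epochs: int = 5):
    """Train a small Keras classifier and log to TensorBoard; returns (final_loss, log_dir)."""
    log_dir = log_dir or tempfile.mkdtemp(prefix="tb_logs_")
    os.makedirs(log_dir, exist_ok=True)

    # Toy data: the label is 1 when the four features sum to more than 2.
    rng = np.random.default_rng(0)
    x = rng.random((512, 4)).astype("float32")
    y = (x.sum(axis=1) > 2.0).astype("float32")

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    # The TensorBoard callback writes per-epoch loss and metric summaries under log_dir.
    tensorboard = tf.keras.callbacks.TensorBoard(log_dir=log_dir)
    history = model.fit(x, y, epochs=epochs, batch_size=32, callbacks=[tensorboard], verbose=0)
    return history.history["loss"][-1], log_dir
```

In a notebook, the standard TensorBoard notebook extension (`%load_ext tensorboard`, then `%tensorboard --logdir <log_dir>`) can display the logs this function writes.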

Distributed training

Because deep learning models are data- and compute-intensive, distributed training is often needed to train them in a reasonable amount of time. For examples of distributed deep learning using the Horovod, spark-tensorflow-distributor, and TorchDistributor integrations, see Distributed training.
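As a rough sketch of the TorchDistributor pattern, the training function below wraps a toy model in PyTorch `DistributedDataParallel`, which averages gradients across processes, and `launch` hands that function to `TorchDistributor` (available in PySpark 3.4 and above as `pyspark.ml.torch.distributor.TorchDistributor`). The function names, the synthetic data, the `gloo` backend (suited to CPUs; `nccl` is the usual GPU choice), and the default port are assumptions for illustration. TorchDistributor launches the function with `torchrun`, which sets `RANK`, `WORLD_SIZE`, `MASTER_ADDR`, and `MASTER_PORT`; the `setdefault` calls only matter when you run the function directly as a single-process smoke test.

```python
def train_fn(learning_rate: float = 0.5, epochs: int = 500) -> float:
    """Data-parallel training of a toy linear model; returns this process's final loss."""
    # Imports live inside the function so it is self-contained when shipped to workers.
    import os

    import torch
    import torch.distributed as dist
    from torch import nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Defaults for a local single-process run; torchrun overrides none of
    # these because it has already set them before the function starts.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")

    dist.init_process_group(backend="gloo")
    try:
        torch.manual_seed(dist.get_rank())  # each process draws its own data shard
        x = torch.rand(256, 1)
        y = 3 * x + 2

        # DDP broadcasts rank 0's initial weights and averages gradients on backward().
        model = DDP(nn.Linear(1, 1))
        optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
        return loss.item()
    finally:
        dist.destroy_process_group()


def launch(num_processes: int = 2) -> float:
    # Requires PySpark 3.4+ and an active Spark session, for example on a Databricks cluster.
    from pyspark.ml.torch.distributor import TorchDistributor

    distributor = TorchDistributor(num_processes=num_processes, local_mode=False, use_gpu=False)
    # run() forwards the extra arguments to train_fn and returns rank 0's result.
    return distributor.run(train_fn, 0.5)
```

Keeping the training logic in a plain function makes it possible to debug it in one process on the driver before scaling out with `launch()`.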