This article is an introduction to Databricks Machine Learning. It describes the benefits of using Databricks for common ML tasks and provides links to notebooks, tutorials, and user guides to help you get started.
The diagram shows how the capabilities of Databricks map to the steps of the model development and deployment process.
Databricks Machine Learning provides an integrated machine learning environment that helps you simplify and standardize your ML development processes. With Databricks Machine Learning, you can:
Track training parameters and model performance using experiments with MLflow tracking.
Use the Databricks Feature Store to develop and share features, track upstream and downstream feature lineage, and serve feature values online.
Databricks Machine Learning also includes all of the capabilities of the Databricks workspace, including:
Code management with Databricks Repos.
For machine learning applications, Databricks recommends using a cluster running Databricks Runtime for Machine Learning.
Databricks Machine Learning provides pre-built deep learning infrastructure, including built-in, pre-configured GPU support with drivers and supporting libraries. It also includes the most common deep learning libraries, such as TensorFlow, PyTorch, and Keras, as well as supporting libraries like Petastorm, Hyperopt, and Horovod.
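As a quick sanity check (a sketch, assuming a cluster running Databricks Runtime for Machine Learning), you can confirm that the bundled GPU stack is visible to PyTorch; on a CPU cluster the availability check simply returns False.

```python
# Sketch: verify that the pre-configured GPU drivers and libraries are
# visible to PyTorch. On a CPU-only cluster, cuda.is_available() is False.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    # Report the first GPU attached to the driver node.
    print("Device:", torch.cuda.get_device_name(0))
```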
To get started with deep learning on Databricks, see:
Databricks Runtime for Machine Learning includes libraries like Hugging Face Transformers and LangChain that allow you to integrate existing pre-trained models or other open-source libraries into your workflow. The Databricks MLflow integration makes it easy to use the MLflow tracking service with transformer pipelines, models, and processing components. In addition, you can integrate OpenAI models or solutions from partners like John Snow Labs into your Databricks workflows.
With Databricks, you can customize an LLM on your own data for your specific task. Using open source tooling such as Hugging Face and DeepSpeed, you can efficiently take a foundation LLM and continue training it on your own data to improve accuracy for your domain and workload.
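The core of such a fine-tune is an ordinary gradient step on your own text with a causal language-modeling loss. The sketch below uses a tiny public test checkpoint (`sshleifer/tiny-gpt2`) purely as a stand-in for a real foundation model; real fine-tuning would use far more data and typically the Hugging Face `Trainer` with a DeepSpeed configuration for distributed efficiency.

```python
# Minimal, illustrative fine-tuning step on your own text.
# The tiny checkpoint is a stand-in for a real foundation LLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sshleifer/tiny-gpt2"  # illustrative stand-in model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# A stand-in for your domain-specific training text.
batch = tokenizer(["Databricks simplifies ML workflows."],
                  return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Causal LM training: the labels are the input ids themselves.
out = model(**batch, labels=batch["input_ids"])
out.loss.backward()
optimizer.step()
```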