Databricks Foundation Model APIs
Preview
This feature is in Public Preview and is supported in us-east1 and us-central1.
This article provides an overview of the Foundation Model APIs in Databricks. It includes requirements for use, supported models, and limitations.
What are Databricks Foundation Model APIs?
Mosaic AI Model Serving now supports Foundation Model APIs, which let you access and query state-of-the-art open models from a serving endpoint. With Foundation Model APIs, you can quickly and easily build applications that use a high-quality generative AI model without maintaining your own model deployment. Foundation Model APIs is a Databricks Designated Service, which means it uses Databricks Geos to manage data residency when processing customer content.
Foundation Model APIs is available in provisioned throughput mode. This mode is recommended for all production workloads, especially those that require high throughput, performance guarantees, or fine-tuned models, or that have additional security requirements. Provisioned throughput endpoints are available with compliance certifications like HIPAA.
Using the Foundation Model APIs, you can do the following:
Query a generalized LLM to verify a project’s validity before investing more resources.
Query a generalized LLM to create a quick proof-of-concept for an LLM-based application before investing in training and deploying a custom model.
Build an LLM application for development or production on top of a scalable, SLA-backed LLM serving solution that can support your production traffic spikes.
Requirements
A Databricks API token to authenticate endpoint requests (a minimal authentication sketch follows this list).
Serverless compute.
A workspace in a supported provisioned throughput region.
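As noted above, endpoint requests are authenticated with a Databricks API token. The following is a minimal sketch of an authenticated scoring request over the serving endpoint REST interface; the workspace host and endpoint name are hypothetical placeholders, not values from this article.

```python
import os

import requests

# Minimal sketch: authenticate a scoring request with a Databricks API token.
# The workspace host and endpoint name below are hypothetical placeholders.
WORKSPACE_URL = "https://<workspace-host>"
ENDPOINT_NAME = "my-provisioned-endpoint"  # hypothetical endpoint name

response = requests.post(
    f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT_NAME}/invocations",
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
)
print(response.json())
```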
Use Foundation Model APIs
You have multiple options for using the Foundation Model APIs.
The APIs are OpenAI-compatible, so you can use the OpenAI client for querying. You can also use the UI, the Foundation Model APIs Python SDK, the MLflow Deployments SDK, or the REST API to query supported models. Databricks recommends using the OpenAI client SDK or API for extended interactions and the UI for trying out the feature.
See Query generative AI models for scoring examples.
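As a minimal sketch of the OpenAI-compatible path, the following queries a chat endpoint with the OpenAI Python client. The workspace host and endpoint name are hypothetical placeholders.

```python
import os

from openai import OpenAI

# Because the Foundation Model APIs are OpenAI-compatible, the standard
# OpenAI client can target a Databricks serving endpoint directly.
# The workspace host and endpoint name are hypothetical placeholders.
client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url="https://<workspace-host>/serving-endpoints",
)

completion = client.chat.completions.create(
    model="my-provisioned-endpoint",  # the serving endpoint name
    messages=[{"role": "user", "content": "What is a foundation model?"}],
    max_tokens=128,
)
print(completion.choices[0].message.content)
```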
Provisioned throughput Foundation Model APIs
Provisioned throughput provides endpoints with optimized inference for foundation model workloads that require performance guarantees. Databricks recommends provisioned throughput for production workloads. See Provisioned throughput Foundation Model APIs for a step-by-step guide on how to deploy Foundation Model APIs in provisioned throughput mode.
Provisioned throughput support includes:
Base models of all sizes. Base models can be accessed using the Databricks Marketplace, or you can download them from Hugging Face or another external source and register them in Unity Catalog (see the registration sketch after this list). The latter approach works with any fine-tuned variant of the supported models, regardless of the fine-tuning method used.
Fine-tuned variants of base models, such as models that are fine-tuned on proprietary data.
Fully custom weights and tokenizers, such as models trained from scratch, continued pre-training of a base model, or other variations that use a supported base model architecture (such as CodeLlama).
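As a sketch of the Hugging Face path mentioned above, the following downloads a model and registers it to Unity Catalog with MLflow so it can be served with provisioned throughput. The model ID, task type, and three-level Unity Catalog name are placeholder assumptions for illustration.

```python
import mlflow
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: download a supported architecture from Hugging Face and register
# it to Unity Catalog. The model ID and UC name are hypothetical placeholders.
mlflow.set_registry_uri("databricks-uc")  # register to Unity Catalog

model_id = "codellama/CodeLlama-7b-hf"  # example of a supported architecture
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

with mlflow.start_run():
    mlflow.transformers.log_model(
        transformers_model={"model": model, "tokenizer": tokenizer},
        artifact_path="model",
        task="llm/v1/completions",  # task type assumed for serving
        registered_model_name="main.default.codellama_7b",  # hypothetical UC path
    )
```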
The following table summarizes the supported model architectures for provisioned throughput.
| Model architecture | Task types | Notes |
|---|---|---|
| GTE v1.5 (English) | Embedding | Does not generate normalized embeddings. |
| BGE v1.5 (English) | Embedding | |
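As a sketch of querying one of the embedding architectures above through the OpenAI-compatible API, the following assumes a hypothetical embedding endpoint and workspace host. Because GTE v1.5 does not generate normalized embeddings, the sketch also normalizes the vector manually.

```python
import math
import os

from openai import OpenAI

# Sketch: query an embedding endpoint (e.g., one serving GTE or BGE) through
# the OpenAI-compatible API. Host and endpoint names are placeholders.
client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url="https://<workspace-host>/serving-endpoints",
)

result = client.embeddings.create(
    model="my-embedding-endpoint",  # hypothetical endpoint name
    input=["Databricks Foundation Model APIs"],
)
vector = result.data[0].embedding

# GTE v1.5 does not return normalized embeddings, so normalize manually
# if downstream similarity math expects unit vectors.
norm = math.sqrt(sum(x * x for x in vector))
unit_vector = [x / norm for x in vector]
print(len(unit_vector))
```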