External models in Mosaic AI Model Serving

Important

The code examples in this article demonstrate usage of the Public Preview MLflow Deployments CRUD API.

This article describes external models in Mosaic AI Model Serving, including supported model providers and limitations.

What are external models?

Important

You can now configure Mosaic AI Gateway on model serving endpoints that serve external models. AI Gateway brings governance, monitoring, and production readiness to these model serving endpoints. See Mosaic AI Gateway.

External models are third-party models hosted outside of Databricks. Supported by Model Serving, external models allow you to streamline the usage and management of various large language model (LLM) providers, such as OpenAI and Anthropic, within an organization. You can also use Mosaic AI Model Serving as a provider to serve custom models, which offers rate limits for those endpoints. As part of this support, Model Serving offers a high-level interface that simplifies the interaction with these services by providing a unified endpoint to handle specific LLM-related requests.

In addition, Databricks support for external models provides centralized credential management. By storing API keys in one secure location, organizations can enhance their security posture by minimizing the exposure of sensitive API keys throughout the system. It also helps to prevent exposing these keys within code or requiring end users to manage keys safely.

See Tutorial: Create external model endpoints to query OpenAI models for step-by-step guidance on external model endpoint creation and querying supported models served by those endpoints using the MLflow Deployments SDK. See the following guides for instructions on how to use the Serving UI and the REST API:

Requirements

API key or authentication fields for the model provider.
Databricks workspace in External models supported regions.

Model providers

External models in Model Serving are designed to support a variety of model providers. A provider represents the source of the machine learning models, such as OpenAI or Anthropic. Each provider has specific characteristics and configurations that are encapsulated within the external_model field of the external model endpoint configuration.

The following providers are supported:

openai: For models offered by OpenAI and the Azure integrations for Azure OpenAI and Azure OpenAI with AAD.
anthropic: For models offered by Anthropic.
cohere: For models offered by Cohere.
amazon-bedrock: For models offered by Amazon Bedrock.
google-cloud-vertex-ai: For models offered by Google Cloud Vertex AI.
databricks-model-serving: For Mosaic AI Model Serving endpoints with compatible schemas. See Endpoint configuration.

To request support for a provider not listed here, reach out to your Databricks account team.

Supported models

The model you choose directly affects the responses you get from the API calls, so choose a model that fits your use-case requirements. For instance, choose a chat model to generate conversational responses, or an embedding model to generate embeddings of text.

See supported models.

Use models served on Mosaic AI Model Serving endpoints

Mosaic AI Model Serving endpoints are supported as a provider for the llm/v1/completions, llm/v1/chat, and llm/v1/embeddings endpoint types. These endpoints must accept the standard query parameters marked as required; other parameters might be ignored, depending on whether the Mosaic AI Model Serving endpoint supports them.

See POST /serving-endpoints/{name}/invocations in the API reference for standard query parameters.

These endpoints must produce responses in the following OpenAI format.

For completions tasks:

{
  "id": "123", # Not Required
  "model": "test_databricks_model",
  "choices": [
    {
      "text": "Hello World!",
      "index": 0,
      "logprobs": null, # Not Required
      "finish_reason": "length" # Not Required
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

For chat tasks:

{
  "id": "123", # Not Required
  "model": "test_chat_model",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "\n\nHello there, how may I assist you today?"
      },
      "finish_reason": "stop"
    },
    {
      "index": 1,
      "message": {
        "role": "human",
        "content": "\n\nWhat is the weather in San Francisco?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

For embeddings tasks:

{
  "data": [
    {
      "embedding": [
        0.0023064255,
        -0.009327292,
        .... # (1536 floats total for ada-002)
        -0.0028842222
      ],
      "index": 0
    },
    {
      "embedding": [
        0.0023064255,
        -0.009327292,
        .... # (1536 floats total for ada-002)
        -0.0028842222
      ],
      "index": 1
    }
  ],
  "model": "test_embedding_model",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
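The response formats above can be checked mechanically before you register a custom endpoint as a provider. The following is a minimal sketch, not part of any Databricks or MLflow API: `missing_fields` is a hypothetical helper, and it assumes that every field in the samples that is not marked "Not Required" is required.

```python
from typing import Any

# Hypothetical helper, not part of any Databricks or MLflow API.
# Assumption: a field is required unless the samples mark it "Not Required".
TOP_LEVEL_KEYS = {
    "llm/v1/completions": {"model", "choices", "usage"},
    "llm/v1/chat": {"model", "choices", "usage"},
    "llm/v1/embeddings": {"data", "model", "usage"},
}
ITEM_LIST_KEY = {
    "llm/v1/completions": "choices",
    "llm/v1/chat": "choices",
    "llm/v1/embeddings": "data",
}
ITEM_KEYS = {
    "llm/v1/completions": {"text", "index"},
    "llm/v1/chat": {"index", "message", "finish_reason"},
    "llm/v1/embeddings": {"embedding", "index"},
}


def missing_fields(task: str, response: dict[str, Any]) -> list[str]:
    """Return the paths of required fields that are absent from `response`."""
    missing = sorted(TOP_LEVEL_KEYS[task] - set(response))
    list_key = ITEM_LIST_KEY[task]
    for i, item in enumerate(response.get(list_key, [])):
        for key in sorted(ITEM_KEYS[task] - set(item)):
            missing.append(f"{list_key}[{i}].{key}")
    return missing


completion = {
    "model": "test_databricks_model",
    "choices": [{"text": "Hello World!", "index": 0}],
    "usage": {"prompt_tokens": 8, "total_tokens": 8},
}
print(missing_fields("llm/v1/completions", completion))  # prints []
```

An empty list means the response carries every field this sketch treats as required. It does not check value types or the contents of `message` and `usage`.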

Endpoint configuration

To serve and query external models, you need to configure a serving endpoint. See Create an external model serving endpoint.

For an external model serving endpoint, you must include the external_model field and its parameters in the served_entities section of the endpoint configuration. If you configure multiple external models in a serving endpoint, you must provide a traffic_config to define the traffic routing percentage for each external model.

The external_model field defines the model to which this endpoint forwards requests. When specifying a model, it is critical that the provider supports the model you are requesting. For instance, openai as a provider supports models like text-embedding-ada-002, but other providers might not. If the model is not supported by the provider, Databricks returns an HTTP 4xx error when trying to route requests to that model.

The following table summarizes the external_model field parameters. See POST /api/2.0/serving-endpoints for endpoint configuration parameters.

Parameter	Description
`name`	The name of the model to use. For example, `gpt-3.5-turbo` for OpenAI’s `GPT-3.5-Turbo` model.
`provider`	Specifies the name of the provider for this model. This string value must correspond to a supported external model provider. For example, `openai` for OpenAI’s `GPT-3.5` models.
`task`	The task corresponds to the type of language model interaction you desire. Supported tasks are `llm/v1/completions`, `llm/v1/chat`, and `llm/v1/embeddings`.
`<provider>_config`	Contains any additional configuration details required for the model. This includes specifying the API base URL and the API key. See Configure the provider for an endpoint.
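Because an unsupported provider or task only surfaces as an error at request time, it can help to check the external_model field before you create the endpoint. The following is a minimal sketch using the providers and tasks listed in this article. `check_external_model` and `config_key` are hypothetical helpers, the sketch treats all four parameters as required, and it assumes that the `<provider>_config` key is the provider name with hyphens replaced by underscores.

```python
SUPPORTED_PROVIDERS = {
    "openai",
    "anthropic",
    "cohere",
    "amazon-bedrock",
    "google-cloud-vertex-ai",
    "databricks-model-serving",
}
SUPPORTED_TASKS = {"llm/v1/completions", "llm/v1/chat", "llm/v1/embeddings"}


def config_key(provider: str) -> str:
    # Assumption: "amazon-bedrock" maps to "amazon_bedrock_config", and so on.
    return provider.replace("-", "_") + "_config"


def check_external_model(external_model: dict) -> list[str]:
    """Return a list of problems found in an external_model dictionary."""
    problems = []
    for field in ("name", "provider", "task"):
        if not external_model.get(field):
            problems.append(f"missing required field: {field}")
    provider = external_model.get("provider")
    if provider and provider not in SUPPORTED_PROVIDERS:
        problems.append(f"unsupported provider: {provider}")
    task = external_model.get("task")
    if task and task not in SUPPORTED_TASKS:
        problems.append(f"unsupported task: {task}")
    if provider in SUPPORTED_PROVIDERS and config_key(provider) not in external_model:
        problems.append(f"missing {config_key(provider)}")
    return problems


print(check_external_model({
    "name": "gpt-3.5-turbo",
    "provider": "openai",
    "task": "llm/v1/summarize",  # not a supported task
}))
```

This check only covers the shape of the configuration. It cannot tell whether the provider actually offers the named model.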

The following is an example of creating an external model endpoint using the create_endpoint() API. In this example, a request sent to the completion endpoint is forwarded to the claude-2 model provided by anthropic.

import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

client.create_endpoint(
    name="anthropic-completions-endpoint",
    config={
        "served_entities": [
            {
                "name": "test",
                "external_model": {
                    "name": "claude-2",
                    "provider": "anthropic",
                    "task": "llm/v1/completions",
                    "anthropic_config": {
                        "anthropic_api_key": "{{secrets/my_anthropic_secret_scope/anthropic_api_key}}"
                    }
                }
            }
        ]
    }
)
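When an endpoint serves more than one external model, the configuration also needs a traffic_config, as described earlier in this section. The following is a minimal sketch of how such a configuration could be assembled. `split_traffic` is a hypothetical helper, and the `routes`, `served_model_name`, and `traffic_percentage` field names, as well as the rule that percentages sum to 100, are assumptions to verify against POST /api/2.0/serving-endpoints. The entity names and secret scopes are placeholders.

```python
def split_traffic(served_entities: list[dict], percentages: dict[str, int]) -> dict:
    """Build an endpoint config whose traffic_config covers every served entity."""
    names = [entity["name"] for entity in served_entities]
    if set(names) != set(percentages):
        raise ValueError("each served entity needs exactly one traffic percentage")
    if sum(percentages.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    return {
        "served_entities": served_entities,
        "traffic_config": {
            # Assumed route shape; check the API reference before relying on it.
            "routes": [
                {"served_model_name": name, "traffic_percentage": percentages[name]}
                for name in names
            ]
        },
    }


served_entities = [
    {
        "name": "openai-chat",
        "external_model": {
            "name": "gpt-3.5-turbo",
            "provider": "openai",
            "task": "llm/v1/chat",
            "openai_config": {
                "openai_api_key": "{{secrets/my_openai_secret_scope/openai_api_key}}"
            },
        },
    },
    {
        "name": "anthropic-chat",
        "external_model": {
            "name": "claude-2",
            "provider": "anthropic",
            "task": "llm/v1/chat",
            "anthropic_config": {
                "anthropic_api_key": "{{secrets/my_anthropic_secret_scope/anthropic_api_key}}"
            },
        },
    },
]

config = split_traffic(served_entities, {"openai-chat": 50, "anthropic-chat": 50})
# Pass the result to the client from the previous example, for instance:
# client.create_endpoint(name="mixed-chat-endpoint", config=config)
```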

Configure the provider for an endpoint

When you create an endpoint, you must supply the required configurations for the specified model provider. The following sections summarize the available endpoint configuration parameters for each model provider.

Note

Databricks encrypts and securely stores the provided credentials for each model provider. These credentials are automatically deleted when their associated endpoints are deleted.

OpenAI

Configuration Parameter	Description	Required	Default
`openai_api_key`	The Databricks secret key reference for an OpenAI API key using the OpenAI service. If you prefer to paste your API key directly, see `openai_api_key_plaintext`.	You must provide an API key using one of the following fields: `openai_api_key` or `openai_api_key_plaintext`.
`openai_api_key_plaintext`	The OpenAI API key using the OpenAI service provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see `openai_api_key`.	You must provide an API key using one of the following fields: `openai_api_key` or `openai_api_key_plaintext`.
`openai_api_type`	An optional field to specify the type of OpenAI API to use.	No	`openai`
`openai_api_base`	The base URL for the OpenAI API.	No	`https://api.openai.com/v1`
`openai_api_version`	An optional field to specify the OpenAI API version.	No
`openai_organization`	An optional field to specify the organization in OpenAI.	No
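Each provider configuration accepts its API key either as a Databricks secret reference or as plaintext. The following is a minimal sketch of a check for that rule. `secret_ref` and `check_api_key` are hypothetical helpers, the `{{secrets/<scope>/<key>}}` pattern is inferred from the examples in this article, and treating both fields at once as an error is an assumption.

```python
import re
from typing import Optional

# Pattern inferred from the secret references used in this article's examples.
SECRET_REF = re.compile(r"^\{\{secrets/([^/{}]+)/([^/{}]+)\}\}$")


def secret_ref(scope: str, key: str) -> str:
    """Format a Databricks secret reference for use in a provider config."""
    return "{{secrets/" + scope + "/" + key + "}}"


def check_api_key(config: dict, ref_field: str) -> Optional[str]:
    """Return a problem description, or None if the key fields look consistent."""
    plaintext_field = ref_field + "_plaintext"
    has_ref = ref_field in config
    has_plain = plaintext_field in config
    if has_ref == has_plain:
        # Assumption: exactly one of the two fields should be set.
        return f"provide exactly one of {ref_field} or {plaintext_field}"
    if has_ref and not SECRET_REF.match(config[ref_field]):
        return f"{ref_field} should look like {{{{secrets/<scope>/<key>}}}}"
    return None


openai_config = {
    "openai_api_key": secret_ref("my_openai_secret_scope", "openai_api_key"),
}
print(check_api_key(openai_config, "openai_api_key"))  # prints None
```

The same check applies to the other key pairs in this article, for example `cohere_api_key` and `anthropic_api_key`.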

Cohere

Configuration Parameter	Description	Required
`cohere_api_key`	The Databricks secret key reference for a Cohere API key. If you prefer to paste your API key directly, see `cohere_api_key_plaintext`.	You must provide an API key using one of the following fields: `cohere_api_key` or `cohere_api_key_plaintext`.
`cohere_api_key_plaintext`	The Cohere API key provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see `cohere_api_key`.	You must provide an API key using one of the following fields: `cohere_api_key` or `cohere_api_key_plaintext`.
`cohere_api_base`	The base URL for the Cohere service.	No

Anthropic

Configuration Parameter	Description	Required	Default
`anthropic_api_key`	The Databricks secret key reference for an Anthropic API key. If you prefer to paste your API key directly, see `anthropic_api_key_plaintext`.	You must provide an API key using one of the following fields: `anthropic_api_key` or `anthropic_api_key_plaintext`.
`anthropic_api_key_plaintext`	The Anthropic API key provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see `anthropic_api_key`.	You must provide an API key using one of the following fields: `anthropic_api_key` or `anthropic_api_key_plaintext`.

Azure OpenAI

Azure OpenAI has distinct features as compared with the direct OpenAI service. For an overview, please see the comparison documentation.

Configuration Parameter	Description	Required
`openai_api_key`	The Databricks secret key reference for an OpenAI API key using the Azure service. If you prefer to paste your API key directly, see `openai_api_key_plaintext`.	You must provide an API key using one of the following fields: `openai_api_key` or `openai_api_key_plaintext`.
`openai_api_key_plaintext`	The OpenAI API key using the Azure service provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see `openai_api_key`.	You must provide an API key using one of the following fields: `openai_api_key` or `openai_api_key_plaintext`.
`openai_api_type`	Use `azure` for access token validation.	Yes
`openai_api_base`	The base URL for the Azure OpenAI API service provided by Azure.	Yes
`openai_api_version`	The version of the Azure OpenAI service to utilize, specified by a date.	Yes
`openai_deployment_name`	The name of the deployment resource for the Azure OpenAI service.	Yes
`openai_organization`	An optional field to specify the organization in OpenAI.	No

If you are using Azure OpenAI with Microsoft Entra ID, use the following parameters in your endpoint configuration. Databricks passes https://cognitiveservices.azure.com/ as the default scope for the Microsoft Entra ID token.

Configuration Parameter	Description	Required
`microsoft_entra_tenant_id`	The tenant ID for Microsoft Entra ID authentication.	Yes
`microsoft_entra_client_id`	The client ID for Microsoft Entra ID authentication.	Yes
`microsoft_entra_client_secret`	The Databricks secret key reference for a client secret used for Microsoft Entra ID authentication. If you prefer to paste your client secret directly, see `microsoft_entra_client_secret_plaintext`.	You must provide a client secret using one of the following fields: `microsoft_entra_client_secret` or `microsoft_entra_client_secret_plaintext`.
`microsoft_entra_client_secret_plaintext`	The client secret used for Microsoft Entra ID authentication provided as a plaintext string. If you prefer to reference your client secret using Databricks Secrets, see `microsoft_entra_client_secret`.	You must provide a client secret using one of the following fields: `microsoft_entra_client_secret` or `microsoft_entra_client_secret_plaintext`.
`openai_api_type`	Use `azuread` for authentication using Microsoft Entra ID.	Yes
`openai_api_base`	The base URL for the Azure OpenAI API service provided by Azure.	Yes
`openai_api_version`	The version of the Azure OpenAI service to utilize, specified by a date.	Yes
`openai_deployment_name`	The name of the deployment resource for the Azure OpenAI service.	Yes
`openai_organization`	An optional field to specify the organization in OpenAI.	No

The following example demonstrates how to create an endpoint with Azure OpenAI:

import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

client.create_endpoint(
    name="openai-chat-endpoint",
    config={
        "served_entities": [{
            "external_model": {
                "name": "gpt-3.5-turbo",
                "provider": "openai",
                "task": "llm/v1/chat",
                "openai_config": {
                    # "azure" selects API key authentication against Azure OpenAI.
                    "openai_api_type": "azure",
                    "openai_api_key": "{{secrets/my_openai_secret_scope/openai_api_key}}",
                    "openai_api_base": "https://my-azure-openai-endpoint.openai.azure.com",
                    "openai_deployment_name": "my-gpt-35-turbo-deployment",
                    "openai_api_version": "2023-05-15"
                }
            }
        }]
    }
)
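For Microsoft Entra ID authentication, the openai_config block swaps the API key for the tenant ID, client ID, and client secret fields listed in the second table above. The following is a minimal sketch of that block. `azure_entra_config` is a hypothetical helper, and the tenant ID, client ID, and secret scope are placeholders.

```python
def azure_entra_config(
    api_base: str,
    deployment_name: str,
    api_version: str,
    tenant_id: str,
    client_id: str,
    client_secret_ref: str,
) -> dict:
    """Build an openai_config block for Azure OpenAI with Microsoft Entra ID."""
    return {
        "openai_api_type": "azuread",  # selects Microsoft Entra ID authentication
        "openai_api_base": api_base,
        "openai_api_version": api_version,
        "openai_deployment_name": deployment_name,
        "microsoft_entra_tenant_id": tenant_id,
        "microsoft_entra_client_id": client_id,
        "microsoft_entra_client_secret": client_secret_ref,
    }


entra_config = azure_entra_config(
    api_base="https://my-azure-openai-endpoint.openai.azure.com",
    deployment_name="my-gpt-35-turbo-deployment",
    api_version="2023-05-15",
    tenant_id="00000000-0000-0000-0000-000000000000",  # placeholder
    client_id="11111111-1111-1111-1111-111111111111",  # placeholder
    client_secret_ref="{{secrets/my_entra_secret_scope/client_secret}}",  # placeholder
)
print(sorted(entra_config))
```

The resulting dictionary takes the place of the `openai_config` value in a create_endpoint() call. No `openai_api_key` field is set in this variant.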

Google Cloud Vertex AI

Configuration Parameter	Description	Required
`private_key`	The Databricks secret key reference for a private key for the service account which has access to the Google Cloud Vertex AI Service. See Best practices for managing service account keys. If you prefer to paste your private key directly, see `private_key_plaintext`.	You must provide a private key using one of the following fields: `private_key` or `private_key_plaintext`.
`private_key_plaintext`	The private key for the service account which has access to the Google Cloud Vertex AI Service provided as a plaintext secret. See Best practices for managing service account keys. If you prefer to reference your key using Databricks Secrets, see `private_key`.	You must provide a private key using one of the following fields: `private_key` or `private_key_plaintext`.
`region`	The region for the Google Cloud Vertex AI Service. See supported regions for more details. Some models are only available in specific regions.	Yes
`project_id`	The Google Cloud project ID that the service account is associated with.	Yes
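The parameters in this table could be combined into a served entity as in the following minimal sketch. `vertex_served_entity` is a hypothetical helper, the `google_cloud_vertex_ai_config` key name is an assumption based on the `<provider>_config` pattern, and the model name, region, project ID, and secret scope are illustrative placeholders.

```python
def vertex_served_entity(
    name: str,
    model: str,
    task: str,
    region: str,
    project_id: str,
    private_key_ref: str,
) -> dict:
    """Build a served entity that forwards requests to Google Cloud Vertex AI."""
    return {
        "name": name,
        "external_model": {
            "name": model,
            "provider": "google-cloud-vertex-ai",
            "task": task,
            # Assumed key name, following the <provider>_config pattern.
            "google_cloud_vertex_ai_config": {
                "private_key": private_key_ref,
                "region": region,
                "project_id": project_id,
            },
        },
    }


vertex_entity = vertex_served_entity(
    name="vertex-completions",
    model="text-bison",  # illustrative; check the supported models list
    task="llm/v1/completions",
    region="us-central1",  # some models are only available in specific regions
    project_id="my-gcp-project",
    private_key_ref="{{secrets/my_vertex_secret_scope/private_key}}",
)
print(vertex_entity["external_model"]["provider"])  # prints google-cloud-vertex-ai
```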

Amazon Bedrock

To use Amazon Bedrock as an external model provider, make sure that Bedrock is enabled in the specified AWS region and that the specified AWS key pair has the appropriate permissions to interact with Bedrock services. For more information, see AWS Identity and Access Management.

If there are AWS permission issues, Databricks recommends that you verify the credentials directly with the Amazon Bedrock API.

AI21 Labs

Configuration Parameter	Description	Required	Default
`ai21labs_api_key`	The Databricks secret key reference for an AI21 Labs API key. If you prefer to paste your API key directly, see `ai21labs_api_key_plaintext`.	You must provide an API key using one of the following fields: `ai21labs_api_key` or `ai21labs_api_key_plaintext`.
`ai21labs_api_key_plaintext`	An AI21 Labs API key provided as a plaintext string. If you prefer to reference your key using Databricks Secrets, see `ai21labs_api_key`.	You must provide an API key using one of the following fields: `ai21labs_api_key` or `ai21labs_api_key_plaintext`.

Configure AI Gateway on an endpoint

You can also configure your endpoint to enable Mosaic AI Gateway features, such as rate limiting, usage tracking, and logging.

See Configure AI Gateway on model serving endpoints.

Query an external model endpoint

After you create an external model endpoint, it is ready to receive traffic from users.

You can send scoring requests to the endpoint using the OpenAI client, the REST API or the MLflow Deployments SDK.

See the standard query parameters for a scoring request in POST /serving-endpoints/{name}/invocations.
See also Query foundation models.

The following example queries the claude-2 completions model hosted by Anthropic using the OpenAI client. To use the OpenAI client, populate the model field with the name of the model serving endpoint that hosts the model you want to query.

This example uses a previously created endpoint, anthropic-completions-endpoint, configured for accessing external models from the Anthropic model provider. See how to create external model endpoints.

See Supported models for additional models you can query and their providers.

from openai import OpenAI

# Point the OpenAI client at the workspace's serving endpoints.
client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

# The model field takes the serving endpoint name, not the provider's model name.
completion = client.completions.create(
    model="anthropic-completions-endpoint",
    prompt="what is databricks",
    temperature=1.0
)
print(completion)

Expected output response format:

{
  "id": "123", # Not Required
  "model": "anthropic-completions-endpoint",
  "choices": [
    {
      "text": "Hello World!",
      "index": 0,
      "logprobs": null, # Not Required
      "finish_reason": "length" # Not Required
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
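The same query can also be sent through the REST API. The following is a minimal sketch that builds the POST /serving-endpoints/{name}/invocations request with the Python standard library and leaves the network call commented out. `build_invocation_request` is a hypothetical helper, the workspace URL and token are the placeholders from the example above, and bearer-token authentication is an assumption to confirm for your workspace.

```python
import json
import urllib.request


def build_invocation_request(
    host: str, endpoint: str, token: str, payload: dict
) -> urllib.request.Request:
    """Build, but do not send, a scoring request for a serving endpoint."""
    url = f"{host.rstrip('/')}/serving-endpoints/{endpoint}/invocations"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed authentication scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_invocation_request(
    host="https://example.staging.cloud.databricks.com",
    endpoint="anthropic-completions-endpoint",
    token="dapi-your-databricks-token",
    payload={"prompt": "what is databricks", "temperature": 1.0},
)
print(request.full_url)

# To send the request against a real workspace:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```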

Additional query parameters

You can pass any additional parameters supported by the endpoint’s provider as part of your query.

For example:

logit_bias (supported by OpenAI, Cohere).
top_k (supported by Anthropic, Cohere).
frequency_penalty (supported by OpenAI, Cohere).
presence_penalty (supported by OpenAI, Cohere).
stream (supported by OpenAI, Anthropic, Cohere, Amazon Bedrock for Anthropic). This is only available for chat and completions requests.
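Provider-specific parameters such as these sit alongside the standard query parameters in the request body. The following is a minimal sketch of merging them into a payload. `with_provider_params` is a hypothetical helper, and the only rule it enforces is the one stated above: stream is rejected for anything other than chat and completions requests.

```python
STREAMING_TASKS = {"llm/v1/chat", "llm/v1/completions"}


def with_provider_params(task: str, payload: dict, **extra) -> dict:
    """Return a new payload with provider-specific parameters added."""
    if extra.get("stream") and task not in STREAMING_TASKS:
        raise ValueError("stream is only available for chat and completions requests")
    overlap = set(payload) & set(extra)
    if overlap:
        raise ValueError(f"parameters already set in payload: {sorted(overlap)}")
    return {**payload, **extra}


# top_k is supported by Anthropic and Cohere according to the list above.
payload = with_provider_params(
    "llm/v1/completions",
    {"prompt": "what is databricks", "temperature": 1.0},
    top_k=5,
)
print(payload)
```

With the OpenAI client, parameters that the client does not define as keyword arguments can typically be passed through its `extra_body` argument. Whether a given parameter is honored still depends on the endpoint's provider.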

Limitations

Depending on the external model you choose, your configuration might cause your data to be processed outside of the region where your data originated. See Model Serving limits and regions.