Model Serving limits and regions
Preview

Mosaic AI Model Serving is in Public Preview and is supported in `us-east1` and `us-central1`.
This article summarizes the limitations and region availability for Mosaic AI Model Serving and supported endpoint types.
Resource and payload limits
Mosaic AI Model Serving imposes default limits to ensure reliable performance. If you have feedback on these limits, reach out to your Databricks account team.
The following table summarizes resource and payload limitations for model serving endpoints.
| Feature | Granularity | Limit |
| --- | --- | --- |
| Payload size | Per request | 16 MB. For endpoints serving external models the limit is 4 MB. |
| Queries per second (QPS) | Per workspace | 200, but can be increased to 25,000 or more by reaching out to your Databricks account team. |
| Model execution duration | Per request | 120 seconds |
| CPU endpoint model memory usage | Per endpoint | 4 GB |
| Provisioned concurrency | Per workspace | 200 concurrency. Can be increased by reaching out to your Databricks account team. |
| Overhead latency | Per request | Less than 50 milliseconds |
| Init scripts | | Init scripts are not supported. |
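The payload size and execution duration limits above are enforced server-side, but a client can validate a request before sending it to avoid rejected calls. The following is a minimal sketch, assuming a plain JSON scoring payload; the `validate_payload` helper and the constants are illustrative, not part of any Databricks SDK.

```python
import json

# Server-side limits from the table above (per request).
MAX_PAYLOAD_BYTES = 16 * 1024 * 1024           # 16 MB for standard endpoints
MAX_EXTERNAL_PAYLOAD_BYTES = 4 * 1024 * 1024   # 4 MB for external models
MAX_EXECUTION_SECONDS = 120                    # model execution duration limit

def validate_payload(payload: dict, external_model: bool = False) -> bytes:
    """Serialize a scoring payload and reject it if it exceeds the
    Model Serving per-request size limit."""
    body = json.dumps(payload).encode("utf-8")
    limit = MAX_EXTERNAL_PAYLOAD_BYTES if external_model else MAX_PAYLOAD_BYTES
    if len(body) > limit:
        raise ValueError(f"Payload is {len(body)} bytes; limit is {limit} bytes")
    return body

# A small scoring request passes validation and is ready to send.
body = validate_payload({"inputs": [[1.0, 2.0, 3.0]]})
```

When issuing the actual HTTP request, it is reasonable to set the client timeout slightly above `MAX_EXECUTION_SECONDS`, since the model itself may run for up to 120 seconds before the endpoint responds.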
Networking and security limitations
Model Serving endpoints are protected by access control and respect networking-related ingress rules configured on the workspace.
Model Serving does not provide security patches to existing model images because of the risk of destabilization to production deployments. A new model image created from a new model version will contain the latest patches. Reach out to your Databricks account team for more information.
Foundation Model APIs provisioned throughput limits
The following are limits relevant to Foundation Model APIs provisioned throughput workloads:
- Provisioned throughput supports the HIPAA compliance profile and is recommended for workloads that require compliance certifications.
- Only the GTE v1.5 (English) model architecture is supported.
Region availability
Note
If you require an endpoint in an unsupported region, reach out to your Databricks account team.
If your workspace is deployed in a region that supports model serving but is served by a control plane in an unsupported region, the workspace does not support model serving. If you attempt to use model serving in such a workspace, you see an error message stating that your workspace is not supported. Reach out to your Databricks account team for more information.
For more information on regional availability of features, see Model serving regional availability.