Manage model lifecycle in Unity Catalog
Preview
This feature is in Public Preview.
This article describes how to use Models in Unity Catalog as part of your machine learning workflow to manage the full lifecycle of ML models. Databricks provides a hosted version of MLflow Model Registry in Unity Catalog. Models in Unity Catalog extends the benefits of Unity Catalog to ML models, including centralized access control, auditing, lineage, and model discovery across workspaces. Models in Unity Catalog is compatible with the open-source MLflow Python client.
Key features of Models in Unity Catalog include:
Namespacing and governance for models, so you can group and govern models at the environment, project, or team level (“Grant data scientists read-only access to production models”).
Chronological model lineage (which MLflow experiment and run produced the model at a given time).
Model versioning.
Model deployment via aliases. For example, mark the “Champion” version of a model within your prod catalog.
This article includes instructions for both the Models in Unity Catalog UI and API.
For an overview of Model Registry concepts, see MLflow guide.
Note
This article documents Models in Unity Catalog, which Databricks recommends for governing and deploying models. For documentation of the classic Workspace Model Registry, see Manage model lifecycle using the Workspace Model Registry. For guidance on how to upgrade from the Workspace Model Registry to Unity Catalog, see Migrate workflows and models to Unity Catalog.
Requirements
Unity Catalog must be enabled in your workspace. See Get started using Unity Catalog to create a Unity Catalog Metastore, enable it in a workspace, and create a catalog. If Unity Catalog is not enabled, you can still use the classic workspace model registry.
You must have access to run commands on a cluster with access to Unity Catalog.
To create new registered models, you need the CREATE_MODEL privilege on a schema, in addition to the USE SCHEMA and USE CATALOG privileges on the schema and its enclosing catalog. CREATE_MODEL is a new schema-level privilege that you can grant using the Catalog Explorer UI or the SQL GRANT command, as shown below.
GRANT CREATE_MODEL ON SCHEMA <schema-name> TO <principal>
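As a minimal sketch of the privileges listed above, the helper below builds the three GRANT statements a principal would need before creating registered models in a schema. The function name grants_for_model_creation and the example principal are hypothetical, and the helper only builds strings. In a Databricks notebook, each statement could then be passed to spark.sql.

```python
def grants_for_model_creation(catalog: str, schema: str, principal: str) -> list[str]:
    """Build the GRANT statements needed to create registered models in a schema.

    Hypothetical helper for illustration; it builds strings and executes nothing.
    """
    target = f"`{principal}`"  # backticks allow principals such as email addresses
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {target}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {target}",
        f"GRANT CREATE_MODEL ON SCHEMA {catalog}.{schema} TO {target}",
    ]


statements = grants_for_model_creation("prod", "ml_team", "data-scientists")
for statement in statements:
    print(statement)  # in a notebook: spark.sql(statement)
```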
Upgrade training workloads to Unity Catalog
This section includes instructions to upgrade existing training workloads to Unity Catalog.
Install MLflow Python client
Support for models in Unity Catalog is included in Databricks Runtime 13.2 ML and above. You can also use models in Unity Catalog on Databricks Runtime 11.3 LTS and above by installing the latest version of the MLflow Python client in your notebook, using the code below.
%pip install --upgrade "mlflow-skinny[databricks]"
dbutils.library.restartPython()
Configure MLflow client to access models in Unity Catalog
By default, the MLflow Python client creates models in the Databricks workspace model registry. To upgrade to models in Unity Catalog, configure the MLflow client:
import mlflow
mlflow.set_registry_uri("databricks-uc")
Train and register Unity Catalog-compatible models
Permissions required: To create a new registered model, you need the CREATE_MODEL and USE SCHEMA privileges on the enclosing schema, and USE CATALOG privilege on the enclosing catalog. To create new model versions under a registered model, you must be the owner of the registered model and have USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.
ML model versions in UC must have a model signature. If you’re not already logging MLflow models with signatures in your model training workloads, you can either:
Use Databricks autologging, which automatically logs models with signatures for many popular ML frameworks. See supported frameworks in the MLflow docs.
With MLflow 2.5.0 and above, you can specify an input example in your mlflow.<flavor>.log_model call, and the model signature is automatically inferred. For further information, refer to the MLflow documentation.
Then, pass the three-level name of the model to MLflow APIs, in the form <catalog>.<schema>.<model>.
The examples in this section create and access models in the ml_team schema under the prod catalog.
The model training examples in this section create a new model version and register it in the prod catalog. Using the prod catalog doesn’t necessarily mean that the model version serves production traffic. The model version’s enclosing catalog, schema, and registered model reflect its environment (prod) and associated governance rules (for example, privileges can be set up so that only admins can delete from the prod catalog), but not its deployment status. To manage the deployment status, use model aliases.
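The three-level naming described above can be sketched with a small helper that assembles the name passed to MLflow APIs. The function uc_model_name is hypothetical, and its validation (rejecting empty parts and embedded dots) is a simplifying assumption rather than the full Unity Catalog naming rules.

```python
def uc_model_name(catalog: str, schema: str, model: str) -> str:
    """Join catalog, schema, and model into a three-level Unity Catalog name."""
    parts = (catalog, schema, model)
    for part in parts:
        # Reject empty parts and embedded dots, which would change the number of levels.
        if not part or "." in part:
            raise ValueError(f"invalid name part: {part!r}")
    return ".".join(parts)


name = uc_model_name("prod", "ml_team", "iris_model")
print(name)
```

Building the name in one place keeps the environment (the catalog) a single argument, so the same training code could target a different catalog without editing string literals.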
Register a model to Unity Catalog using autologging
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
# Train a sklearn model on the iris dataset
X, y = datasets.load_iris(return_X_y=True, as_frame=True)
clf = RandomForestClassifier(max_depth=7)
clf.fit(X, y)
# Note that the UC model name follows the pattern
# <catalog_name>.<schema_name>.<model_name>, corresponding to
# the catalog, schema, and registered model name
# in Unity Catalog under which to create the version
# The registered model will be created if it doesn't already exist
autolog_run = mlflow.last_active_run()
model_uri = "runs:/{}/model".format(autolog_run.info.run_id)
mlflow.register_model(model_uri, "prod.ml_team.iris_model")
Register a model to Unity Catalog with automatically inferred signature
Support for automatically inferred signatures is available in MLflow version 2.5.0 and above, and is supported in Databricks Runtime 11.3 LTS ML and above. To use automatically inferred signatures, use the following code to install the latest MLflow Python client in your notebook:
%pip install --upgrade "mlflow-skinny[databricks]"
dbutils.library.restartPython()
The following code shows an example of an automatically inferred signature.
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
with mlflow.start_run():
    # Train a sklearn model on the iris dataset
    X, y = datasets.load_iris(return_X_y=True, as_frame=True)
    clf = RandomForestClassifier(max_depth=7)
    clf.fit(X, y)
    # Take the first row of the training dataset as the model input example.
    input_example = X.iloc[[0]]
    # Log the model and register it as a new version in UC.
    mlflow.sklearn.log_model(
        sk_model=clf,
        artifact_path="model",
        # The signature is automatically inferred from the input example and its predicted output.
        input_example=input_example,
        registered_model_name="prod.ml_team.iris_model",
    )
View models in the UI
Permissions required: To view a registered model and its model versions in the UI, you need EXECUTE privilege on the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.
You can view and manage registered models and model versions in Unity Catalog using the Catalog Explorer.
Control access to models
For information about controlling access to models registered in Unity Catalog, see Unity Catalog privileges and securable objects. For best practices on organizing models across catalogs and schemas, see Organize your data.
You can configure model permissions programmatically using the Grants REST API. When configuring model permissions, set securable_type to "FUNCTION" in REST API requests. For example, use PATCH /api/2.1/unity-catalog/permissions/function/{full_name} to update registered model permissions.
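The PATCH call described above might look like the following sketch, which builds the request with the Python standard library but does not send it. The helper name build_grant_request, the workspace host, and the token are placeholders. The request body shape (a changes list whose entries add privileges for a principal) is an assumption about the Grants API payload that should be checked against the REST API reference.

```python
import json
import urllib.request


def build_grant_request(host: str, token: str, full_name: str,
                        principal: str, privileges: list[str]) -> urllib.request.Request:
    """Build (but do not send) a PATCH request granting privileges on a registered model."""
    # Registered models use the "function" securable type in the permissions path.
    url = f"{host}/api/2.1/unity-catalog/permissions/function/{full_name}"
    # Assumed body shape: a list of changes, each adding privileges for one principal.
    body = {"changes": [{"principal": principal, "add": privileges}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_grant_request(
    host="https://example.cloud.databricks.com",  # placeholder workspace URL
    token="<personal-access-token>",              # placeholder credential
    full_name="prod.ml_team.iris_model",
    principal="data-scientists",
    privileges=["EXECUTE"],
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```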
Deploy and organize models with aliases and tags
Model aliases and tags help you organize and manage models in Unity Catalog.
Model aliases allow you to assign a mutable, named reference to a particular version of a registered model. You can use aliases to indicate the deployment status of a model version. For example, you could allocate a “Champion” alias to the model version currently in production and target this alias in workloads that use the production model. You can then update the production model by reassigning the “Champion” alias to a different model version.
Tags are key-value pairs that you associate with registered models and model versions, allowing you to label and categorize them by function or status. For example, you could apply a tag with key "task" and value "question-answering" (displayed in the UI as task:question-answering) to registered models intended for question answering tasks. At the model version level, you could tag versions undergoing pre-deployment validation with validation_status:pending and those cleared for deployment with validation_status:approved.
See the following sections for how to use aliases and tags.
Set and delete aliases on models
Permissions required: Owner of the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.
You can set, update, and remove aliases for models in Unity Catalog by using Catalog Explorer. You can manage aliases across a registered model in the model details page and configure aliases for a specific model version in the model version details page.
To set, update, and delete aliases using the MLflow Client API, see the examples below:
from mlflow import MlflowClient
client = MlflowClient()
# create "Champion" alias for version 1 of model "prod.ml_team.iris_model"
client.set_registered_model_alias("prod.ml_team.iris_model", "Champion", 1)
# reassign the "Champion" alias to version 2
client.set_registered_model_alias("prod.ml_team.iris_model", "Champion", 2)
# get a model version by alias
client.get_model_version_by_alias("prod.ml_team.iris_model", "Champion")
# delete the alias
client.delete_registered_model_alias("prod.ml_team.iris_model", "Champion")
Set and delete tags on models
Permissions required: You must be the owner of the registered model or have the APPLY_TAG privilege on it, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.
See Manage tags in Catalog Explorer for how to set and delete tags using the UI.
To set and delete tags using the MLflow Client API, see the examples below:
from mlflow import MlflowClient
client = MlflowClient()
# Set registered model tag
client.set_registered_model_tag("prod.ml_team.iris_model", "task", "classification")
# Delete registered model tag
client.delete_registered_model_tag("prod.ml_team.iris_model", "task")
# Set model version tag
client.set_model_version_tag("prod.ml_team.iris_model", "1", "validation_status", "approved")
# Delete model version tag
client.delete_model_version_tag("prod.ml_team.iris_model", "1", "validation_status")
Both registered model and model version tags must meet the platform-wide constraints.
For more details on alias and tag client APIs, see the MLflow API documentation.
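Aliases and tags can be combined into a simple promotion gate: reassign the “Champion” alias only when a version carries the validation_status:approved tag described earlier. The sketch below is one possible arrangement, not a prescribed workflow. The function promote_if_approved is hypothetical, and InMemoryClient is a stand-in that mimics the two MlflowClient methods used (get_model_version and set_registered_model_alias) so the example runs without a workspace.

```python
from types import SimpleNamespace


def promote_if_approved(client, model_name, version, alias="Champion"):
    """Point `alias` at `version` only if its validation_status tag is "approved"."""
    model_version = client.get_model_version(model_name, str(version))
    if model_version.tags.get("validation_status") != "approved":
        return False
    client.set_registered_model_alias(model_name, alias, version)
    return True


class InMemoryClient:
    """Stand-in for MlflowClient so the sketch runs without a workspace (illustration only)."""

    def __init__(self, tags_by_version):
        self.tags_by_version = tags_by_version
        self.aliases = {}

    def get_model_version(self, name, version):
        return SimpleNamespace(tags=self.tags_by_version[str(version)])

    def set_registered_model_alias(self, name, alias, version):
        self.aliases[alias] = version


client = InMemoryClient({
    "1": {"validation_status": "approved"},
    "2": {"validation_status": "pending"},
})
promoted_v1 = promote_if_approved(client, "prod.ml_team.iris_model", 1)
promoted_v2 = promote_if_approved(client, "prod.ml_team.iris_model", 2)
print(promoted_v1, promoted_v2, client.aliases)
```

In a notebook, passing MlflowClient() (after mlflow.set_registry_uri("databricks-uc")) in place of the stand-in would apply the same check against Unity Catalog.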
Load models for inference
Consume model versions by alias in inference workloads
Permissions required: EXECUTE privilege on the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.
You can write batch inference workloads that reference a model version by alias. For example, the snippet below loads and applies the “Champion” model version for batch inference. If the “Champion” alias is reassigned to a new model version, the batch inference workload automatically picks it up on its next execution. This allows you to decouple model deployments from your batch inference workloads.
import mlflow.pyfunc
model_version_uri = "models:/prod.ml_team.iris_model@Champion"
champion_version = mlflow.pyfunc.load_model(model_version_uri)
champion_version.predict(test_x)
Consume model versions by version number in inference workloads
You can also load model versions by version number:
import mlflow.pyfunc
# Load version 1 of the model "prod.ml_team.iris_model"
model_version_uri = "models:/prod.ml_team.iris_model/1"
first_version = mlflow.pyfunc.load_model(model_version_uri)
first_version.predict(test_x)
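The two URI forms used above (by alias and by version number) can be produced by one small helper, which makes it harder to mix them up in inference jobs. The function model_uri is a hypothetical convenience and not part of MLflow. It only formats strings in the models:/ scheme shown in the snippets above.

```python
from typing import Optional, Union


def model_uri(name: str, *, alias: Optional[str] = None,
              version: Optional[Union[int, str]] = None) -> str:
    """Return a models:/ URI that targets either an alias or a version number."""
    if (alias is None) == (version is None):
        raise ValueError("specify exactly one of alias or version")
    if alias is not None:
        return f"models:/{name}@{alias}"
    return f"models:/{name}/{version}"


champion_uri = model_uri("prod.ml_team.iris_model", alias="Champion")
v1_uri = model_uri("prod.ml_team.iris_model", version=1)
print(champion_uri)
print(v1_uri)
# In a notebook: mlflow.pyfunc.load_model(champion_uri).predict(test_x)
```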
Annotate a model or model version
Permissions required: Owner of the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.
You can provide information about a model or model version by annotating it. For example, you may want to include an overview of the problem or information about the methodology and algorithm used.
Annotate a model or model version using the API
To update a registered model description, use the MLflow Client API update_registered_model() method:
from mlflow import MlflowClient
client = MlflowClient()
client.update_registered_model(
    name="<model-name>",
    description="<description>"
)
To update a model version description, use the MLflow Client API update_model_version() method:
from mlflow import MlflowClient
client = MlflowClient()
client.update_model_version(
    name="<model-name>",
    version=<model-version>,
    description="<description>"
)
Rename a model (API only)
Permissions required: Owner of the registered model, CREATE_MODEL privilege on the schema containing the registered model, and USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.
To rename a registered model, use the MLflow Client API rename_registered_model() method:
from mlflow import MlflowClient
client = MlflowClient()
client.rename_registered_model("<model-name>", "<new-model-name>")
Delete a model or model version
Permissions required: Owner of the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.
You can delete a registered model or a model version within a registered model using the Catalog Explorer UI or the API.
Delete a model version or model using the API
Warning
You cannot undo this action. When you delete a model, all model artifacts stored by Unity Catalog and all the metadata associated with the registered model are deleted.
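Because deletion is irreversible, one cautious pattern is to wrap the client calls in small functions and rehearse them against a recording stand-in first. The sketch below assumes the MLflow client methods delete_model_version and delete_registered_model. The wrapper names and RecordingClient are hypothetical, and the stand-in only logs what would be deleted.

```python
def delete_version(client, model_name: str, version: int) -> None:
    """Delete one version of a registered model. This cannot be undone."""
    client.delete_model_version(name=model_name, version=str(version))


def delete_model(client, model_name: str) -> None:
    """Delete a registered model and everything stored under it. This cannot be undone."""
    client.delete_registered_model(name=model_name)


class RecordingClient:
    """Stand-in that records calls instead of deleting anything (illustration only)."""

    def __init__(self):
        self.calls = []

    def delete_model_version(self, name, version):
        self.calls.append(("delete_model_version", name, version))

    def delete_registered_model(self, name):
        self.calls.append(("delete_registered_model", name))


dry_run = RecordingClient()
delete_version(dry_run, "prod.ml_team.iris_model", 1)
delete_model(dry_run, "prod.ml_team.iris_model")
print(dry_run.calls)
# With a real client: delete_version(MlflowClient(), "prod.ml_team.iris_model", 1)
```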
List and search models
You can list registered models in Unity Catalog with MLflow’s search_registered_models() Python API:
from mlflow import MlflowClient
client = MlflowClient()
client.search_registered_models()
You can also search for a specific model name and list its version details using the search_model_versions() method:
from pprint import pprint
from mlflow import MlflowClient
client = MlflowClient()
for mv in client.search_model_versions("name='<model-name>'"):
    pprint(mv)
Example
This example illustrates how to use Models in Unity Catalog to build a machine learning application.
Migrate workflows and models to Unity Catalog
The articles linked below describe how to migrate workflows and models (model training and batch inference jobs) from the Workspace Model Registry to Unity Catalog. Databricks recommends using Models in Unity Catalog for improved governance, easy sharing across workspaces and environments, and more flexible MLOps workflows.
Limitations on Unity Catalog support
Stages are not supported for models in Unity Catalog. Databricks recommends using the three-level namespace in Unity Catalog to express the environment a model is in, and using aliases to promote models for deployment. See the upgrade guide for details.
Webhooks are not supported for models in Unity Catalog. See suggested alternatives in the upgrade guide.
Some search API fields and operators are not supported for models in Unity Catalog. This can be mitigated by calling the search APIs using supported filters and scanning the results. Following are some examples:
The order_by parameter is not supported in the search_model_versions or search_registered_models client APIs.
Tag-based filters (tags.mykey = 'myvalue') are not supported for search_model_versions or search_registered_models.
Operators other than exact equality (for example, LIKE, ILIKE, !=) are not supported for search_model_versions or search_registered_models.
Searching registered models by name (for example, MlflowClient().search_registered_models(filter_string="name='main.default.mymodel'")) is not supported. To fetch a particular registered model by name, use get_registered_model.
Email notifications and comment discussion threads on registered models and model versions are not supported in Unity Catalog.
The activity log is not supported for models in Unity Catalog. However, you can track activity on models in Unity Catalog using audit logs.
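The mitigation suggested above for unsupported search fields (call the search APIs with supported filters, then scan the results) can be sketched as client-side filtering and ordering. The function filter_and_sort_versions is hypothetical and works on any objects with version and tags attributes. The stand-in results below replace a real call to search_model_versions, and the sketch assumes tags are populated on the returned objects. If they are not, each version could be fetched with get_model_version first.

```python
from types import SimpleNamespace


def filter_and_sort_versions(versions, tag_key=None, tag_value=None, newest_first=True):
    """Filter model versions by an exact tag match and order them by version number."""
    selected = [
        mv for mv in versions
        if tag_key is None or (mv.tags or {}).get(tag_key) == tag_value
    ]
    # Version numbers may arrive as strings, so compare them as integers.
    return sorted(selected, key=lambda mv: int(mv.version), reverse=newest_first)


# Stand-in results; in a notebook these would come from
# client.search_model_versions("name='prod.ml_team.iris_model'").
results = [
    SimpleNamespace(version="1", tags={"validation_status": "approved"}),
    SimpleNamespace(version="2", tags={"validation_status": "pending"}),
    SimpleNamespace(version="3", tags={"validation_status": "approved"}),
]
approved = filter_and_sort_versions(results, "validation_status", "approved")
print([mv.version for mv in approved])
```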