Manage model lifecycle in Unity Catalog

Important

This article documents Models in Unity Catalog, which Databricks recommends for governing and deploying models. If your workspace is not enabled for Unity Catalog, the functionality on this page is not available. Instead, see Manage model lifecycle using the Workspace Model Registry. For guidance on how to upgrade from the Workspace Model Registry to Unity Catalog, see Migrate workflows and models to Unity Catalog.

This article describes how to use Models in Unity Catalog as part of your machine learning workflow to manage the full lifecycle of ML models. Databricks provides a hosted version of MLflow Model Registry in Unity Catalog. Models in Unity Catalog extends the benefits of Unity Catalog to ML models, including centralized access control, auditing, lineage, and model discovery across workspaces. Models in Unity Catalog is compatible with the open-source MLflow Python client.

Key features of Models in Unity Catalog include:

  • Namespacing and governance for models, so you can group and govern models at the environment, project, or team level (“Grant data scientists read-only access to production models”).

  • Chronological model lineage (which MLflow experiment and run produced the model at a given time).

  • Model versioning.

  • Model deployment via aliases. For example, mark the “Champion” version of a model within your prod catalog.

This article includes instructions for both the Models in Unity Catalog UI and API.

For an overview of Model Registry concepts, see ML lifecycle management using MLflow.

Requirements

  1. Unity Catalog must be enabled in your workspace. See Get started using Unity Catalog to create a Unity Catalog Metastore, enable it in a workspace, and create a catalog. If Unity Catalog is not enabled, you can still use the classic workspace model registry.

  2. You must have access to run commands on a cluster with access to Unity Catalog.

  3. To create new registered models, you need the CREATE_MODEL privilege on a schema, in addition to the USE SCHEMA and USE CATALOG privileges on the schema and its enclosing catalog. CREATE_MODEL is a schema-level privilege that you can grant using the Catalog Explorer UI or the SQL GRANT command, as shown below.

    GRANT CREATE_MODEL ON SCHEMA <schema-name> TO <principal>
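
    For example, the following is a minimal sketch of granting all three privileges from a notebook with spark.sql. The prod catalog, ml_team schema, and data-scientists group are placeholder names for this example.

    # Placeholder catalog, schema, and group names; adjust to your environment.
    spark.sql("GRANT USE CATALOG ON CATALOG prod TO `data-scientists`")
    spark.sql("GRANT USE SCHEMA ON SCHEMA prod.ml_team TO `data-scientists`")
    spark.sql("GRANT CREATE_MODEL ON SCHEMA prod.ml_team TO `data-scientists`")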
    

Upgrade training workloads to Unity Catalog

This section includes instructions to upgrade existing training workloads to Unity Catalog.

Install MLflow Python client

Support for models in Unity Catalog is included in Databricks Runtime 13.2 ML and above.

You can also use models in Unity Catalog on Databricks Runtime 11.3 LTS and above by installing the latest version of the MLflow Python client in your notebook, using the code below.

%pip install --upgrade "mlflow-skinny[databricks]"
dbutils.library.restartPython()

Configure MLflow client to access models in Unity Catalog

By default, the MLflow Python client creates models in the Databricks workspace model registry. To upgrade to models in Unity Catalog, configure the MLflow client:

import mlflow
mlflow.set_registry_uri("databricks-uc")

Train and register Unity Catalog-compatible models

Permissions required: To create a new registered model, you need the CREATE_MODEL and USE SCHEMA privileges on the enclosing schema, and USE CATALOG privilege on the enclosing catalog. To create new model versions under a registered model, you must be the owner of the registered model and have USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.

ML model versions in UC must have a model signature. If you’re not already logging MLflow models with signatures in your model training workloads, you can either:

  • Use Databricks autologging, which automatically logs models with signatures for many popular ML frameworks. See supported frameworks in the MLflow docs.

  • With MLflow 2.5.0 and above, you can specify an input example in your mlflow.<flavor>.log_model call, and the model signature is automatically inferred. For further information, refer to the MLflow documentation.

Then, pass the three-level name of the model to MLflow APIs, in the form <catalog>.<schema>.<model>.

The examples in this section create and access models in the ml_team schema under the prod catalog.

The model training examples in this section create a new model version and register it in the prod catalog. Using the prod catalog doesn’t necessarily mean that the model version serves production traffic. The model version’s enclosing catalog, schema, and registered model reflect its environment (prod) and associated governance rules (for example, privileges can be set up so that only admins can delete from the prod catalog), but not its deployment status. To manage the deployment status, use model aliases.

Register a model to Unity Catalog using autologging

import mlflow
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

# Train a sklearn model on the iris dataset.
# Databricks autologging creates an MLflow run and logs the trained model automatically.
X, y = datasets.load_iris(return_X_y=True, as_frame=True)
clf = RandomForestClassifier(max_depth=7)
clf.fit(X, y)

# Note that the UC model name follows the pattern
# <catalog_name>.<schema_name>.<model_name>, corresponding to
# the catalog, schema, and registered model name
# in Unity Catalog under which to create the version.
# The registered model is created if it doesn't already exist.
autolog_run = mlflow.last_active_run()
model_uri = "runs:/{}/model".format(autolog_run.info.run_id)
mlflow.register_model(model_uri, "prod.ml_team.iris_model")

Register a model to Unity Catalog with automatically inferred signature

Support for automatically inferred signatures is available in MLflow version 2.5.0 and above, and is supported in Databricks Runtime 11.3 LTS ML and above. To use automatically inferred signatures, use the following code to install the latest MLflow Python client in your notebook:

%pip install --upgrade "mlflow-skinny[databricks]"
dbutils.library.restartPython()

The following code shows an example of an automatically inferred signature.

import mlflow
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

with mlflow.start_run():
    # Train a sklearn model on the iris dataset
    X, y = datasets.load_iris(return_X_y=True, as_frame=True)
    clf = RandomForestClassifier(max_depth=7)
    clf.fit(X, y)
    # Take the first row of the training dataset as the model input example.
    input_example = X.iloc[[0]]
    # Log the model and register it as a new version in UC.
    mlflow.sklearn.log_model(
        sk_model=clf,
        artifact_path="model",
        # The signature is automatically inferred from the input example and its predicted output.
        input_example=input_example,
        registered_model_name="prod.ml_team.iris_model",
    )

Track the data lineage of a model in Unity Catalog

Note

Support for table-to-model lineage in Unity Catalog is available in MLflow 2.11.0 and above.

When you train a model on a table in Unity Catalog, you can track the lineage of the model to the upstream dataset(s) it was trained and evaluated on. To do this, use mlflow.log_input. This saves the input table information with the MLflow run that generated the model. Data lineage is also automatically captured for models logged using feature store APIs. See View feature store lineage.

When you register the model to Unity Catalog, lineage information is automatically saved and is visible in the Lineage tab of the model version UI in Catalog Explorer.

The following code shows an example.

import mlflow
import pandas as pd
import pyspark.pandas as ps
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestRegressor

# Write a table to Unity Catalog
iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df.rename(
  columns = {
    'sepal length (cm)':'sepal_length',
    'sepal width (cm)':'sepal_width',
    'petal length (cm)':'petal_length',
    'petal width (cm)':'petal_width'},
  inplace = True
)
iris_df['species'] = iris.target
ps.from_pandas(iris_df).to_table("prod.ml_team.iris", mode="overwrite")

# Load a Unity Catalog table, train a model, and log the input table
dataset = mlflow.data.load_delta(table_name="prod.ml_team.iris", version="0")
pd_df = dataset.df.toPandas()
X = pd_df.drop("species", axis=1)
y = pd_df["species"]
with mlflow.start_run():
    clf = RandomForestRegressor(n_estimators=100)
    clf.fit(X, y)
    mlflow.log_input(dataset, "training")

View models in the UI

Permissions required: To view a registered model and its model versions in the UI, you need EXECUTE privilege on the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.

You can view and manage registered models and model versions in Unity Catalog using the Catalog Explorer.

Control access to models

For information about controlling access to models registered in Unity Catalog, see Unity Catalog privileges and securable objects. For best practices on organizing models across catalogs and schemas, see Organize your data.

You can configure model permissions programmatically using the Grants REST API. When configuring model permissions, set securable_type to "FUNCTION" in REST API requests. For example, use PATCH /api/2.1/unity-catalog/permissions/function/{full_name} to update registered model permissions.
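
For example, the following is a minimal sketch that grants the EXECUTE privilege on a registered model through this endpoint using the Python requests library. The workspace URL, access token, model name, and data-scientists principal are placeholders to replace with your own values.

import requests

# Placeholder values; replace with your workspace URL, a valid access token,
# the three-level model name, and the principal to grant.
workspace_url = "https://<databricks-instance>"
token = "<access-token>"
full_name = "prod.ml_team.iris_model"

response = requests.patch(
    f"{workspace_url}/api/2.1/unity-catalog/permissions/function/{full_name}",
    headers={"Authorization": f"Bearer {token}"},
    json={"changes": [{"principal": "data-scientists", "add": ["EXECUTE"]}]},
)
response.raise_for_status()
print(response.json())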

Deploy and organize models with aliases and tags

Model aliases and tags help you organize and manage models in Unity Catalog.

Model aliases allow you to assign a mutable, named reference to a particular version of a registered model. You can use aliases to indicate the deployment status of a model version. For example, you could allocate a “Champion” alias to the model version currently in production and target this alias in workloads that use the production model. You can then update the production model by reassigning the “Champion” alias to a different model version.

Tags are key-value pairs that you associate with registered models and model versions, allowing you to label and categorize them by function or status. For example, you could apply a tag with key "task" and value "question-answering" (displayed in the UI as task:question-answering) to registered models intended for question answering tasks. At the model version level, you could tag versions undergoing pre-deployment validation with validation_status:pending and those cleared for deployment with validation_status:approved.

See the following sections for how to use aliases and tags.

Set and delete aliases on models

Permissions required: Owner of the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.

You can set, update, and remove aliases for models in Unity Catalog by using Catalog Explorer. You can manage aliases across a registered model in the model details page and configure aliases for a specific model version in the model version details page.

To set, update, and delete aliases using the MLflow Client API, see the examples below:

from mlflow import MlflowClient
client = MlflowClient()

# create "Champion" alias for version 1 of model "prod.ml_team.iris_model"
client.set_registered_model_alias("prod.ml_team.iris_model", "Champion", 1)

# reassign the "Champion" alias to version 2
client.set_registered_model_alias("prod.ml_team.iris_model", "Champion", 2)

# get a model version by alias
client.get_model_version_by_alias("prod.ml_team.iris_model", "Champion")

# delete the alias
client.delete_registered_model_alias("prod.ml_team.iris_model", "Champion")

Set and delete tags on models

Permissions required: Owner of the registered model or APPLY_TAG privilege on the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.

See Manage tags in Catalog Explorer for how to set and delete tags using the UI.

To set and delete tags using the MLflow Client API, see the examples below:

from mlflow import MlflowClient
client = MlflowClient()

# Set registered model tag
client.set_registered_model_tag("prod.ml_team.iris_model", "task", "classification")

# Delete registered model tag
client.delete_registered_model_tag("prod.ml_team.iris_model", "task")

# Set model version tag
client.set_model_version_tag("prod.ml_team.iris_model", "1", "validation_status", "approved")

# Delete model version tag
client.delete_model_version_tag("prod.ml_team.iris_model", "1", "validation_status")

Both registered model and model version tags must meet the platform-wide constraints.

For more details on alias and tag client APIs, see the MLflow API documentation.

Load models for inference

Consume model versions by alias in inference workloads

Permissions required: EXECUTE privilege on the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.

You can write batch inference workloads that reference a model version by alias. For example, the snippet below loads and applies the “Champion” model version for batch inference. If the “Champion” version is updated to reference a new model version, the batch inference workload automatically picks it up on its next execution. This allows you to decouple model deployments from your batch inference workloads.

import mlflow.pyfunc
model_version_uri = "models:/prod.ml_team.iris_model@Champion"
champion_version = mlflow.pyfunc.load_model(model_version_uri)
champion_version.predict(test_x)
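
If you score Spark DataFrames in batch, you can also wrap the same alias-based URI in a Spark UDF with mlflow.pyfunc.spark_udf. The following is a minimal sketch; test_df is a hypothetical Spark DataFrame whose columns match the model's input schema.

from pyspark.sql.functions import struct
import mlflow.pyfunc

# Wrap the "Champion" version in a Spark UDF for distributed batch scoring.
champion_udf = mlflow.pyfunc.spark_udf(spark, "models:/prod.ml_team.iris_model@Champion")

# test_df is a hypothetical Spark DataFrame containing the model's feature columns.
predictions_df = test_df.withColumn("prediction", champion_udf(struct(*test_df.columns)))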

Consume model versions by version number in inference workloads

You can also load model versions by version number:

import mlflow.pyfunc
# Load version 1 of the model "prod.ml_team.iris_model"
model_version_uri = "models:/prod.ml_team.iris_model/1"
first_version = mlflow.pyfunc.load_model(model_version_uri)
first_version.predict(test_x)

Share models across workspaces

As long as you have the appropriate privileges, you can access models in Unity Catalog from any workspace. For example, you can access models from the prod catalog in a dev workspace, to facilitate comparing newly developed models to the production baseline.

To collaborate with other users (share write privileges) on a registered model you created, you must grant ownership of the model to a group containing yourself and the users you’d like to collaborate with. Collaborators must also have the USE CATALOG and USE SCHEMA privileges on the catalog and schema containing the model. See Unity Catalog privileges and securable objects for details.

Annotate a model or model version

Permissions required: Owner of the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.

You can provide information about a model or model version by annotating it. For example, you may want to include an overview of the problem or information about the methodology and algorithm used.

Annotate a model or model version using the UI

See Document data in Catalog Explorer using markdown comments.

Annotate a model or model version using the API

To update a registered model description, use the MLflow Client API update_registered_model() method:

from mlflow import MlflowClient

client = MlflowClient()
client.update_registered_model(
  name="<model-name>",
  description="<description>"
)

To update a model version description, use the MLflow Client API update_model_version() method:

from mlflow import MlflowClient

client = MlflowClient()
client.update_model_version(
  name="<model-name>",
  version=<model-version>,
  description="<description>"
)

Rename a model (API only)

Permissions required: Owner of the registered model, CREATE_MODEL privilege on the schema containing the registered model, and USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.

To rename a registered model, use the MLflow Client API rename_registered_model() method:

from mlflow import MlflowClient

client = MlflowClient()
client.rename_registered_model("<model-name>", "<new-model-name>")

Delete a model or model version

Permissions required: Owner of the registered model, plus USE SCHEMA and USE CATALOG privileges on the schema and catalog containing the model.

You can delete a registered model or a model version within a registered model using the Catalog Explorer UI or the API.

Delete a model version or model using the API

Warning

You cannot undo this action. When you delete a model, all model artifacts stored by Unity Catalog and all the metadata associated with the registered model are deleted.

Delete a model version

To delete a model version, use the MLflow Client API delete_model_version() method:

from mlflow import MlflowClient

# Delete versions 1, 2, and 3 of the model
client = MlflowClient()
versions = [1, 2, 3]
for version in versions:
  client.delete_model_version(name="<model-name>", version=version)

Delete a model

To delete a model, use the MLflow Client API delete_registered_model() method:

from mlflow import MlflowClient

client = MlflowClient()
client.delete_registered_model(name="<model-name>")

List and search models

You can list registered models in Unity Catalog with MLflow’s search_registered_models() Python API:

from mlflow import MlflowClient

client = MlflowClient()
client.search_registered_models()

You can also search for a specific model name and list its version details using the search_model_versions() method:

from pprint import pprint

from mlflow import MlflowClient

client = MlflowClient()
[pprint(mv) for mv in client.search_model_versions("name='<model-name>'")]

Example

This example illustrates how to use Models in Unity Catalog to build a machine learning application.

Models in Unity Catalog example

Migrate workflows and models to Unity Catalog

Databricks recommends using Models in Unity Catalog for improved governance, easy sharing across workspaces and environments, and more flexible MLOps workflows. The following comparison summarizes the capabilities of the Workspace Model Registry (legacy) and Models in Unity Catalog (recommended).

  • Reference model versions by named aliases

    • Workspace Model Registry (legacy): Model Registry Stages. Move model versions into one of four fixed stages to reference them by that stage. You cannot rename or add stages.

    • Models in Unity Catalog (recommended): Model Registry Aliases. Create up to 10 custom, reassignable named references to model versions for each registered model.

  • Create access-controlled environments for models

    • Workspace Model Registry (legacy): Model Registry Stages. Use stages within one registered model to denote the environment of its model versions, with access controls for only two of the four fixed stages (Staging and Production).

    • Models in Unity Catalog (recommended): Registered models. Create a registered model for each environment in your MLOps workflow, using the three-level namespace and permissions of Unity Catalog to express governance.

  • Promote models across environments (deploy a model)

    • Workspace Model Registry (legacy): Use the transition_model_version_stage() MLflow Client API to move a model version to a different stage, potentially breaking workflows that reference the previous stage.

    • Models in Unity Catalog (recommended): Use the copy_model_version() MLflow Client API to copy a model version from one registered model to another.

  • Access and share models across workspaces

    • Workspace Model Registry (legacy): Manually export and import models across workspaces, or configure connections to remote model registries using personal access tokens and workspace secret scopes.

    • Models in Unity Catalog (recommended): Out-of-the-box access to models across workspaces in the same account. No configuration required.

  • Configure permissions

    • Workspace Model Registry (legacy): Set permissions at the workspace level.

    • Models in Unity Catalog (recommended): Set permissions at the account level, which applies consistent governance across workspaces.

  • Access models in the Databricks Marketplace

    • Workspace Model Registry (legacy): Unavailable.

    • Models in Unity Catalog (recommended): Load models from the Databricks Marketplace into your Unity Catalog metastore and access them across workspaces.

The articles linked below describe how to migrate workflows (model training and batch inference jobs) and models from the Workspace Model Registry to Unity Catalog.

Limitations on Unity Catalog support

  • Stages are not supported for models in Unity Catalog. Databricks recommends using the three-level namespace in Unity Catalog to express the environment a model is in, and using aliases to promote models for deployment. See the upgrade guide for details.

  • Webhooks are not supported for models in Unity Catalog. See suggested alternatives in the upgrade guide.

  • Some search API fields and operators are not supported for models in Unity Catalog. You can mitigate this by calling the search APIs with supported filters and scanning the results client-side (see the sketch after this list). The following are examples of unsupported fields and operators:

    • The order_by parameter is not supported in the search_model_versions or search_registered_models client APIs.

    • Tag-based filters (tags.mykey = 'myvalue') are not supported for search_model_versions or search_registered_models.

    • Operators other than exact equality (for example, LIKE, ILIKE, !=) are not supported for search_model_versions or search_registered_models.

    • Searching registered models by name (for example, MlflowClient().search_registered_models(filter_string="name='main.default.mymodel'")) is not supported. To fetch a particular registered model by name, use get_registered_model.

  • Email notifications and comment discussion threads on registered models and model versions are not supported in Unity Catalog.

  • The activity log is not supported for models in Unity Catalog. However, you can track activity on models in Unity Catalog using audit logs.

  • search_registered_models might return stale results for models shared through Delta Sharing. To ensure the most recent results, use the Databricks CLI or SDK to list the models in a schema.
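
As one example of working around these search limitations, the following sketch fetches model versions with the supported name filter, sorts them client-side in place of the unsupported order_by parameter, and uses get_registered_model to fetch a registered model by name. The model name is the prod.ml_team.iris_model example used earlier in this article.

from mlflow import MlflowClient

client = MlflowClient()

# The name filter is supported; fetch every version of the model.
versions = client.search_model_versions("name='prod.ml_team.iris_model'")

# order_by is not supported, so sort the results client-side instead.
latest_first = sorted(versions, key=lambda mv: int(mv.version), reverse=True)

# To fetch a particular registered model by name, use get_registered_model.
model = client.get_registered_model("prod.ml_team.iris_model")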