MLflow Tracing for agents
Important
This feature is in Public Preview.
This article describes MLflow Tracing and the scenarios where it is helpful for evaluating generative AI applications in your AI system.
In software development, tracing involves recording sequences of events, such as user sessions or request flows. In the context of AI systems, tracing refers to recording the interactions between your application and an AI system. For example, a trace of a RAG application might record the inputs and parameters at each step: the user message and prompt, the vector store lookup, and the call to the generative AI model.
What is MLflow Tracing?
With MLflow Tracing, you can log, analyze, and compare traces across different versions of your generative AI application. It lets you debug your generative AI Python code and keep track of inputs and responses, which can help you discover the conditions or parameters that contribute to poor application performance. MLflow Tracing is tightly integrated with Databricks tools and infrastructure, so you can store and view all your traces in Databricks notebooks or the MLflow experiment UI as you run your code.
When you develop AI systems on Databricks using libraries such as LangChain, LlamaIndex, OpenAI, or custom PyFunc, MLflow Tracing allows you to see all the events and intermediate outputs from each step of your agent. You can easily see the prompts, which models and retrievers were used, which documents were retrieved to augment the response, how long things took, and the final output. For example, if your model hallucinates, you can quickly inspect each step that led to the hallucination.
Why use MLflow Tracing?
MLflow Tracing provides several benefits to help you track your development workflow. For example, you can:
Visualize and investigate traces interactively to diagnose issues during development.
Verify that prompt templates and guardrails are producing reasonable results.
Explore and minimize the latency impact of different frameworks, models, chunk sizes, and software development practices.
Measure application costs by tracking token usage by different models.
Establish benchmark (“golden”) datasets to evaluate the performance of different versions.
Install MLflow Tracing
MLflow Tracing is available in MLflow versions 2.13.0 and above.
%pip install mlflow>=2.13.0 -qqqU
%restart_python
Alternatively, you can run %pip install databricks-agents to install the latest version of databricks-agents, which includes a compatible MLflow version.
Use MLflow Tracing in development
MLflow Tracing helps you analyze performance issues and accelerate the agent development cycle. The following sections assume you are conducting agent development and MLflow Tracing from a notebook.
Note
In notebook environments, MLflow Tracing might add up to a few seconds of overhead to the agent run time.
Note
As of Databricks Runtime 15.4 LTS ML, MLflow Tracing is enabled by default within notebooks. To disable tracing, for example with LangChain, run mlflow.langchain.autolog(log_traces=False) in your notebook.
Add traces to your agent
MLflow Tracing provides three different ways to add traces to your generative AI application. See Add traces to your agents for examples of using these methods. For API reference details, see the MLflow documentation.
| API | Recommended use case | Description |
|---|---|---|
| MLflow autologging | Development on integrated GenAI libraries | Autologging automatically instruments traces for popular open source frameworks such as LangChain, LlamaIndex, and OpenAI. When you enable autologging for a library, for example with mlflow.langchain.autolog(), traces are logged automatically when that library is invoked. |
| Fluent APIs | Custom agent with PyFunc | Low-code APIs for instrumenting AI systems without worrying about the tree structure of the trace. MLflow determines the appropriate parent-child tree structure (spans) based on the Python stack. |
| MLflow Client APIs | Advanced use cases such as multi-threading | Recommended for use cases that require more control, such as multi-threaded applications or callback-based instrumentation. |
Review traces
After you run the instrumented agent, you can review the generated traces in different ways:
The trace visualization is rendered inline in the cell output.
The traces are logged to your MLflow experiment. You can review the full list of historical traces and search them on the Traces tab of the experiment page. When the agent runs under an active MLflow run, you can also find the traces on the run page.
Programmatically retrieve traces using the search_traces() API.
Limitations
MLflow Tracing is available only in Databricks notebooks and notebook jobs.
LangChain autologging might not support all LangChain prediction APIs. See the MLflow documentation for the full list of supported APIs.