Baidu AI Cloud


          Cloud Monitor

          Overview

          LLM (Large Language Model) application performance monitoring tracks essential metrics such as inference latency, throughput, and token usage in real time. It supports LLM-specific span collection and provides visualized, end-to-end call chain details for precise optimization and efficient operations.

          Activation and billing

          Application Performance Monitor (LLM) is a paid product. You need to activate it in the LLM module before use. The product is currently in public beta; before formal billing begins, we will send notifications via email, SMS, and in-site messages.

          Product advantages

          • Easy to use: After activation, applications can be onboarded quickly by following the access process, gaining out-of-the-box LLM application observability capabilities.

          • Embrace open source: Supports industry-standard OpenTelemetry as well as a variety of LLM frameworks and components.
          • Metric visualization: Provides model call analysis and token analysis, aggregating key metrics of LLM application calls at a global level so that the business outcomes of the technology are "visible".
          • Process transparency: Customizes domain-specific trace semantics for LLM applications, revealing internal operations from user-level input/output down to the parameter details of each span, making every call "white-box" observable.
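          As an illustration of the span data that such trace collection produces, the stdlib-only sketch below builds one span record for an LLM call. The attribute names follow the OpenTelemetry GenAI semantic conventions, but the model name, prompt, and token counts are placeholders, not actual product fields.

```python
import json
import uuid

def llm_span(model, prompt, completion, input_tokens, output_tokens,
             start_ms, end_ms):
    """Build one trace-span record for a single LLM call."""
    return {
        "trace_id": uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "name": "chat_completion",
        "start_time_ms": start_ms,
        "duration_ms": end_ms - start_ms,
        "attributes": {
            # Attribute names follow the OpenTelemetry GenAI semantic
            # conventions; all values here are illustrative only.
            "gen_ai.request.model": model,
            "gen_ai.prompt": prompt,
            "gen_ai.completion": completion,
            "gen_ai.usage.input_tokens": input_tokens,
            "gen_ai.usage.output_tokens": output_tokens,
        },
    }

span = llm_span("example-model", "Hello", "Hi there!", 3, 4, 0, 850)
print(json.dumps(span, indent=2))
```

          A record shaped like this carries both the user-level input/output and the per-call parameter details mentioned above, which is what makes each call "white-box" observable.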

          Core Capabilities

          • LLM application performance monitoring: Tracks user requests across applications, aggregates monitoring data in real time, and offers specialized capabilities including LLM call analysis, token analysis, and LLM operation analysis.
          • Real-time anomaly alarms: Set alarm trigger rules for key LLM performance metrics and receive notifications through multiple channels (phone, SMS, email, WeChat, DingTalk, Feishu, etc.). When a metric becomes anomalous, users are alerted immediately, enabling timely fault resolution and preventing losses caused by delayed detection. (Coming soon. Stay tuned.)
          • Dynamic distributed topology discovery: Automatically detects the application's logical topology, visually displays upstream/downstream dependencies, and supports real-time data drill-down for comprehensive analysis of application performance metrics.
          • Call chain analysis: Analyzes trace information for LLM calls. On the Call Chain Analysis page, you can view the time consumed by each span in a large-model trace, as well as each span's associated information, such as input, output, and token consumption.
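          To make the per-span time breakdown concrete, here is a small stdlib-only sketch of how a call chain view can derive total versus self time for each span in one trace. The span names, hierarchy, and timings are invented for illustration.

```python
# Spans of one hypothetical trace: a chat request that fans out to a
# retrieval step and an LLM inference step (timings in ms, made up).
spans = [
    {"id": "a", "parent": None, "name": "chat_request",  "start": 0,   "end": 900},
    {"id": "b", "parent": "a",  "name": "retrieval",     "start": 10,  "end": 210},
    {"id": "c", "parent": "a",  "name": "llm_inference", "start": 220, "end": 880},
]

def total_time(span):
    """Wall-clock duration of the span itself."""
    return span["end"] - span["start"]

def self_time(span, spans):
    """Span duration minus the time spent in its direct children."""
    children = sum(total_time(s) for s in spans if s["parent"] == span["id"])
    return total_time(span) - children

for s in spans:
    print(f'{s["name"]:<14} total={total_time(s)}ms self={self_time(s, spans)}ms')
```

          In this made-up trace, the root `chat_request` span takes 900 ms in total but only 40 ms of self time; the rest is attributable to its `retrieval` and `llm_inference` children, which is exactly the kind of breakdown that pinpoints where a slow call spends its time.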