          Application overview page

          The application overview page displays core data for a single application, including model calls, token usage, session count, and user count.

          • Filter criteria: Filter by model. By default, data for the whole application is displayed; selecting a single model shows data for that model only.
          • Overview data:
          Panel | Description
          LLM call count | Displays the number of LLM calls made by the application during the specified time period
          LLM call error count | Displays the number of failed LLM calls made by the application during the specified time period
          Average LLM call latency | Displays the average latency of LLM calls made by the application during the specified time period
          Token usage | Displays the number of tokens consumed by the application during the specified time period
          Session count | Displays the number of sessions that accessed the LLM application during the specified time period
          User count | Displays the number of users who used the LLM application during the specified time period
          Trace count | Displays the number of traces (call chains) generated by the application during the specified time period
          Span count | Displays the number of spans generated by the application during the specified time period
          • LLM-related data aggregation:
          Panel | Description
          Trend Chart of LLM Call Count | Displays the trend of the LLM call count
          LLM model call operation type distribution chart | Displays the count and proportion of each operation type. There are seven types: Embedding, Agent, LLM, Task, Tool, Workflow, and Rerank
          Top 5 LLM model call counts | Displays the Top 5 models by LLM call count; can be switched to a trend chart
          • Other data aggregation:
          Panel | Description
          Session Count Trend Chart | Displays the trend of the number of sessions accessing the LLM application
          User Count Trend Chart | Displays the trend of the number of users using the LLM application
          Trace Count Trend Chart | Displays the trend of the number of traces generated by the application
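
The operation type distribution panel reports a count and a proportion for each operation type. As a rough illustration of that arithmetic only, here is a minimal Python sketch. It assumes each span carries one operation type label; the function name `operation_distribution` and the sample data are hypothetical and are not part of the product.

```python
from collections import Counter

# The seven operation types named in the panel description.
OPERATION_TYPES = ["Embedding", "Agent", "LLM", "Task", "Tool", "Workflow", "Rerank"]

def operation_distribution(span_types):
    """Return {operation_type: (count, proportion)} for a list of span type labels."""
    # Count only labels that belong to the seven known operation types.
    counts = Counter(t for t in span_types if t in OPERATION_TYPES)
    total = sum(counts.values())
    return {
        op: (counts[op], counts[op] / total if total else 0.0)
        for op in OPERATION_TYPES
    }

# Hypothetical sample: the operation types of six spans.
sample = ["LLM", "LLM", "Tool", "Embedding", "LLM", "Rerank"]
distribution = operation_distribution(sample)
```

Types that never occur still appear in the result with a count of zero, which keeps the chart categories stable between time windows.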

          Model call analysis

          Model call analysis shows detailed data on LLM calls for a single application.

          • Overview data:
          Panel | Description
          LLM call count | Displays the number of LLM calls made by the application during the specified time period
          LLM call QPS | Displays the QPS (queries per second) of LLM calls made by the application during the specified time period
          LLM call error count | Displays the number of failed LLM calls made by the application during the specified time period
          LLM call error rate | Displays the error rate of LLM calls made by the application during the specified time period
          Average LLM call latency | Displays the average latency of LLM calls made by the application during the specified time period
          Average LLM call time-to-first-token latency | Displays the average time-to-first-token latency of LLM calls during the specified time period. Time-to-first-token latency is the time from when the user enters the query and presses the Send button to when the first token of the last LLM request starts to be output
          • Large model call trend:
          Panel | Description
          Trend Chart of LLM Call Count | Displays the LLM call count trend by default. Can be switched to LLM call QPS or Avg LLM call per request (the average number of LLM calls per user request)
          Trend Chart of LLM Call Error Count | Displays the LLM call error count trend by default. Can be switched to the LLM call error rate trend
          Trend Chart of LLM Call Latency | Displays the LLM call latency trend; supports Avg, p90, p95, and p99 latency
          Trend Chart of LLM Call First-token Latency | Displays the LLM call time-to-first-token latency trend; supports Avg, p90, p95, and p99 latency
          • Top 5 large model calls:
          Panel | Description
          Top 5 models with LLM model call counts | Ranks the Top 5 models by the number of LLM calls made by the application; can be switched to a trend chart. Can also be switched to the Top 5 models by LLM call QPS (with trend chart support) or by Avg call per LLM request
          Top 5 models with most LLM model call errors | Ranks the Top 5 models by the number of LLM call errors; can be switched to a trend chart. Can also be switched to the Top 5 models by LLM call error rate, with trend chart support
          Top 5 models with highest average LLM call latency | Ranks the Top 5 models by average LLM call latency; can be switched to other latency metrics such as p90, p95, and p99, and to a trend chart
          Top 5 models with highest LLM call time-to-first-token average latency | Ranks the Top 5 models by average LLM call time-to-first-token latency; can be switched to other latency metrics and to a trend chart
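
The panels in this section are built on a handful of aggregates: call count, QPS, error count, error rate, average latency, and p90/p95/p99 latency. The following Python sketch shows one way these could be derived from raw call records. It is an illustration under stated assumptions, not the platform's implementation: the record shape `(latency_ms, is_error)`, the function names, and the nearest-rank percentile method are all assumptions, and the product may compute percentiles differently.

```python
def percentile(values, p):
    """Nearest-rank percentile for an integer p in 1..100 (assumed method)."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    # Integer ceiling of p * n / 100, which avoids floating-point rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]

def summarize_calls(calls, window_seconds):
    """calls: list of (latency_ms, is_error) tuples observed in the time window."""
    count = len(calls)
    errors = sum(1 for _, is_error in calls if is_error)
    latencies = [latency for latency, _ in calls]
    return {
        "call_count": count,
        "qps": count / window_seconds,
        "error_count": errors,
        "error_rate": errors / count if count else 0.0,
        "avg_latency_ms": sum(latencies) / count if count else 0.0,
        "p90_latency_ms": percentile(latencies, 90) if latencies else None,
        "p95_latency_ms": percentile(latencies, 95) if latencies else None,
        "p99_latency_ms": percentile(latencies, 99) if latencies else None,
    }

# Hypothetical sample: ten calls in a 60-second window, two of which failed.
sample_calls = [(100 * i, i in (3, 7)) for i in range(1, 11)]
summary = summarize_calls(sample_calls, window_seconds=60)
```

Comparing the average with the p95 or p99 value is useful because a small number of very slow calls can leave the average almost unchanged while still affecting a noticeable share of users.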

          LLM operations

          LLM operations shows detailed data on LLM operations for a single application. There are seven operation types: Embedding, Agent, LLM, Task, Tool, Workflow, and Rerank.

          • Filter criteria: Select an operation type (the first type is selected by default)
          • Overview data: Displays data for the selected operation type ("xx" in the panel names stands for the selected operation type)
          Panel | Description
          xx operation call count | Displays the number of xx operation calls made by the application during the specified time period
          xx operation call error count | Displays the number of failed xx operation calls made by the application during the specified time period
          xx operation call error rate | Displays the error rate of xx operation calls made by the application during the specified time period
          Average latency of xx operation calls | Displays the average latency of xx operation calls made by the application during the specified time period
          • Operation type call trend: Displays call-related data for the selected operation type
          Panel | Description
          Trend Chart of xx Operation Call Count | Displays the trend of the xx operation call count
          Trend Chart of xx Operation Call Error Count | Displays the trend of the xx operation call error count; can be switched to the xx operation call error rate trend
          Trend Chart of xx Operation Call Latency | Displays the trend of xx operation call latency; supports Avg, p90, p95, and p99 latency
          • Top 5 for the selected operation type: Displays the Top 5 models (or operation names) for the selected operation type; can be switched to trend charts
          Panel | Description
          Top 5 models with most xx operation call counts | Ranks the Top 5 models or operation names by xx operation call count. Depending on the operation type, the names shown are those of the Top 5 Embedding models, LLM models, Agents, Tools, Tasks, Rerank models, or Workflows
          Top 5 models with most xx operation call errors | Ranks the Top 5 models by xx operation call error count; can be switched to the Top 5 models by xx operation call error rate via the dropdown
          Top 5 models with highest xx operation call latency | Ranks the Top 5 models by average xx operation call latency; can be switched to p90, p95, or p99 latency
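
The Top 5 panels can rank by error count or by error rate, and the two orderings often differ: a heavily used model may accumulate the most errors while a rarely used one has the highest error rate. The Python sketch below illustrates that difference. The record shape `(name, is_error)`, the function `top_models`, and the sample model names are hypothetical and only serve the example.

```python
from collections import defaultdict

def top_models(calls, metric="error_count", n=5):
    """Rank names by a metric. calls: iterable of (name, is_error) tuples."""
    stats = defaultdict(lambda: {"call_count": 0, "error_count": 0})
    for name, is_error in calls:
        stats[name]["call_count"] += 1
        stats[name]["error_count"] += int(is_error)
    for s in stats.values():
        # Every entry has at least one call, so the division is safe.
        s["error_rate"] = s["error_count"] / s["call_count"]
    ranked = sorted(stats.items(), key=lambda item: item[1][metric], reverse=True)
    return [(name, s[metric]) for name, s in ranked[:n]]

# Hypothetical sample: model-a is busy with few failures, model-b is small but flaky.
sample = (
    [("model-a", True)] * 3 + [("model-a", False)] * 17
    + [("model-b", True)] * 2 + [("model-b", False)] * 2
    + [("model-c", False)] * 2
)
by_count = top_models(sample, metric="error_count")
by_rate = top_models(sample, metric="error_rate")
```

In this sample, `model-a` leads when ranked by error count while `model-b` leads when ranked by error rate, which is why switching the dropdown can change the order of the list.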

          Token analysis

          Token analysis displays token-related data for a single application. You can filter by model to view the token data of a specific model.

          • Filter criteria: Model name (default: all), supporting search and single selection.
          • Overview data:
          Panel | Description
          Token usage | Displays the token usage of the application during the specified time period, including input and output tokens
          Avg Tokens per request | Displays the average token usage per user request during the specified time period, including input and output tokens
          Avg Tokens per LLM call | Displays the average token usage per LLM call during the specified time period, with options to view input and output tokens
          Time-to-first-token average latency per request | Displays the average time-to-first-token latency per user request during the specified time period. Time-to-first-token latency is the time from when the user enters the query and presses the Send button to when the first token of the last LLM request starts to be output
          • Trend Chart:
          Panel | Description
          Token Usage Trend | Displays the token usage trend of the application during the specified time period, with options to view input and output tokens
          Trend Chart of Average Tokens Per Request | Displays the trend of average token usage per user request during the specified time period
          Trend of Average Tokens per LLM Call | Displays the trend of average token usage per LLM call during the specified time period
          Token output speed per request trend | Displays the trend of token output speed per user request during the specified time period, where speed = output tokens per request / request latency
          Time-to-first-token latency per request | Displays the time-to-first-token latency per user request during the specified time period, including Avg, p90, p95, and p99 latency
          • Top 5 by token usage:
          Panel | Description
          Top 5 sessions with most token usage | Ranks the Top 5 sessions by token usage; can be switched to a trend chart
          Top 5 users with most token usage | Ranks the Top 5 users by token usage; can be switched to a trend chart
          Top 5 models with most token usage | Ranks the Top 5 models by token usage; can be switched to a trend chart
          Top 5 models with most average tokens per LLM call | Ranks the Top 5 models by Avg Tokens per LLM call; can be switched to a trend chart
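
The token panels rest on three simple ratios: tokens per user request, tokens per LLM call, and token output speed (output tokens per request divided by request latency, as defined above). The Python sketch below works through them on two sample requests. The `RequestRecord` fields and the function names are assumptions made for this illustration, not the product's data model.

```python
from dataclasses import dataclass

@dataclass
class RequestRecord:
    """One user request (hypothetical shape for this example)."""
    input_tokens: int
    output_tokens: int
    llm_calls: int          # number of LLM calls made to serve the request
    latency_seconds: float  # end-to-end request latency

def token_summary(requests):
    """Aggregate token usage over a list of RequestRecord objects."""
    total_tokens = sum(r.input_tokens + r.output_tokens for r in requests)
    total_calls = sum(r.llm_calls for r in requests)
    count = len(requests)
    return {
        "token_usage": total_tokens,
        "avg_tokens_per_request": total_tokens / count if count else 0.0,
        "avg_tokens_per_llm_call": total_tokens / total_calls if total_calls else 0.0,
    }

def output_speed(request):
    """Token output speed = output tokens per request / request latency."""
    return request.output_tokens / request.latency_seconds

# Hypothetical sample: two user requests.
requests = [
    RequestRecord(input_tokens=100, output_tokens=50, llm_calls=1, latency_seconds=2.0),
    RequestRecord(input_tokens=300, output_tokens=150, llm_calls=3, latency_seconds=5.0),
]
summary = token_summary(requests)
speeds = [output_speed(r) for r in requests]
```

Because a single user request can trigger several LLM calls, the per-request average (300 tokens here) is higher than the per-call average (150 tokens), so the two panels should not be expected to match.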

          Other tab modules

          The API monitor, log analysis, exception analysis, and call chain analysis modules work the same way as in APM (Application Performance Monitor).
