Application Performance Monitor

Last Updated: 2025-11-14

Application overview page

The application overview page displays core data for a single application, including model calls, token usage, session count, and user count.

Filter criteria: Filtering by model is supported. By default, data for the whole application is displayed; selecting a single model shows data for that model only.
Overview data:

Panel	Description
LLM call count	Displays the number of LLM calls made by the application during the specified time period
LLM call error count	Displays the number of failed LLM calls made by the application during the specified time period
Average LLM call latency	Displays the average latency of LLM calls made by the application during the specified time period
Token usage	Displays the number of tokens consumed by the application during the specified time period
Session count	Displays the number of sessions that accessed the LLM application during the specified time period
User count	Displays the number of users who used the LLM application during the specified time period
Trace count	Displays the number of traces (call chains) generated by the application during the specified time period
Span count	Displays the number of spans generated by the application during the specified time period

LLM-related aggregated data:

Panel	Description
Trend Chart of LLM Call Count	Displays the trend of the LLM call count over time
LLM model call operation type distribution chart	Displays the count and proportion of calls for each operation type. There are seven types: Embedding, Agent, LLM, Task, Tool, Workflow, and Rerank
Top 5 LLM model call counts	Displays the top 5 models by LLM call count; the panel can be switched to a trend chart
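
The count and proportion shown in the operation type distribution chart can be illustrated with a minimal sketch. It assumes each call is tagged with one of the seven operation types; the function name `operation_distribution` and the input list are hypothetical and are not part of the product's API.

```python
from collections import Counter

# The seven operation types listed in this document.
OPERATION_TYPES = ["Embedding", "Agent", "LLM", "Task", "Tool", "Workflow", "Rerank"]


def operation_distribution(call_types):
    """Return {operation type: (count, proportion)} for a list of type labels.

    Hypothetical helper: labels outside OPERATION_TYPES are ignored.
    """
    counts = Counter(call_types)
    total = sum(counts[t] for t in OPERATION_TYPES)
    return {
        t: (counts[t], counts[t] / total if total else 0.0)
        for t in OPERATION_TYPES
    }


if __name__ == "__main__":
    sample = ["LLM", "LLM", "Tool", "Embedding"]
    for op_type, (count, share) in operation_distribution(sample).items():
        print(f"{op_type}: {count} calls ({share:.0%})")
```

Types with no calls are kept in the result with a count of 0, so every type always appears in the output.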

Other aggregated data:

Panel	Description
Session Count Trend Chart	Displays the trend of the number of sessions accessing the LLM application
User Count Trend Chart	Displays the trend of the number of users using the LLM application
Trace Count Trend Chart	Displays the trend of the number of traces generated by the application

Model call analysis

Model call analysis shows detailed data on LLM calls for an individual application.

Overview data:

Panel	Description
LLM call count	Displays the number of LLM calls made by the application during the specified time period
LLM call QPS	Displays the QPS (queries per second) of LLM calls made by the application during the specified time period
LLM call error count	Displays the number of failed LLM calls made by the application during the specified time period
LLM call error rate	Displays the error rate of LLM calls made by the application during the specified time period
Average LLM call latency	Displays the average latency of LLM calls made by the application during the specified time period
Average LLM call time-to-first-token latency	Displays the average time-to-first-token latency of LLM calls during the specified time period. Time-to-first-token latency is the time from when the user enters a query and presses the Send button until the first token starts to be output in the last LLM request
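
As an illustration of how the overview figures relate to each other, the following sketch derives them from a list of call records. It assumes that QPS is the call count divided by the length of the time window in seconds and that the error rate is the error count divided by the call count; the document does not state the exact formulas, and the `LLMCall` record and `call_overview` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LLMCall:
    """Hypothetical record of one LLM call."""
    latency_ms: float
    is_error: bool


def call_overview(calls, window_seconds):
    """Aggregate call count, QPS, error count, error rate, and average latency.

    Assumed formulas: QPS = calls / window length, error rate = errors / calls.
    """
    count = len(calls)
    errors = sum(1 for call in calls if call.is_error)
    total_latency = sum(call.latency_ms for call in calls)
    return {
        "call_count": count,
        "qps": count / window_seconds,
        "error_count": errors,
        "error_rate": errors / count if count else 0.0,
        "avg_latency_ms": total_latency / count if count else 0.0,
    }


if __name__ == "__main__":
    calls = [
        LLMCall(100, False),
        LLMCall(300, True),
        LLMCall(200, False),
        LLMCall(200, False),
    ]
    print(call_overview(calls, window_seconds=2))
```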

LLM call trend:

Panel	Description
Trend Chart of LLM Call Count	Displays the LLM call count trend by default. It can be switched to LLM call QPS or Avg LLM calls per request (the average number of LLM calls per user request)
Trend Chart of LLM Call Error Count	Displays the LLM call error count trend by default. It can be switched to the LLM call error rate trend
Trend Chart of LLM Call Latency	Displays the LLM call latency trend, with Avg, p90, p95, and p99 latency available
Trend Chart of LLM Call First-token Latency	Displays the LLM call time-to-first-token latency trend, with Avg, p90, p95, and p99 latency available
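
For readers unfamiliar with the Avg, p90, p95, and p99 options, the sketch below shows one common way to compute them from raw latency samples, using the nearest-rank percentile method. This is an assumption for illustration only; the document does not say which percentile method the product uses, and `percentile` is a hypothetical helper.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.

    `p` is an integer percentage such as 90, 95, or 99.
    """
    if not values:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    latencies_ms = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    print("avg:", sum(latencies_ms) / len(latencies_ms))
    for p in (90, 95, 99):
        print(f"p{p}:", percentile(latencies_ms, p))
```

Percentiles are less sensitive than the average to a small number of very slow calls, which is why the trend charts offer both.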

Top 5 LLM calls:

Panel	Description
Top 5 models with LLM model call counts	Based on all LLM calls made by the application, shows the top 5 models by call count; can be switched to a trend chart. It can also be switched to the top 5 models by LLM call QPS (with a trend chart option) or by Avg LLM calls per request
Top 5 models with most LLM model call errors	Based on the LLM calls made by the application, shows the top 5 models by LLM call error count; can be switched to a trend chart. It can also be switched to the top 5 models by LLM call error rate, with a trend chart option
Top 5 models with highest average LLM call latency	Based on the LLM calls made by the application, shows the top 5 models by average LLM call latency; can be switched to other latency metrics such as p90, p95, and p99, and to a trend chart
Top 5 models with highest LLM call time-to-first-token average latency	Based on the LLM calls made by the application, shows the top 5 models by average LLM call time-to-first-token latency; can be switched to other latency metrics and to a trend chart

LLM operations

LLM operations shows detailed data on LLM operations for an individual application. There are seven operation types: Embedding, Agent, LLM, Task, Tool, Workflow, and Rerank.

Filter criteria: Select an operation type (the first one is selected by default)
Overview data: Displays data for the selected operation type

Panel	Description
xx operation call count	Displays the number of xx operation calls made by the application during the specified time period
xx operation call error count	Displays the number of failed xx operation calls made by the application during the specified time period
xx operation call error rate	Displays the error rate of xx operation calls made by the application during the specified time period
Average latency of xx operation calls	Displays the average latency of xx operation calls made by the application during the specified time period

Operation type call trend: Displays call-related data for the selected operation type

Panel	Description
Trend Chart of xx Operation Call Count	Displays the trend of the xx operation call count
Trend Chart of xx Operation Call Error Count	Displays the trend of the xx operation call error count; can be switched to the xx operation call error rate trend
Trend Chart of xx Operation Call Latency	Displays the trend of the xx operation call latency, with Avg, p90, p95, and p99 latency available

Top 5 models by operation type: Displays the top 5 models for the selected operation type; each panel can be switched to a trend chart

Panel	Description
Top 5 models with most xx operation call counts	Shows the top 5 models or operation names by xx operation call count. Depending on the selected operation type, these are the names of the top 5 Embedding models, LLM models, Agents, Tools, Tasks, Rerank models, or Workflows
Top 5 models with most xx operation call errors	Shows the top 5 models with the most xx operation call errors; a dropdown switches to the top 5 models with the highest xx operation call error rates
Top 5 models with highest xx operation call latency	Shows the top 5 models with the highest average xx operation call latency; can be switched to p90, p95, or p99 latency

Token analysis

Token analysis displays token-related data for an individual application. Filtering by model shows token-related data for a specific model.

Filter criteria: Model name (default: all). Search and single selection are supported.
Overview data:

Panel	Description
Token usage	Displays the token usage of the application during the specified time period, including input and output tokens
Avg Tokens per request	Displays the average token usage per user request during the specified time period, including input and output tokens
Avg Tokens per LLM call	Displays the average token usage per LLM call during the specified time period, with options to view input and output tokens
Time-to-first-token average latency per request	Displays the average time-to-first-token latency per user request during the specified time period. Time-to-first-token latency is the time from when the user enters a query and presses the Send button until the first token starts to be output in the last LLM request
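
The difference between the two averages is that one user request can trigger several LLM calls. The sketch below illustrates this under the assumption that a request's token usage is the sum of the input and output tokens of all its LLM calls; the `UserRequest` record and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UserRequest:
    """Hypothetical record: one user request and the LLM calls it triggered.

    Each call is an (input_tokens, output_tokens) pair.
    """
    llm_calls: List[Tuple[int, int]]


def total_tokens(requests):
    """Sum input and output tokens over every LLM call of every request."""
    return sum(inp + out for req in requests for inp, out in req.llm_calls)


def avg_tokens_per_request(requests):
    """Total tokens divided by the number of user requests."""
    return total_tokens(requests) / len(requests) if requests else 0.0


def avg_tokens_per_llm_call(requests):
    """Total tokens divided by the number of LLM calls."""
    call_count = sum(len(req.llm_calls) for req in requests)
    return total_tokens(requests) / call_count if call_count else 0.0


if __name__ == "__main__":
    requests = [
        UserRequest([(50, 100), (30, 60)]),  # one request, two LLM calls
        UserRequest([(20, 40)]),             # one request, one LLM call
    ]
    print(avg_tokens_per_request(requests))
    print(avg_tokens_per_llm_call(requests))
```

When every request makes exactly one LLM call, the two averages are equal; the gap between them grows with the number of LLM calls per request.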

Trend Chart:

Panel	Description
Token Usage Trend	Displays the token usage trend of the application during the specified time period, with options to view input and output tokens
Trend Chart of Average Tokens Per Request	Displays the trend of the average token usage per user request during the specified time period
Trend of Average Tokens per LLM Call	Displays the trend of the average token usage per LLM call during the specified time period
Token output speed per request trend	Displays the trend of the token output speed per user request during the specified time period, where speed = output token count per request / request latency
Time-to-first-token latency per request	Displays the time-to-first-token latency per user request during the specified time period, with Avg, p90, p95, and p99 latency available
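
The token output speed formula given above (output token count per request / request latency) can be written as a short sketch. The unit of seconds and the function name `token_output_speed` are assumptions made for illustration; the document does not specify the unit.

```python
def token_output_speed(output_tokens, request_latency_s):
    """Output tokens per second for one user request.

    Implements: output token count per request / request latency.
    The latency is assumed to be in seconds and must be positive.
    """
    if request_latency_s <= 0:
        raise ValueError("request latency must be positive")
    return output_tokens / request_latency_s


if __name__ == "__main__":
    # A request that produced 160 output tokens in 2 seconds.
    print(token_output_speed(160, 2.0), "tokens/s")
```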

Top 5 by token usage:

Panel	Description
Top 5 sessions with most token usage	Shows the top 5 sessions with the most token usage; can be switched to a trend chart
Top 5 users with most token usage	Shows the top 5 users with the most token usage; can be switched to a trend chart
Top 5 models with most token usage	Shows the top 5 models with the most token usage; can be switched to a trend chart
Top 5 models with most average tokens per LLM call	Shows the top 5 models by "Avg Tokens per LLM call"; can be switched to a trend chart

Other tab modules

The API monitor, log analysis, exception analysis, and call chain analysis modules work the same way as in APM application performance monitoring.
