          Baidu Edge Computing (BEC)

          Baidu Edge Computing (BEC) has a single type of monitored object: the instance (Instance). The monitoring metrics available for instances are listed below.

          Instance Monitor (instance)

          | Metric name (English) | Metric description | Unit | Dimension | Remarks |
          | --- | --- | --- | --- | --- |
          | vCPUUsagePercent | CPU usage | % | InstanceId | |
          | CpuIdlePercent | CPU idle ratio | % | InstanceId | |
          | DiskCUsedBytes | Used space on disk C | Bytes | InstanceId | |
          | DiskCTotalBytes | Total space on disk C | Bytes | InstanceId | |
          | DiskCFreeBytes | Free space on disk C | Bytes | InstanceId | |
          | DiskCUsedPercent | Disk C space utilization | % | InstanceId | |
          | DCGM_PIPE_FP16_ACTIVE | GPU FP16 pipe active cycle ratio | % | InstanceId | |
          | DCGM_PROF_PIPE_FP32_ACTIVE | GPU FP32 pipe active cycle ratio | % | InstanceId | |
          | DCGM_PROF_PIPE_FP64_ACTIVE | GPU FP64 pipe active cycle ratio | % | InstanceId | |
          | DCGM_PROF_GR_ENGINE_ACTIVE | GPU graphics or compute engine active time ratio | % | InstanceId | |
          | PROF_PCIE_TX_BYTES | GPU PCIe bus data transmit rate | Bytes | InstanceId | |
          | PROF_PCIE_RX_BYTES | GPU PCIe bus data receive rate | Bytes | InstanceId | |
          | DCGM_SM_CLOCK | GPU SM clock frequency | Hz | InstanceId | |
          | DCGM_APP_SM_CLOCK | GPU SM application clock frequency | Hz | InstanceId | |
          | DCGM_PROF_SM_ACTIVE | GPU SM active time ratio | % | InstanceId | |
          | DCGM_PROF_PIPE_TENSOR_ACTIVE | GPU tensor pipe active cycle ratio | % | InstanceId | |
          | DCGM_PROF_DRAM_ACTIVE | GPU memory bandwidth utilization | % | InstanceId | |
          | DCGM_APP_MEMORY_CLOCK | GPU memory application clock frequency | Hz | InstanceId | |
          | DCGM_MEMORY_CLOCK | GPU memory clock frequency | Hz | InstanceId | |
          | DCGM_GPU_UTILIZATION | GPU utilization | % | InstanceId | |
          | DCGM_ECC_SBE_AGG_TOTAL | Total GPU single-bit persistent ECC errors | Count | InstanceId | |
          | DCGM_ECC_SBE_VOL_TOTAL | Total GPU single-bit volatile ECC errors | Count | InstanceId | |
          | DCGM_ECC_DBE_AGG_TOTAL | Total GPU double-bit persistent ECC errors | Count | InstanceId | |
          | DCGM_ECC_DBE_VOL_TOTAL | Total GPU double-bit volatile ECC errors | Count | InstanceId | |
          | DCGM_FB_USED | GPU frame buffer used | MiB | InstanceId | |
          | DCGM_FB_FREE | GPU frame buffer free | MiB | InstanceId | |
          | DCGM_POWER_USAGE | GPU power usage | W | InstanceId | |
          | DCGM_ENC_UTILIZATION | GPU encoder utilization | % | InstanceId | |
          | DCGM_DEC_UTILIZATION | GPU decoder utilization | % | InstanceId | |
          | DCGM_GPU_TEMP | GPU operating temperature | - | InstanceId | |
          | DCGM_FAN_SPEED_PERCENT | GPU fan speed percentage | % | InstanceId | |
          | DCGM_PROF_SM_OCCUPANCY | Thread occupancy ratio on GPU SMs | % | InstanceId | |
          | Gpu0UtilizationGpu | GPU utilization | % | InstanceId | |
          | GpuStatus | Overall GPU card status | - | InstanceId | |
          | GpuXStatus | GPU card status | - | InstanceId | |
          | GpuXEccErrors | GPU card ECC errors | Count | InstanceId | |
          | GpuXError | GPU card error message | - | InstanceId | |
          | GpuError | GPU card error message | - | InstanceId | |
          | DCGM_GPU_PERF | GPU performance state | - | InstanceId | |
          | DCGM_MEM_COPY_UTILIZATION | GPU memory copy utilization | % | InstanceId | |
          | DCGM_MEM_TEMP | GPU memory temperature | - | InstanceId | |
          | Gpu0Temperature | GPU temperature | - | InstanceId | |
          | DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION | Total GPU energy consumption since startup | J | InstanceId | |
          | HomeUsedPercent | HOME disk space utilization | % | InstanceId | |
          | HomeUsedBytes | HOME disk space usage | Bytes | InstanceId | |
          | DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL | Total NVLink bandwidth counters | Count | InstanceId | |
          | PROF_NVLINK_TX_BYTES | NVLink data transmit rate | Bytes | InstanceId | |
          | DCGM_FI_DEV_NVLINK_RECOVERY_ERROR_COUNT_TOTAL | Total NVLink recovery errors | Count | InstanceId | |
          | PROF_NVLINK_RX_BYTES | NVLink data receive rate | Bytes | InstanceId | |
          | DCGM_FI_DEV_NVLINK_CRC_DATA_ERROR_COUNT_TOTAL | Total NVLink data CRC errors | Count | InstanceId | |
          | DCGM_FI_DEV_NVLINK_CRC_FLIT_ERROR_COUNT_TOTAL | Total NVLink flow control CRC errors | Count | InstanceId | |
          | DCGM_FI_DEV_NVLINK_REPLAY_ERROR_COUNT_TOTAL | Total NVLink retries | Count | InstanceId | |
          | TcpLossSegs | TCP packets dropped | Count | InstanceId | |
          | TcpOutSegs | TCP packets sent | Count | InstanceId | |
          | TcpInSegs | TCP packets received | Count | InstanceId | |
          | TcpRetranSegs | TCP retransmission count | Count | InstanceId | |
          | SwapUsedBytes | Used swap space | Bytes | InstanceId | |
          | SwapTotalBytes | Total swap space | Bytes | InstanceId | |
          | SwapFreeBytes | Free swap space | Bytes | InstanceId | |
          | MemUsedPercent | Memory utilization | % | InstanceId | |
          | Gpu0UtilizationMemory | GPU memory utilization | % | InstanceId | |
          | MemUsedBytes | Used memory | Bytes | InstanceId | |
          | Gpu0MemoryUsed | Used GPU memory | Bytes | InstanceId | |
          | MemAvailableBytes | Available memory | Bytes | InstanceId | |
          | MemTotalBytes | Total memory | Bytes | InstanceId | |
          | Gpu0MemoryTotal | Total GPU memory | Bytes | InstanceId | |
          | MemFreeBytes | Free memory | Bytes | InstanceId | |
          | Gpu0MemoryFree | Free GPU memory | Bytes | InstanceId | |
          | Cpu0ProcessorPercent | Single-core CPU utilization | % | InstanceId | |
          | Cpu0IdlePercent | Single-core CPU idle ratio | % | InstanceId | |
          | MemBufferBytes | Block device I/O memory buffer usage | Bytes | InstanceId | |
          | TcpCurrentEstab | Established TCP connections | Count | InstanceId | |
          | GpuAllEccErrors | ECC errors across all GPU cards | - | InstanceId | |
          | GpuAvgGpuUtilizationForall | Average GPU utilization across all GPU cards | % | InstanceId | |
          | GpuAvgMemoryUtilizationForall | Average memory utilization across all GPU cards | % | InstanceId | |
          | GpuMaxGpuUtilization | Maximum GPU utilization across all GPU cards | % | InstanceId | |
          | GpuMaxMemoryUtilization | Maximum memory utilization across all GPU cards | % | InstanceId | |
          | GpuMaxTemperature | Maximum temperature across all GPU cards | - | InstanceId | |
          | DiskUsedBytes | Total used disk space on server | Bytes | InstanceId | |
          | DiskUsedPercent | Server disk utilization | % | InstanceId | |
          | DiskFreeBytes | Total free disk space on server | Bytes | InstanceId | |
          | DiskTotalBytes | Total disk space on server | Bytes | InstanceId | |
          | MemCacheBytes | File system memory cache size | Bytes | InstanceId | |
          | GpuMaxGpuUtilizationIndex | ID of the GPU with the maximum GPU utilization | - | InstanceId | |
          | GpuMaxMemoryUtilizationIndex | ID of the GPU with the maximum memory utilization | - | InstanceId | |
          | CpuLoadAvg15 | Server CPU load over the last 15 minutes | % | InstanceId | |
          | CpuLoadAvg1 | Server CPU load over the last 1 minute | % | InstanceId | |
          | CpuLoadAvg5 | Server CPU load over the last 5 minutes | % | InstanceId | |
          | GpuMaxEccErrorsIndex | ID of the GPU with the most ECC errors | - | InstanceId | |
          | GpuMaxTemperatureIndex | ID of the GPU with the maximum temperature | - | InstanceId | |
          | DiskUsedInodes | Total used inodes on server | Count | InstanceId | |
          | DiskInodesUsedPercent | Inode utilization on server | % | InstanceId | |
          | DiskTotalInodes | Total inodes on server | Count | InstanceId | |
          | DiskFreeInodes | Total free inodes on server | Count | InstanceId | |
          | RootUsedPercent | Root disk space utilization | % | InstanceId | |
          | RootUsedBytes | Root disk space usage | Bytes | InstanceId | |
          | CpuInterruptSecond | CPU interrupts per second | Count/s | InstanceId | |
          | CpuContextSwitchSecond | Context switches per second | Count/s | InstanceId | |
          | vDiskWriteOpCountPerSecond | Disk I/O write operations per second | Count/s | InstanceId | |
          | vDiskWriteBytesPerSecond | Disk I/O write throughput | Bytes/s | InstanceId | |
          | vDiskReadOpCountPerSecond | Disk I/O read operations per second | Count/s | InstanceId | |
          | vDiskReadBytesPerSecond | Disk I/O read throughput | Bytes/s | InstanceId | |
          | CpuUserPercent | User CPU time ratio | % | InstanceId | |
          | CpuWaitPercent | CPU I/O-wait time ratio | % | InstanceId | |
          | CpuSystemPercent | System CPU time ratio | % | InstanceId | |
          | VNicInPPS | Network interface card inbound packet rate | pps | InstanceId | |
          | VNicInBPS | Network interface card inbound bandwidth | bps | InstanceId | |
          | vNicInBytes | Network interface card inbound traffic | Bytes | InstanceId | |
          | VNicOutPPS | Network interface card outbound packet rate | pps | InstanceId | |
          | VNicOutBPS | Network interface card outbound bandwidth | bps | InstanceId | |
          | vNicOutBytes | Network interface card outbound traffic | Bytes | InstanceId | |
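
          Several names in the table, such as Gpu0UtilizationGpu and Cpu0ProcessorPercent, appear to embed a device index. The following is a minimal Python sketch of how a client might derive the name for another device and look up its unit. It assumes that the `0` in these names is a per-device index, which this page does not state. The helpers `metric_for_device` and `unit_of` are hypothetical and not part of any Baidu SDK, and `METRIC_UNITS` holds only a few rows copied from the table.

```python
import re

# A few rows copied from the metric table: metric name -> unit.
METRIC_UNITS = {
    "vCPUUsagePercent": "%",
    "DiskCUsedBytes": "Bytes",
    "DCGM_FB_USED": "MiB",
    "DCGM_POWER_USAGE": "W",
    "VNicInBPS": "bps",
    "Gpu0UtilizationGpu": "%",
    "Cpu0ProcessorPercent": "%",
}

# Matches a leading "Gpu0" or "Cpu0" that is followed by an uppercase letter.
_TEMPLATE_INDEX = re.compile(r"^(Gpu|Cpu)0(?=[A-Z])")
# Matches a leading "Gpu<N>" or "Cpu<N>" for any index N.
_ANY_INDEX = re.compile(r"^(Gpu|Cpu)\d+(?=[A-Z])")


def metric_for_device(template: str, index: int) -> str:
    """Return the metric name for device `index`.

    ASSUMPTION: the "0" in names like Gpu0UtilizationGpu is a device index.
    Names without such an index are returned unchanged.
    """
    if index < 0:
        raise ValueError("device index must be non-negative")
    return _TEMPLATE_INDEX.sub(lambda m: f"{m.group(1)}{index}", template)


def unit_of(metric: str):
    """Look up the unit of a metric, normalizing any device index back to 0."""
    base = _ANY_INDEX.sub(r"\g<1>0", metric)
    return METRIC_UNITS.get(base)
```

          For example, `metric_for_device("Gpu0UtilizationGpu", 3)` yields `Gpu3UtilizationGpu`, while a name without an index, such as `DCGM_FB_USED`, is returned unchanged.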