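Connecting to the Prometheus Monitoring Service CProm
Updated: 2024-04-16
Overview
A service mesh lets microservices obtain monitoring metrics for service-to-service requests without any changes to application code. This document describes how to connect the service mesh product CSM to the managed Prometheus monitoring service CProm, so that you can configure monitoring alerts and view dashboards for the metrics in your service mesh.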
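Prerequisites
- A CProm instance has been created in the same region as the Kubernetes cluster. For details, see: Creating a CProm Instance.
- For Kubernetes clusters that already run workloads, the CProm collection Agent must be installed to collect metrics. For details, see: Agent Management.
- Note: Hosted service meshes do not currently support monitoring and alerting.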
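Procedure
Enable data plane metrics monitoring
Note: CCE clusters managed by a service mesh instance must have the CProm collection Agent installed; otherwise the corresponding CProm instance cannot be selected.
- Method 1: When creating the service mesh instance, enable "Monitoring Metrics Collection" and select the corresponding CProm instance.
- Method 2: For an existing service mesh, choose Service Mesh > Mesh Management. On the Mesh Management page, click the name of the target instance, then choose Observability Management > Prometheus Monitoring in the left navigation pane. On the monitoring page, turn monitoring on and select the corresponding CProm instance.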
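Configure data plane metrics monitoring
After data plane metrics monitoring is enabled, you can do the following on the Observability Management > Prometheus Monitoring page:
- Select Grafana Service to go to the Grafana information page, where you can access the Grafana dashboards through the Grafana public domain name.
- Select View Details to view information about the current CProm instance.
- Select Configure Alerts to go to the corresponding CProm page and configure alerts. For metric selection and alert rule configuration, refer to the following: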
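Data plane monitoring metrics:
Scope | Name | Description |
---|---|---|
Envoy | IstioEnvoyInternalUpstreamReq503TooHigh | The share of 503 internal upstream responses exceeds 1%, which is too high. |
Envoy | IstioEnvoyInternalUpstreamReq200TooLow | The share of 200 internal upstream responses is below 99.9%, which is too low. |
Envoy | IstioEnvoyUpstreamReq503TooHigh | The percentage of Envoy HTTP 503 upstream responses is too high. |
Envoy | IstioEnvoyUpstreamReq200TooLow | The percentage of Envoy HTTP 200 upstream responses is too low. |
Envoy | IstioEnvoyClusterBindErrors | Envoy cluster binding errors. |
Envoy | IstioEnvoyClusterDstHostInvalid | Envoy cluster destination host is invalid. |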
Data plane alert configuration reference:
- alert: IstioEnvoyInternalUpstreamReq503TooHigh
  annotations:
    summary: 'Envoy Percentage of HTTP 503 internal upstream responses is too high'
    description: "The amount of 503 internal upstream responses is higher than 1%. It is too high"
  expr: >
    rate(envoy_cluster_internal_upstream_rq_503[1m])/rate(envoy_cluster_internal_upstream_rq_completed[1m]) > 0.01
- alert: IstioEnvoyInternalUpstreamReq200TooLow
  annotations:
    summary: 'Envoy Percentage of HTTP 200 internal upstream responses is too low'
    description: "The amount of 200 internal upstream responses is lower than 99.9%. It is too low"
  expr: >
    rate(envoy_cluster_internal_upstream_rq_200[1m])/rate(envoy_cluster_internal_upstream_rq_completed[1m]) < 0.999
- alert: IstioEnvoyUpstreamReq503TooHigh
  annotations:
    summary: 'Envoy Percentage of HTTP 503 upstream responses is too high'
    description: "The amount of 503 upstream responses is higher than 1%. It is too high"
  expr: >
    rate(envoy_cluster_upstream_rq_503[1m])/rate(envoy_cluster_upstream_rq_completed[1m]) > 0.01
- alert: IstioEnvoyUpstreamReq200TooLow
  annotations:
    summary: 'Envoy Percentage of HTTP 200 upstream responses is too low'
    description: "The amount of 200 upstream responses is lower than 99.9%. It is too low"
  expr: >
    rate(envoy_cluster_upstream_rq_200[1m])/rate(envoy_cluster_upstream_rq_completed[1m]) < 0.999
- alert: IstioEnvoyClusterBindErrors
  annotations:
    summary: "Envoy cluster binding errors"
    description: "Error in binding cluster with {{ $labels.pod_name }} pod in {{ $labels.namespace }} namespace."
  expr: >
    envoy_cluster_bind_errors > 0
- alert: IstioEnvoyClusterDstHostInvalid
  annotations:
    summary: "Envoy cluster destination host invalid"
    description: "Envoy cluster destination host {{ $labels.pod_name }} in {{ $labels.namespace }} namespace invalid for 1 minute"
  expr: >
    envoy_cluster_original_dst_host_invalid > 0
  for: 1m
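The alert entries above are list items without a surrounding rule group. As a minimal sketch, assuming the rules are loaded in the standard open-source Prometheus rule-file format (the CProm console may instead accept each rule through its own form), one of them could be wrapped as shown here. The group name `istio-envoy-dataplane`, the `severity` label, and the added `for: 1m` are illustrative assumptions and do not come from the CProm documentation.

```yaml
# Sketch only: the group name, the severity label, and "for" are hypothetical.
groups:
  - name: istio-envoy-dataplane      # hypothetical group name
    rules:
      - alert: IstioEnvoyUpstreamReq503TooHigh
        # Ratio of 503 responses to all completed upstream requests over 1 minute.
        expr: >
          rate(envoy_cluster_upstream_rq_503[1m])
          / rate(envoy_cluster_upstream_rq_completed[1m]) > 0.01
        for: 1m                      # assumption: fire only if the condition holds for 1 minute
        labels:
          severity: warning          # hypothetical label, e.g. for alert routing
        annotations:
          summary: 'Envoy Percentage of HTTP 503 upstream responses is too high'
          description: "The amount of 503 upstream responses is higher than 1%. It is too high"
```

If the open-source Prometheus tooling is available, a file in this format can be checked with `promtool check rules <file>` before the rules are applied.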