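Summary: This article is aimed at Kubernetes beginners. It explains monitoring and logging, the two core observability modules, and walks through deploying and using tools such as Prometheus, Grafana, and EFK to help build an efficient operations setup.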
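Kubernetes (K8s) is the de facto standard for container orchestration, and its dynamic scheduling and elastic scaling make operations considerably more complex. Observability rests on three pillars (monitoring, logging, and tracing) that help developers locate problems quickly, tune resource allocation, and keep systems stable. For beginners, monitoring and logging are the most practical entry points for understanding how K8s behaves at runtime.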
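By default, K8s collects node and container resource metrics (CPU, memory, disk, network) through cAdvisor, which is built into the Kubelet, but these metrics are kept only for a short time. To aggregate them at the cluster level, deploy Metrics Server: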
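```yaml
# Metrics Server deployment example (adjust the image version as needed).
# Only the Deployment is shown; a working install also needs the ServiceAccount,
# RBAC rules, Service and APIService (v1beta1.metrics.k8s.io) from the official
# components.yaml manifest.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      serviceAccountName: metrics-server
      containers:
        - name: metrics-server
          # k8s.gcr.io is frozen; current images are published on registry.k8s.io
          image: registry.k8s.io/metrics-server/metrics-server:v0.6.2
          command:
            - /metrics-server
            - --kubelet-insecure-tls
            - --kubelet-preferred-address-types=InternalIP
```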
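Key configuration:

- `--kubelet-insecure-tls`: skips verification of the Kubelet's serving certificate (acceptable in test environments; in production, configure properly signed certificates).
- `--kubelet-preferred-address-types`: prefers the node's InternalIP when connecting to the Kubelet.

Verify that metrics are being collected: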
```bash
kubectl top nodes                   # node resource usage
kubectl top pods --all-namespaces   # Pod resource usage
```
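`kubectl top` prints CPU in cores or millicores (`250m` is 0.25 core) and memory with binary suffixes such as `Mi`. If you post-process that output in a script, the conversion can be sketched as follows. This is a minimal illustration assuming the default five-column layout of `kubectl top nodes`; the helper names and the sample row are hypothetical, and only the common quantity suffixes are handled.

```python
# Minimal sketch: convert the CPU and memory strings printed by `kubectl top`
# (e.g. "250m", "1024Mi") into plain numbers. Only common suffixes are handled,
# not the full Kubernetes quantity grammar (e.g. exponent forms such as "1e3").
from decimal import Decimal

# Decimal (SI) and binary (power-of-two) suffixes used in Kubernetes quantities.
_SUFFIXES = {
    "n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
    "k": Decimal(10**3), "M": Decimal(10**6), "G": Decimal(10**9), "T": Decimal(10**12),
    "Ki": Decimal(2**10), "Mi": Decimal(2**20), "Gi": Decimal(2**30), "Ti": Decimal(2**40),
}


def parse_quantity(text: str) -> Decimal:
    """Return a quantity in base units: cores for CPU, bytes for memory."""
    text = text.strip()
    # No one-letter suffix is the tail of a two-letter one, so order does not matter.
    for suffix, factor in _SUFFIXES.items():
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * factor
    return Decimal(text)  # no suffix: already in base units


def parse_top_nodes_row(row: str) -> dict:
    """Parse one data row of `kubectl top nodes` (name, CPU, CPU %, memory, memory %)."""
    name, cpu, _cpu_pct, memory, _mem_pct = row.split()
    return {"name": name, "cpu_cores": parse_quantity(cpu), "memory_bytes": parse_quantity(memory)}


if __name__ == "__main__":
    # Hypothetical row; real values depend on your cluster.
    print(parse_top_nodes_row("node-1   250m   12%   1024Mi   27%"))
```

Using `Decimal` instead of `float` keeps millicore values such as `0.25` exact when they are summed across many Pods.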
Metrics Server keeps only recent metrics, so deploy Prometheus for long-term storage and advanced queries:
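```yaml
# Prometheus deployment example (uses the Prometheus Operator to simplify management).
# The Prometheus resource has no inline scrapeConfigs field; raw scrape jobs are
# supplied through a Secret referenced by additionalScrapeConfigs (same namespace).
apiVersion: v1
kind: Secret
metadata:
  name: additional-scrape-configs
stringData:
  prometheus-additional.yaml: |
    - job_name: 'kubernetes-nodes'
      static_configs:
        - targets:
            - '10.0.0.1:9100'  # Node Exporter address
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  resources:
    requests:
      memory: 400Mi
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: standard
        resources:
          requests:
            storage: 10Gi
  additionalScrapeConfigs:
    name: additional-scrape-configs
    key: prometheus-additional.yaml
```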
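Prometheus pulls metrics over HTTP: each scrape of a target such as Node Exporter returns plain-text samples like `name{label="value"} 123`, and counters such as `node_cpu_seconds_total` only become useful once turned into per-second rates. The sketch below illustrates both steps under simplified parsing assumptions (no escaped quotes in label values, optional timestamps ignored). It approximates rather than reproduces PromQL's `rate()`, which additionally extrapolates to the edges of the range window; `parse_samples` and `counter_rate` are hypothetical helpers.

```python
# Simplified sketch of what Prometheus does with a Node Exporter scrape:
# parse sample lines of the text exposition format, then turn two or more
# readings of a counter into a per-second rate. Real PromQL rate() also
# extrapolates to the edges of the range window; this sketch does not.
import re

# name{label="value",...} value [timestamp]; the optional timestamp is ignored,
# and escaped quotes inside label values are not supported in this sketch.
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text: str):
    """Yield (metric_name, labels, value) for every sample line; # HELP/# TYPE lines are skipped."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match:
            labels = dict(_LABEL.findall(match.group("labels") or ""))
            yield match.group("name"), labels, float(match.group("value"))


def counter_rate(points):
    """Per-second increase over [(unix_seconds, value), ...], treating any drop as a counter reset."""
    increase = 0.0
    for (_, prev), (_, cur) in zip(points, points[1:]):
        # Counters only go up; a smaller value means the target restarted and counted from 0.
        increase += cur - prev if cur >= prev else cur
    elapsed = points[-1][0] - points[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    # Made-up scrape body in the Node Exporter style.
    body = 'node_cpu_seconds_total{cpu="0",mode="idle"} 1200.5\n'
    for name, labels, value in parse_samples(body):
        print(name, labels, value)
    # Two hypothetical scrapes 15 s apart.
    print(counter_rate([(0, 1200.5), (15, 1214.0)]))
```

Reset handling is the reason counters are queried through a rate function rather than by subtracting raw values: a restarted exporter would otherwise produce a large negative "increase".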
Core components:
Grafana visualizes Prometheus data through prebuilt dashboards:
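```yaml
# Grafana deployment example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
spec:
  selector:            # required by apps/v1; must match the template labels
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:9.5.6
          ports:
            - containerPort: 3000   # Grafana's default HTTP port
          env:
            - name: GF_SECURITY_ADMIN_USER
              value: "admin"
            # Avoid a plain-text password; read it from a Secret (example name)
            - name: GF_SECURITY_ADMIN_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: grafana-admin
                  key: password
```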
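Under the hood, a Grafana panel backed by a Prometheus data source issues range queries to the Prometheus HTTP API endpoint `/api/v1/query_range` with `query`, `start`, `end`, and `step` parameters. The sketch below builds such a URL; the base address is hypothetical, and the step rule (time range divided by a target point count, rounded up) is a simplified stand-in for Grafana's own interval calculation. Growing the step with the time range matters because Prometheus rejects range queries that would return more than 11,000 points per series.

```python
# Sketch of the request a Grafana panel with a Prometheus data source sends:
# a range query against the Prometheus HTTP API endpoint /api/v1/query_range.
# The step rule (time range / max_points, rounded up to whole seconds) is a
# simplified stand-in for Grafana's own interval calculation.
import math
from urllib.parse import urlencode


def build_range_query(base_url: str, promql: str, start: int, end: int, max_points: int = 1000) -> str:
    """Return a query_range URL whose step yields roughly max_points samples per series."""
    step = max(1, math.ceil((end - start) / max_points))
    params = {"query": promql, "start": start, "end": end, "step": f"{step}s"}
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical in-cluster Prometheus address; one hour of the idle CPU rate.
    print(build_range_query(
        "http://prometheus.monitoring.svc:9090",
        'rate(node_cpu_seconds_total{mode="idle"}[5m])',
        start=1700000000,
        end=1700003600,
    ))
```

Pasting the printed URL into a browser (with a reachable Prometheus) is a quick way to check what a dashboard panel is actually asking for.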
Recommended dashboards:
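K8s logs fall into two categories: output that a container writes to standard output and standard error, which the container runtime stores as files on the node, and log files that an application writes inside its container.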
For the first category, deploy a log collection agent (such as Fluent Bit) on every node with a DaemonSet:
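```yaml
# Fluent Bit DaemonSet example
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
spec:
  selector:            # required by apps/v1; must match the template labels
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:2.0.11
          # Inputs and outputs (tail paths, Elasticsearch/Loki endpoint) come from a
          # Fluent Bit config file, usually mounted from a ConfigMap (not shown)
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            # Only needed with the Docker runtime; with containerd/CRI-O the log
            # files live under /var/log/pods, which the /var/log mount already covers
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
```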
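The DaemonSet agent tails the log files that the container runtime writes on each node, and its first job is to decode their line format. The sketch below assumes the two common formats: Docker's json-file driver and the CRI format used by containerd and CRI-O, in which a `P` tag marks a partial chunk of a long line and `F` marks its final chunk. It only illustrates the idea; Fluent Bit does this with its own built-in parsers, and the helper names here are hypothetical.

```python
# Sketch of the first step a node agent such as Fluent Bit performs on the
# files it tails: decoding the container runtime's log lines.
#  * Docker json-file driver: one JSON object per line with "log", "stream", "time".
#  * CRI format (containerd, CRI-O): "<timestamp> <stream> <P|F> <message>",
#    where P marks a partial chunk of a long line and F marks its final chunk.
import json


def parse_docker_line(line: str) -> dict:
    """Decode one json-file log line into time, stream and message."""
    record = json.loads(line)
    return {"time": record["time"], "stream": record["stream"], "message": record["log"].rstrip("\n")}


def parse_cri_lines(lines):
    """Yield complete records from CRI log lines, joining P chunks per stream."""
    pending = {}  # stream name -> chunks not yet closed by an F record
    for line in lines:
        timestamp, stream, tag, message = line.rstrip("\n").split(" ", 3)
        pending.setdefault(stream, []).append(message)
        if tag == "F":
            yield {"time": timestamp, "stream": stream, "message": "".join(pending.pop(stream))}


if __name__ == "__main__":
    # Made-up CRI lines: one long stdout line split into two chunks.
    sample = [
        "2024-01-01T00:00:00.000000001Z stdout P GET /index.html ",
        "2024-01-01T00:00:00.000000002Z stdout F 200 OK",
    ]
    for record in parse_cri_lines(sample):
        print(record)
```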
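Key configuration:

- Mount `/var/log` and the Docker container directory from the host so the agent can collect the logs of every container on the node.

For log files written inside a container, a sidecar container can share a volume with the application and collect the files in real time: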
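```yaml
# Example application Pod with a log-collecting sidecar
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  containers:
    - name: app
      image: my-app:latest          # placeholder application image
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app   # the app writes its log files here
    - name: log-sidecar
      image: fluent/fluent-bit:2.0.11
      # The image's default configuration does not tail these files, so the
      # tail input and stdout output are set on the command line here
      command: ["/fluent-bit/bin/fluent-bit"]
      args: ["-i", "tail", "-p", "path=/var/log/app/*.log", "-o", "stdout"]
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
          readOnly: true
  volumes:
    - name: app-logs
      emptyDir: {}                  # scratch volume shared for the Pod's lifetime
```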
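The sidecar's job is essentially to follow a file that another container keeps appending to. A minimal sketch of that logic, assuming a single log file and simple polling: keep a byte offset, return only complete new lines, and start over if the file shrinks. `FileFollower` is a hypothetical helper for illustration, not part of Fluent Bit.

```python
# Sketch of how a log-shipping sidecar can follow a file on the shared volume:
# remember how many bytes were already read, return only newly appended
# complete lines, and start over if the file shrank (truncated in place).
# Real agents such as Fluent Bit's tail input also track inodes for rotation,
# persist offsets and watch many files; none of that is modelled here.
import os
import tempfile


class FileFollower:
    def __init__(self, path: str):
        self.path = path
        self.offset = 0  # number of bytes already consumed

    def read_new_lines(self) -> list:
        """Return complete lines appended since the previous call."""
        if not os.path.exists(self.path):
            return []
        if os.path.getsize(self.path) < self.offset:
            self.offset = 0  # file was truncated: read it again from the start
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        # Consume only up to the last newline; a half-written line waits for the next poll.
        end = data.rfind(b"\n") + 1
        self.offset += end
        return data[:end].decode("utf-8", errors="replace").splitlines()


if __name__ == "__main__":
    # Simulate an application writing to a file on the shared emptyDir volume.
    path = os.path.join(tempfile.mkdtemp(), "app.log")
    follower = FileFollower(path)
    with open(path, "w") as app_log:
        app_log.write("started\nlistening on :8080\n")
    print(follower.read_new_lines())
```

Holding back a trailing partial line is the key design choice: without it, a line the application is still writing would be shipped as two separate log records.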
Optimization suggestions:
Loki is a lightweight alternative to the Elasticsearch-based EFK stack and suits resource-constrained environments.
LogQL example:

```
{namespace="default", container="nginx"} |= "error" | json | line_format "{{.msg}}"
```
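This query selects logs from the nginx container in the default namespace, keeps only lines containing "error", parses each line as JSON, and outputs just its msg field.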
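To make the stages of that query concrete, here is a toy Python equivalent working on in-memory data. It only illustrates the semantics, assuming streams are given as (labels, lines) pairs; `run_query` is a hypothetical helper and has nothing to do with how Loki actually stores or executes queries.

```python
# Toy re-implementation of the LogQL query above, showing what each stage does.
# This is not how Loki executes queries; the streams passed in are made-up data.
import json


def run_query(streams, selector: dict, contains: str, field: str) -> list:
    """streams: list of (labels_dict, [log_line, ...]) pairs, one per Loki stream."""
    results = []
    for labels, lines in streams:
        # 1. {namespace="default", container="nginx"}: keep streams whose labels all match.
        if any(labels.get(key) != value for key, value in selector.items()):
            continue
        for line in lines:
            # 2. |= "error": case-sensitive substring filter on the raw line.
            if contains not in line:
                continue
            # 3. | json: parse the line into fields. Loki keeps unparsable lines and
            #    marks them with an __error__ label; this sketch simply skips them.
            try:
                fields = json.loads(line)
            except json.JSONDecodeError:
                continue
            if not isinstance(fields, dict):
                continue
            # 4. | line_format "{{.msg}}": replace the line with the extracted field.
            results.append(str(fields.get(field, "")))
    return results


if __name__ == "__main__":
    streams = [
        ({"namespace": "default", "container": "nginx"},
         ['{"level":"error","msg":"upstream timed out"}', '{"level":"info","msg":"ok"}']),
        ({"namespace": "prod", "container": "nginx"}, ['{"level":"error","msg":"elsewhere"}']),
    ]
    print(run_query(streams, {"namespace": "default", "container": "nginx"}, "error", "msg"))
```

The ordering mirrors good LogQL practice: a narrow label selector and a cheap line filter run before the more expensive JSON parsing.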
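By combining monitoring and logging, the K8s observability stack provides operational visibility across the full lifecycle of dynamic container workloads. Beginners should start with Metrics Server + Prometheus + Grafana to learn monitoring fundamentals, then build a logging pipeline with Fluent Bit + Loki or EFK. In real deployments, choose the toolchain according to cluster size and resource budget, and keep refining collection policies and storage.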