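Summary: This article explains how to monitor a running Spring Boot application with Prometheus and deliver real-time alert notifications. It walks through dependency integration, metric exposure, Prometheus configuration, alert rule definition, and notification channel integration, with concrete steps for each, so that developers can build a monitoring stack efficiently.
With microservice architectures now widespread, Spring Boot has become a mainstream framework in the Java ecosystem thanks to its "convention over configuration" approach. As the number of service instances grows, however, manual inspection can no longer keep up with real-time requirements. Prometheus, a graduated project of the CNCF (Cloud Native Computing Foundation), combines a multidimensional data model, the flexible PromQL query language, and an efficient time-series database, which makes it a common first choice for monitoring Spring Boot applications.
This article describes how to monitor Spring Boot applications end to end with Prometheus, covering the full loop of metric collection, data visualization, anomaly detection, and alert notification.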
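
| Component | Recommended version | Compatibility notes |
|---|---|---|
| Spring Boot | 2.7.x / 3.0.x | Use the Actuator starter at the same version as Spring Boot |
| Micrometer | 1.9.x (Boot 2.7) / 1.10.x (Boot 3.0) | Normally managed by Spring Boot's dependency management; avoid pinning it by hand |
| Prometheus | 2.44.x+ | Scrapes the Actuator endpoint over plain HTTP; no HTTP/2 setup is required |
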

Add the Actuator and Prometheus registry dependencies:

    <!-- Maven configuration example -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
    </dependency>

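
Expose the Prometheus endpoint and tag every metric with the application name:

    # application.yml
    management:
      endpoints:
        web:
          exposure:
            include: prometheus,health,metrics
      metrics:
        export:
          prometheus:
            enabled: true   # Spring Boot 3.x: management.prometheus.metrics.export.enabled
        tags:
          application: ${spring.application.name}
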
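
Register a common tag and a custom business counter:

    import io.micrometer.core.instrument.Counter;
    import io.micrometer.core.instrument.MeterRegistry;
    import org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RestController;

    @Configuration
    class MetricsConfig {
        @Bean // adds the "region" tag to every meter in the registry
        public MeterRegistryCustomizer<MeterRegistry> metricsCommonTags() {
            return registry -> registry.config().commonTags("region", "us-east-1");
        }
    }

    @RestController
    public class OrderController {
        private final Counter orderCounter;

        public OrderController(MeterRegistry registry) {
            this.orderCounter = registry.counter("orders.created.total");
        }

        @PostMapping("/orders")
        public String createOrder() {
            orderCounter.increment();
            // business logic...
            return "created"; // the original snippet declared String but returned nothing
        }
    }
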

Configure Prometheus to scrape the application:

    # prometheus.yml
    scrape_configs:
      - job_name: 'springboot-app'
        metrics_path: '/actuator/prometheus'
        static_configs:
          - targets: ['app1:8080', 'app2:8080']
        relabel_configs:
          - source_labels: [__address__]
            target_label: instance

- Use `scrape_interval` to set a different scrape frequency for each application.
- Use `metric_relabel_configs` to filter out metrics you do not need.
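
As a rough illustration of these two tuning options, the fragment below gives one job a shorter scrape interval than another and drops a family of metrics before they are stored. The job names, targets, intervals, and the `tomcat_.*` pattern are assumptions chosen for the example, not values from a real deployment.

```yaml
# prometheus.yml (sketch; job names, targets and intervals are illustrative)
global:
  scrape_interval: 30s            # default for jobs that do not override it

scrape_configs:
  - job_name: 'springboot-critical'
    scrape_interval: 10s          # latency-sensitive service: scrape more often
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['app1:8080']

  - job_name: 'springboot-batch'
    scrape_interval: 60s          # slow-changing service: scrape less often
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['app2:8080']
    metric_relabel_configs:
      # Applied after the scrape and before storage: drop every series
      # whose metric name starts with "tomcat_".
      - source_labels: [__name__]
        regex: 'tomcat_.*'
        action: drop
```

Because `metric_relabel_configs` runs after the scrape, it reduces what Prometheus stores but not what the application has to render on each request.
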

Route alerts to Slack through Alertmanager:

    # alertmanager.yml
    route:
      receiver: 'slack-notification'
      group_by: ['alertname', 'cluster']
      group_wait: 30s
      repeat_interval: 4h
    receivers:
      - name: 'slack-notification'
        slack_configs:
          - api_url: 'https://hooks.slack.com/services/...'
            channel: '#alerts'
            # Double quotes so that YAML turns \n into a real line break
            text: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"

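
Alertmanager only receives alerts if Prometheus is told where it runs and which rule files to evaluate. The fragment below is a minimal sketch of that wiring; the host name `alertmanager` and the rule file name `alerts.yml` are assumptions and should be adjusted to the actual deployment.

```yaml
# prometheus.yml (additional sections; host name and file name are illustrative)
rule_files:
  - 'alerts.yml'                  # file containing the alert rule groups

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']   # 9093 is Alertmanager's default port
```
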
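
Define an alert rule for a high 5xx error rate:

    # alerts.yml
    groups:
      - name: springboot.rules
        rules:
          - alert: HighErrorRate
            # Micrometer writes the numeric code (for example 500) into `status`,
            # so 5xx responses are matched with a regex rather than the literal "5xx".
            expr: rate(http_server_requests_seconds_count{status=~"5.."}[5m]) > 10
            for: 2m
            labels:
              severity: critical
            annotations:
              summary: "High 5xx error rate on {{ $labels.instance }}"
              description: "5xx errors are {{ $value }} req/s"
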
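
An absolute threshold such as 10 req/s depends heavily on traffic volume. One possible alternative is to alert on the share of failing requests instead. The rule below is a sketch under that assumption; the group name, the 5% threshold, the 5-minute window, and the `warning` severity are illustrative choices rather than recommendations from the original setup.

```yaml
# alerts-ratio.yml (sketch; names and thresholds are illustrative)
groups:
  - name: springboot.ratio.rules
    rules:
      - alert: HighErrorRatio
        # Share of requests answered with a 5xx status over the last 5 minutes,
        # computed per instance. The regex matcher relies on Micrometer writing
        # the numeric status code into the `status` label.
        expr: |
          sum by (instance) (rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
            /
          sum by (instance) (rate(http_server_requests_seconds_count[5m]))
            > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "5xx ratio above 5% on {{ $labels.instance }}"
          description: "{{ $value | humanizePercentage }} of requests are failing"
```

A ratio keeps the same meaning when traffic doubles or halves, at the cost of being noisy on instances that serve very few requests.
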
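- Use `http_config` to call internal APIs.
- Use `http_server_requests_seconds_bucket` to analyze the response time distribution.
- Use `jvm_memory_used_bytes` to check for memory leaks.
- Use `process_cpu_seconds_total` to locate CPU bottlenecks.
- Use `logback_events_total` to track error patterns in the logs.
- Use `process_cpu_usage` and `jvm_memory_used_bytes` to predict when to scale out.
- Use `http_server_requests_seconds_count` to estimate the QPS ceiling.
- Use `tomcat_sessions_active` to assess session capacity.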
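
Several of the metrics listed above can be turned into precomputed series with recording rules. The group below is a sketch, not a definitive rule set: the rule names, windows, and the 4-hour horizon are assumptions. The `*_bucket` series exist only if percentile histograms are enabled in the application (for example with `management.metrics.distribution.percentiles-histogram.http.server.requests: true`).

```yaml
# recording-rules.yml (sketch; rule names and windows are illustrative)
groups:
  - name: springboot.capacity
    interval: 1m
    rules:
      # Requests per second per job, a basis for estimating the QPS ceiling.
      - record: job:http_server_requests:rate5m
        expr: sum by (job) (rate(http_server_requests_seconds_count[5m]))

      # 95th percentile latency derived from the histogram buckets.
      - record: job:http_server_requests_seconds:p95_5m
        expr: |
          histogram_quantile(
            0.95,
            sum by (job, le) (rate(http_server_requests_seconds_bucket[5m]))
          )

      # Total heap in use per instance.
      - record: instance:jvm_memory_used_bytes:heap_sum
        expr: sum by (instance) (jvm_memory_used_bytes{area="heap"})

      # Linear extrapolation of heap usage 4 hours ahead, based on the last hour.
      - record: instance:jvm_memory_used_bytes:heap_predict_4h
        expr: predict_linear(instance:jvm_memory_used_bytes:heap_sum[1h], 4 * 3600)
```

Heap usage rises and falls with garbage collection, so the extrapolated value is best read as a rough trend over longer windows rather than an exact forecast.
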
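
Isolate environments with labels:

    # Environment isolation
    global:
      external_labels:
        environment: production
    scrape_configs:
      - job_name: 'springboot-prod'
        metrics_path: '/actuator/prometheus'   # needed for Spring Boot; the default is /metrics
        static_configs:
          - targets: ['prod-app:8080']
            labels:
              env: production
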
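- Use `TracingMeterFilter` to correlate monitoring data with tracing data.
- Use the `predict_linear` function to forecast metric trends.

Monitoring Spring Boot applications with Prometheus gives developers insight from the infrastructure layer up to the business layer. The approach described in this article has been validated in several production environments and can help teams: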
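Future directions include:
Developers are advised to start with core metrics, extend the monitoring system step by step, and ultimately move toward an observability-driven operations model.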