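Summary: This article walks through monitoring a Spring Boot service with Prometheus, covering dependency setup, metrics exposure, alerting rules, and Grafana visualization, to help developers build an effective monitoring stack.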
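With distributed systems and microservice architectures now the norm, the need for service monitoring goes without saying. Spring Boot is a mainstream Java microservice framework, and its Actuator module provides basic health checks and metrics out of the box, but on its own it is limited and offers no centralized management. Prometheus, a flagship project of the CNCF (Cloud Native Computing Foundation), is a good fit for monitoring Spring Boot services.
The advantages of Prometheus are its multi-dimensional data model, its flexible query language PromQL, and its built-in alerting system.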
Add the following dependencies to the Spring Boot project's pom.xml:
<!-- Prometheus registry for Micrometer -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>1.12.0</version>
</dependency>
<!-- Spring Boot Actuator (required: it serves the /actuator/prometheus and /actuator/health endpoints) -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
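Micrometer is the metrics facade that Spring Boot recommends. It supports multiple monitoring systems, Prometheus included, so metrics are exposed through one uniform API.
Enable the Prometheus endpoint in application.yml:
management:
  endpoints:
    web:
      exposure:
        include: prometheus,health   # expose /actuator/prometheus and /actuator/health
  metrics:
    export:
      prometheus:
        enabled: true   # Spring Boot 2.x key; on Spring Boot 3.x use management.prometheus.metrics.export.enabled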
After starting the service, open http://localhost:8080/actuator/prometheus to see the metrics, each introduced by # HELP and # TYPE lines, for example:
# HELP http_server_requests_seconds Request duration (seconds)
# TYPE http_server_requests_seconds histogram
http_server_requests_seconds_count{method="GET",status="200",uri="/api/users"} 10
http_server_requests_seconds_sum{method="GET",status="200",uri="/api/users"} 2.5
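The sample output above already contains enough to derive a mean latency: the _sum series divided by the _count series for the same label set. The following is a minimal sketch of that arithmetic, not a full parser for the exposition format; the class name ExpositionMean is hypothetical, and it assumes each sample line has the form "name{labels} value" with no trailing timestamp.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Parses a few lines of Prometheus text exposition and derives the mean request latency. */
final class ExpositionMean {

    /** Returns "name{labels}" -> value, skipping blank lines and # HELP / # TYPE comments. */
    static Map<String, Double> parse(String body) {
        Map<String, Double> samples = new LinkedHashMap<>();
        for (String raw : body.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // Assumption: no timestamp, so the value is everything after the last space.
            int split = line.lastIndexOf(' ');
            samples.put(line.substring(0, split), Double.parseDouble(line.substring(split + 1)));
        }
        return samples;
    }

    /** Mean latency in seconds = base_sum / base_count for one label set. */
    static double meanSeconds(Map<String, Double> samples, String base, String labels) {
        double sum = samples.get(base + "_sum" + labels);
        double count = samples.get(base + "_count" + labels);
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        String labels = "{method=\"GET\",status=\"200\",uri=\"/api/users\"}";
        String body = String.join("\n",
                "# HELP http_server_requests_seconds Request duration (seconds)",
                "# TYPE http_server_requests_seconds histogram",
                "http_server_requests_seconds_count" + labels + " 10",
                "http_server_requests_seconds_sum" + labels + " 2.5");
        // 2.5 seconds over 10 requests
        System.out.println(meanSeconds(parse(body), "http_server_requests_seconds", labels));
    }
}
```

For the sample above this gives 0.25 seconds per request. It is a lifetime average since process start, which is why the alerting rules later in the article use rate() over a time window instead.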
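Download the binary package for your operating system from the official site, extract it, and edit prometheus.yml:
global:
  scrape_interval: 15s   # global scrape interval
scrape_configs:
  - job_name: 'springboot-service'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['localhost:8080']   # replace with the actual service address
Start Prometheus: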
./prometheus --config.file=prometheus.yml
Open http://localhost:9090; the "Targets" page shows the scrape status of each target.
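Scrape status can also be checked programmatically: Prometheus exposes an HTTP API whose instant-query endpoint is /api/v1/query, and the built-in up metric is 1 for a target whose last scrape succeeded. The sketch below only builds the request URI, which is the part that is easy to get wrong because PromQL selectors must be URL-encoded. The class and method names are hypothetical, and the host and job name are the ones assumed in the configuration above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Builds the URI for a Prometheus instant query (GET /api/v1/query?query=...). */
final class PromQueryUri {

    /** URL-encodes the PromQL expression and appends it to the instant-query endpoint. */
    static URI instantQuery(String baseUrl, String promql) {
        String encoded = URLEncoder.encode(promql, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/v1/query?query=" + encoded);
    }

    public static void main(String[] args) {
        // "up" is 1 when the last scrape of the target succeeded, 0 otherwise.
        URI uri = instantQuery("http://localhost:9090", "up{job=\"springboot-service\"}");
        System.out.println(uri);
    }
}
```

The resulting URI can be passed to java.net.http.HttpClient or to curl; the response is JSON with the matching series under data.result.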
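The Spring Boot metrics scraped by Prometheus fall into four categories:
JVM metrics: jvm_memory_used_bytes (memory in use; filter with area="heap" for the heap), jvm_threads_live_threads (live thread count).
HTTP request metrics: http_server_requests_seconds (request latency), http_server_requests_seconds_count (request count).
Custom business metrics: registered through MeterRegistry, for example to record order processing time: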
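import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

// No MeterRegistry @Bean is needed: with micrometer-registry-prometheus on the
// classpath, Spring Boot auto-configures a Prometheus-backed registry and injects it.
@RestController
public class OrderController {

    private final Timer orderTimer;

    public OrderController(MeterRegistry registry) {
        // Exported to Prometheus as order_processing_time_seconds_count / _sum / _max
        this.orderTimer = registry.timer("order.processing.time");
    }

    @PostMapping("/orders")
    public String createOrder() {
        orderTimer.record(() -> {
            // Simulate business processing
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
            }
        });
        return "success";
    }
}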
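Process metrics: process_cpu_usage (recent CPU usage of the JVM process), process_uptime_seconds (uptime).
Define alerting rules in alert.rules.yml. Prometheus loads the file only if it is listed under rule_files in prometheus.yml; keeping it in the same directory as prometheus.yml lets you reference it by file name: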
groups:
  - name: springboot-alerts
    rules:
      - alert: HighRequestLatency
        # The latency ratio is on the left of "and" so that $value is the latency, not the request count
        expr: |
          rate(http_server_requests_seconds_sum{uri="/api/users"}[1m])
            / rate(http_server_requests_seconds_count{uri="/api/users"}[1m]) > 0.5
          and
          http_server_requests_seconds_count{uri="/api/users"} > 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High request latency: {{ $labels.uri }}"
          description: "Average latency above 500ms, current value: {{ $value }}s"
Rule logic: the alert fires when the 1-minute average latency of /api/users stays above 500 ms for 2 minutes.
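The ratio in the rule's expression is worth spelling out. Both _sum and _count are counters that only grow, so dividing their per-second rates over the same window is the same as dividing their increases: the window length cancels. The sketch below works through that arithmetic with made-up sample values. It is a simplification that ignores counter resets and the extrapolation Prometheus applies at window edges, and the class name is hypothetical.

```java
/** Worked arithmetic behind rate(x_sum[1m]) / rate(x_count[1m]). */
final class AverageLatency {

    /**
     * Average seconds per request inside a window, from the counter values at the
     * start and end of that window. Returns NaN when no request arrived.
     */
    static double averageSeconds(double sumStart, double sumEnd, double countStart, double countEnd) {
        double secondsSpent = sumEnd - sumStart;   // increase of _sum in the window
        double requests = countEnd - countStart;   // increase of _count in the window
        return requests == 0 ? Double.NaN : secondsSpent / requests;
    }

    /** Mirrors the "> 0.5" comparison in the rule; NaN never breaches. */
    static boolean breaches(double averageSeconds, double thresholdSeconds) {
        return averageSeconds > thresholdSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical samples: _sum went from 2.5 to 14.5, _count from 10 to 30 within one minute.
        double avg = averageSeconds(2.5, 14.5, 10, 30);
        System.out.println("average=" + avg + "s breaches=" + breaches(avg, 0.5));
    }
}
```

Here 12 seconds spread over 20 requests averages above the 500 ms threshold, so the condition holds for that evaluation; the alert itself fires only after the condition has held for the whole for: 2m period.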
Download Alertmanager and configure alertmanager.yml:
route:
  receiver: email
  group_by: ['alertname']
receivers:
  - name: email
    email_configs:
      - to: 'team@example.com'
        from: 'alert@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'user'
        auth_password: 'pass'
Start Alertmanager:
./alertmanager --config.file=alertmanager.yml
Point Prometheus at Alertmanager, and load the rule file, in the Prometheus configuration:
# prometheus.yml
rule_files:
  - 'alert.rules.yml'
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
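Install Grafana and start it. On Debian/Ubuntu, with the Grafana APT repository from the official site already configured:
sudo apt-get install -y grafana
sudo systemctl start grafana-server
Open http://localhost:3000 (default username/password: admin/admin).
In Grafana, go to "Configuration" → "Data Sources", add Prometheus, and set the URL to http://localhost:9090.
A ready-made Spring Boot dashboard template (for example ID 4701) is the quickest start; alternatively, build panels by hand with queries such as: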
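Request rate (requests per second):
rate(http_server_requests_seconds_count{uri="/api/users"}[5m])
HTTP 500 error rate (%):
sum(rate(http_server_requests_seconds_count{status="500"}[5m])) /
sum(rate(http_server_requests_seconds_count[5m])) * 100
Heap usage (%), summed over the heap memory pools:
sum(jvm_memory_used_bytes{area="heap"}) /
sum(jvm_memory_max_bytes{area="heap"}) * 100
More complex metrics can also be registered through MeterRegistry, for example the distribution of orders by status: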
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tags;
import java.util.concurrent.atomic.AtomicInteger;
import org.springframework.stereotype.Service;

@Service
public class OrderService {

    // registry.gauge(...) returns the state object it was given, not a Gauge.
    // The service keeps strong references because the registry only holds them weakly.
    private final AtomicInteger pendingOrders;
    private final AtomicInteger completedOrders;

    public OrderService(MeterRegistry registry) {
        this.pendingOrders = registry.gauge(
                "order.status.count", Tags.of("status", "PENDING"), new AtomicInteger(0));
        this.completedOrders = registry.gauge(
                "order.status.count", Tags.of("status", "COMPLETED"), new AtomicInteger(0));
    }

    // Update the metric: move one order from PENDING to COMPLETED
    public void completeOrder(Long orderId) {
        pendingOrders.decrementAndGet();
        completedOrders.incrementAndGet();
    }
}
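Attach common tags such as service.name and instance.ip to every metric so that results can be aggregated across services; Micrometer exports dotted tag keys with underscores, so service.name becomes the label service_name. In PromQL, filter series with {label="value"} selectors, for example: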
http_server_requests_seconds_count{service_name="order-service",method="POST"}
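Because the names written in Java code differ from the names queried in PromQL, it helps to see the mapping in isolation. The sketch below approximates how dotted Micrometer names end up as Prometheus names: characters outside letters, digits, and underscore become underscores, and timers gain a _seconds suffix. It is a simplified illustration rather than Micrometer's actual naming convention class, and the helper names are hypothetical.

```java
/** Approximates the dotted-name to Prometheus-name mapping for tags and timers. */
final class PromNames {

    /** Replaces every character outside [a-zA-Z0-9_] with '_', which covers dots in Micrometer names. */
    static String sanitize(String name) {
        return name.replaceAll("[^a-zA-Z0-9_]", "_");
    }

    /** Timer names additionally carry the base unit, seconds, as a suffix. */
    static String timerName(String name) {
        String sanitized = sanitize(name);
        return sanitized.endsWith("_seconds") ? sanitized : sanitized + "_seconds";
    }

    public static void main(String[] args) {
        System.out.println(sanitize("service.name"));           // service_name
        System.out.println(timerName("order.processing.time")); // order_processing_time_seconds
    }
}
```

When a query returns nothing, checking the exported name on /actuator/prometheus against what this mapping suggests is usually the fastest way to spot a mismatch.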
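Problem: /actuator/prometheus returns 404.
Fix: check that management.endpoints.web.exposure.include contains prometheus, and confirm that the dependency versions are compatible.
Problem: the target is shown as DOWN.
Fix: raise scrape_timeout (default 10s) to accommodate slow-responding services.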
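A stricter variant of the alert expression also requires more than 5 requests in the last minute, so a handful of slow calls on a quiet endpoint does not trigger it:
rate(http_server_requests_seconds_sum{uri="/api/users"}[1m])
  / rate(http_server_requests_seconds_count{uri="/api/users"}[1m]) > 0.5
and
increase(http_server_requests_seconds_count{uri="/api/users"}[1m]) > 5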
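Use file_sd_configs for service discovery instead of maintaining the target list by hand.
Use the histogram_quantile function to compute P99 latency and guide scaling decisions; this needs bucket series, which Spring Boot emits for HTTP requests when management.metrics.distribution.percentiles-histogram.http.server.requests is set to true.
Set a retention period (for example 30d) to keep the disk from filling up. Retention is a command-line flag, --storage.tsdb.retention.time=30d, not a prometheus.yml setting.
With the steps above, developers can quickly put together a monitoring system that covers the whole lifecycle of a Spring Boot service, from code-level performance analysis to cluster-level capacity management.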
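As a footnote to the P99 tip, the sketch below shows the idea behind histogram_quantile: locate the bucket that contains the requested rank, then interpolate linearly inside it. It is an illustration under stated assumptions rather than Prometheus's implementation: buckets are passed as cumulative counts with ascending upper bounds, the last bound is +Inf, there are at least two buckets, and 0 < q <= 1. The class name and the sample bucket values are hypothetical.

```java
/** Linear interpolation over cumulative histogram buckets, in the spirit of histogram_quantile. */
final class QuantileSketch {

    /**
     * @param q                quantile in (0, 1], e.g. 0.99 for P99
     * @param upperBounds      ascending bucket upper bounds ("le" labels), last one +Infinity
     * @param cumulativeCounts observations with value <= the matching upper bound
     */
    static double quantile(double q, double[] upperBounds, double[] cumulativeCounts) {
        int n = upperBounds.length;
        double total = cumulativeCounts[n - 1];
        if (total == 0) {
            return Double.NaN; // no observations at all
        }
        double rank = q * total;
        int i = 0;
        while (i < n - 1 && cumulativeCounts[i] < rank) {
            i++; // advance to the first bucket whose cumulative count reaches the rank
        }
        if (Double.isInfinite(upperBounds[i])) {
            return upperBounds[n - 2]; // rank falls in the +Inf bucket: report the highest finite bound
        }
        double lowerBound = (i == 0) ? 0.0 : upperBounds[i - 1];
        double countBelow = (i == 0) ? 0.0 : cumulativeCounts[i - 1];
        double countInBucket = cumulativeCounts[i] - countBelow;
        // Assume observations are spread evenly inside the bucket.
        return lowerBound + (upperBounds[i] - lowerBound) * (rank - countBelow) / countInBucket;
    }

    public static void main(String[] args) {
        double[] bounds = {0.1, 0.5, 1.0, Double.POSITIVE_INFINITY};
        double[] counts = {50, 90, 99, 100}; // hypothetical cumulative counts
        System.out.println("P50 ~ " + quantile(0.50, bounds, counts) + "s");
        System.out.println("P99 ~ " + quantile(0.99, bounds, counts) + "s");
    }
}
```

The interpolation explains why the result can never be more precise than the bucket layout: with these buckets, any P99 that truly lies between 0.5 s and 1.0 s is reported somewhere on that segment, so bucket boundaries should bracket the latency targets that matter.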