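Introduction: This article walks through the Prometheus monitoring system end to end, from basic installation to writing an Exporter. It covers the core concepts, configuration in practice, and code examples, to help developers stand up a monitoring setup quickly.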
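Prometheus is a graduated project of the CNCF (Cloud Native Computing Foundation) and has become the de facto standard for monitoring containerized environments. Its core design centers on a multi-dimensional data model: every time series is identified by a metric name plus a set of key-value labels, written in the form `<metric_name>{<label_name>=<label_value>, ...}`.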
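To make the `<metric_name>{<label_name>=<label_value>, ...}` notation concrete, here is a minimal sketch that renders a metric name and a label set as a series identifier. The helper `formatSeries` and the sample label values are hypothetical and use only the Go standard library. The escaping covers the three characters the text format escapes inside label values (backslash, double quote, newline).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labelEscaper escapes backslash, double quote, and newline inside a label value.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// formatSeries is a hypothetical helper that renders a metric name and its
// labels as name{k1="v1",k2="v2"}, with label names sorted for stable output.
func formatSeries(name string, labels map[string]string) string {
	if len(labels) == 0 {
		return name
	}
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf(`%s="%s"`, k, labelEscaper.Replace(labels[k])))
	}
	return name + "{" + strings.Join(pairs, ",") + "}"
}

func main() {
	series := formatSeries("http_requests_total", map[string]string{
		"method": "GET",
		"path":   "/api",
	})
	fmt.Println(series) // http_requests_total{method="GET",path="/api"}
}
```

Each distinct combination of label values is a separate time series, which is why label design matters so much for storage and query cost.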
Deploying Prometheus as a Docker container is the recommended approach:

```bash
docker run -d --name prometheus \
  -p 9090:9090 \
  -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus
```
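The key configuration file (`prometheus.yml`) explained:

```yaml
global:
  scrape_interval: 15s      # how often targets are scraped
  evaluation_interval: 15s  # how often rules are evaluated
scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['192.168.1.100:9100']
    metrics_path: /metrics
```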
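Add `rule_files` to `prometheus.yml` so that it points to the alerting rule files.

To protect the web endpoints with basic authentication, define `basic_auth_users`. This setting belongs in the web configuration file passed via `--web.config.file`, and Prometheus expects bcrypt password hashes there, which `htpasswd -B` produces:

```yaml
basic_auth_users:
  admin: $apr1$… # generated with htpasswd
```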
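## 3. Core Principles of Exporter Development

### 1. How an Exporter Works

An Exporter is essentially an HTTP service. It needs to provide:

- A metrics endpoint that conforms to the Prometheus text exposition format
- Metric names that follow the naming convention (`<prefix>_<subsystem>_<measurement>`)
- A sensible label design (avoid high-cardinality labels)

### 2. Preparing the Development Environment

Recommended stack:

- Language: Go (officially recommended), Python, Java
- Framework: Prometheus Client Library
- Testing tools: curl, Prometheus UI

Go example:

```go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// requestCount counts handled HTTP requests, partitioned by method and path.
	requestCount = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "http_requests_total",
			Help: "Total number of HTTP requests",
		},
		[]string{"method", "path"},
	)
	// requestLatency tracks request duration in seconds, partitioned by method.
	requestLatency = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "http_request_duration_seconds",
			Help:    "HTTP request latency distribution",
			Buckets: []float64{0.05, 0.1, 0.25, 0.5, 1.0},
		},
		[]string{"method"},
	)
)

func init() {
	// Register both collectors with the default registry served by promhttp.Handler().
	prometheus.MustRegister(requestCount)
	prometheus.MustRegister(requestLatency)
}

// recordMetrics increments the request counter and records the elapsed time.
func recordMetrics(method, path string, start time.Time) {
	requestCount.WithLabelValues(method, path).Inc()
	requestLatency.WithLabelValues(method).Observe(time.Since(start).Seconds())
}

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		w.Write([]byte("Hello, Prometheus!"))
		// The raw URL path is used as a label value for brevity; in a real
		// service, map it to a bounded set of routes to avoid high cardinality.
		recordMetrics(r.Method, r.URL.Path, start)
	})
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```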
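The advice to avoid high-cardinality labels is easiest to see with the `path` label: using the raw URL path creates a new time series for every distinct URL. One possible mitigation, sketched below with the standard library only, is to map each path onto a small fixed set of routes before using it as a label value. The `knownRoutes` list and the `normalizePath` helper are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// knownRoutes is a hypothetical, fixed set of route prefixes for this sketch.
var knownRoutes = []string{"/api/users", "/api/orders", "/healthz"}

// normalizePath maps a raw request path to one of the known routes, or to
// "other", so the number of distinct label values stays bounded.
func normalizePath(path string) string {
	for _, route := range knownRoutes {
		if path == route || strings.HasPrefix(path, route+"/") {
			return route
		}
	}
	return "other"
}

func main() {
	for _, p := range []string{"/api/users/42", "/api/orders", "/random/abc123"} {
		fmt.Printf("%s -> %s\n", p, normalizePath(p))
	}
}
```

In an exporter, the normalized value (rather than `r.URL.Path`) would be passed as the `path` label, which caps the series count at the number of known routes plus one.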
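Prometheus supports multiple service discovery mechanisms. The following example uses Kubernetes pod discovery and keeps only pods whose `prometheus.io/scrape` annotation is `true`:

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
```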
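The `keep` action in the relabel rule can be modeled in a few lines. The sketch below is a simplified illustration, not Prometheus's actual implementation: it joins the source label values with the separator (`;` is the default), anchors the regex on both ends, and keeps the target only on a match. The function name `keep` and the sample label maps are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// keep reports whether a target survives a relabel rule with action "keep":
// the values of the source labels are joined with the separator, and the
// target is kept only if the fully anchored regex matches the result.
func keep(labels map[string]string, sourceLabels []string, separator, pattern string) (bool, error) {
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return false, err
	}
	values := make([]string, 0, len(sourceLabels))
	for _, name := range sourceLabels {
		values = append(values, labels[name]) // a missing label yields ""
	}
	return re.MatchString(strings.Join(values, separator)), nil
}

func main() {
	source := []string{"__meta_kubernetes_pod_annotation_prometheus_io_scrape"}
	annotated := map[string]string{source[0]: "true"}
	plain := map[string]string{}

	a, _ := keep(annotated, source, ";", "true")
	b, _ := keep(plain, source, ";", "true")
	fmt.Println(a, b) // true false
}
```

Because the pattern is anchored, a value such as `truest` does not match `true`, and a pod without the annotation is dropped since its value is the empty string.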
Example alerting rule:

```yaml
groups:
  - name: node.rules
    rules:
      - alert: NodeMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is above 90% (current value: {{ $value }})"
```
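Recommended practices:

- `scrape_duration_seconds`: confirm how long each scrape takes
- `up{job="<job_name>"} == 1`: confirm that the target is reachable
- `scrape_interval`
- `--web.enable-admin-api`: monitor internal metrics (this flag enables the administrative HTTP API, so expose it only to trusted clients)
- `--web.external-url`: limit the scope of access by setting the URL under which Prometheus is externally reachable
- `--web.route-prefix`: prevent path conflicts

With a systematic grasp of the material above, developers can build a complete Prometheus monitoring system from scratch and write custom Exporters for their own needs. Pair the official documentation with community examples to keep deepening your practice, and grow the setup step by step into a monitoring solution that fits your organization.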