
Prometheus + Grafana: A Complete Hands-On Guide from Metric Collection to Alerting (Go + Kind)

news 2026/3/26 15:32:18

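Contents

Environment Setup: Deploying Kind and the Monitoring Stack

1. Create the Kind Cluster

2. Deploy the Prometheus Operator Stack

Instrumenting and Containerizing the Go App

1. Instrument the Code

2. Write the Dockerfile

3. Build the Image and Push It (or Load It into Kind)

Kubernetes Deployment and Service Discovery

1. Deploy the Deployment and Service

2. Configure the ServiceMonitor

Grafana Dashboard Configuration

Alerting: Two Common Approaches

Option 1: Prometheus Alertmanager

Option 2: Grafana-Managed Alerting (9.x+)

Verification and Testing

Cleanup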

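This article covers setting up a local Kind cluster, instrumenting a Go application, collecting its metrics with Prometheus, visualizing them in Grafana, and configuring alerts in two ways: through Prometheus Alertmanager and through Grafana-managed alerting.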

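Environment Setup: Deploying Kind and the Monitoring Stack

1. Create the Kind Cluster

Create a Kind configuration file that uses extra port mappings to expose the services on the host: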

# kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    extraPortMappings:
      - containerPort: 30000 # Prometheus NodePort
        hostPort: 9090
      - containerPort: 31000 # Grafana NodePort
        hostPort: 3000
      - containerPort: 32000 # Alertmanager NodePort
        hostPort: 9093

Create the cluster:

kind create cluster --config kind-config.yaml

2. Deploy the Prometheus Operator Stack

Use kube-prometheus-stack to deploy Prometheus, Grafana, and Alertmanager in one step:

# Add the Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Create the monitoring namespace
kubectl create namespace monitoring

# Install the stack, using NodePort services for easy local access
helm install kind-prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.service.type=NodePort \
  --set prometheus.service.nodePort=30000 \
  --set grafana.service.type=NodePort \
  --set grafana.service.nodePort=31000 \
  --set alertmanager.service.type=NodePort \
  --set alertmanager.service.nodePort=32000

Once the installation finishes, Prometheus is available at http://localhost:9090, Grafana at http://localhost:3000 (default login admin/prom-operator), and Alertmanager at http://localhost:9093.

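Instrumenting and Containerizing the Go App

1. Instrument the Code

Add the Prometheus Go client library (go get github.com/prometheus/client_golang) to the application and expose a /metrics endpoint: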

package main

import (
	"log"
	"math/rand"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Custom metrics, both labelled by request path
var (
	httpRequestsTotal = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "http_requests_total",
			Help: "Total number of HTTP requests",
		},
		[]string{"path"},
	)
	httpRequestDuration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "http_request_duration_seconds",
			Help:    "HTTP request duration in seconds",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"path"},
	)
)

func init() {
	// Register the metrics with the default registry
	prometheus.MustRegister(httpRequestsTotal)
	prometheus.MustRegister(httpRequestDuration)
}

func main() {
	// Example business handler
	http.HandleFunc("/api/hello", func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		path := "/api/hello"

		// Simulate 0-199 ms of processing time
		time.Sleep(time.Duration(rand.Intn(200)) * time.Millisecond)

		httpRequestsTotal.WithLabelValues(path).Inc()
		httpRequestDuration.WithLabelValues(path).Observe(time.Since(start).Seconds())
		w.Write([]byte("Hello, World!"))
	})

	// Expose the Prometheus metrics endpoint
	http.Handle("/metrics", promhttp.Handler())

	// Exit with an error instead of failing silently if the port is unavailable
	log.Fatal(http.ListenAndServe(":8080", nil))
}

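Key point: promhttp.Handler() serves the /metrics endpoint, and Prometheus scrapes that endpoint periodically to collect the data. Because it uses the default registry, it also exports the client's built-in Go runtime (go_*) and process (process_*) metrics alongside the two custom ones.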

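2. Write the Dockerfile

The build stage expects go.mod and go.sum in the project root (created by go mod init and go get). Keep the builder image's Go version at or above the go directive in go.mod; recent client_golang releases may need a newer Go than 1.20, in which case bump golang:1.20-alpine accordingly.

FROM golang:1.20-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build -o myapp main.go

FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/myapp .
EXPOSE 8080
CMD ["./myapp"]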

3. Build the Image and Push It (or Load It into Kind)

# Build the image
docker build -t go-demo-app:v1 .

# Load the image into the Kind node (no registry needed for local testing)
kind load docker-image go-demo-app:v1

Kubernetes Deployment and Service Discovery

1. Deploy the Deployment and Service

# go-app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-demo-app
  namespace: default
  labels:
    app: go-exporter
spec:
  replicas: 2
  selector:
    matchLabels:
      app: go-exporter
  template:
    metadata:
      labels:
        app: go-exporter
    spec:
      containers:
        - name: app
          image: go-demo-app:v1
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
              name: metrics
---
apiVersion: v1
kind: Service
metadata:
  name: go-demo-app-service
  namespace: default
  labels:
    app: go-exporter
spec:
  selector:
    app: go-exporter
  ports:
    - name: metrics
      port: 8080
      targetPort: 8080

kubectl apply -f go-app.yaml

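2. Configure the ServiceMonitor

Prometheus Operator discovers scrape targets through ServiceMonitor resources:

# service-monitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: go-demo-app-monitor
  namespace: default
  labels:
    # kube-prometheus-stack's Prometheus only selects ServiceMonitors with the Helm release label by default
    release: kind-prometheus
spec:
  selector:
    matchLabels:
      app: go-exporter
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics
  namespaceSelector:
    any: true

kubectl apply -f service-monitor.yaml

Note: the ServiceMonitor's selector must match the Service's labels, and endpoints[].port refers to the Service port name (metrics), not its number; Prometheus then discovers the Endpoints behind that Service in the selected namespaces. With kube-prometheus-stack's defaults, Prometheus also only picks up ServiceMonitors carrying the Helm release label (release: kind-prometheus here); without it, the target never appears on the Prometheus Targets page. Alternatively, install the chart with --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false.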
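Both selection steps (Prometheus choosing ServiceMonitors, and a ServiceMonitor choosing Services) use matchLabels semantics: every key/value in the selector must be present with the same value on the target, and extra labels on the target are ignored. A simplified Go model of that rule (matchLabels is a hypothetical helper; matchExpressions are not covered) shows why a ServiceMonitor without the release label is silently skipped:

```go
package main

import "fmt"

// matchLabels reports whether every selector key/value is present on labels.
// An empty selector matches everything here, like an empty Kubernetes LabelSelector.
func matchLabels(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	// Step 1: Prometheus (kube-prometheus-stack defaults) selects ServiceMonitors by release label.
	promSelector := map[string]string{"release": "kind-prometheus"}
	fmt.Println("ServiceMonitor with release label selected:",
		matchLabels(promSelector, map[string]string{"release": "kind-prometheus"}))
	fmt.Println("ServiceMonitor without release label selected:",
		matchLabels(promSelector, map[string]string{}))

	// Step 2: the ServiceMonitor selects Services by app label.
	smSelector := map[string]string{"app": "go-exporter"}
	fmt.Println("Service selected:",
		matchLabels(smSelector, map[string]string{"app": "go-exporter"}))
}
```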

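Grafana Dashboard Configuration

Log in to Grafana: open http://localhost:3000 and sign in with admin/prom-operator.
Add a data source: kube-prometheus-stack normally provisions a Prometheus data source automatically, so check the list first. To add one manually, go to Configuration → Data Sources → Add data source (Connections → Data sources in newer Grafana versions), choose Prometheus, and set the URL to http://kind-prometheus-kube-prome-prometheus.monitoring:9090 (the in-cluster Service address).
Import a Go dashboard: Dashboards → Import, enter dashboard ID 6671 (a general-purpose Go application dashboard), select the data source, and load it.
Custom panels: add panels that query the custom metrics, for example:
- QPS: rate(http_requests_total[1m]) (one series per pod; use sum(rate(http_requests_total[1m])) by (path) to combine the two replicas)
- Latency: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, path))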

Alerting: Two Common Approaches

Option 1: Prometheus Alertmanager

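Create the alert rule. With kube-prometheus-stack, rules can be supplied through additionalPrometheusRulesMap in a Helm values file; the chart turns each entry into a PrometheusRule resource (carrying the release label) that Prometheus loads:

# alert-rules-values.yaml (Helm values override)
additionalPrometheusRulesMap:
  go-app-alerts:
    groups:
      - name: go-app-alerts
        rules:
          - alert: HighRequestLatency
            expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[2m])) by (le, path)) > 0.5
            for: 1m
            labels:
              severity: critical
            annotations:
              summary: "High latency on {{ $labels.path }}"
              description: "95th percentile latency is {{ $value }}s"

Apply it with a Helm upgrade that keeps the earlier settings:

helm upgrade kind-prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --reuse-values \
  -f alert-rules-values.yaml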

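Configure Alertmanager (managed through a Secret):

# alertmanager-config.yaml
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-kind-prometheus-kube-prome-alertmanager
  namespace: monitoring
stringData:
  alertmanager.yaml: |
    global:
      slack_api_url: 'https://hooks.slack.com/services/xxx'
    route:
      group_by: ['alertname']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'slack-notifications'
    receivers:
      - name: 'slack-notifications'
        slack_configs:
          - channel: '#alerts'
            title: 'Prometheus Alert'

kubectl apply -f alertmanager-config.yaml

Because the Helm chart itself creates this Secret, a later helm upgrade may overwrite manual changes; for a durable setup, put the same configuration under alertmanager.config in the Helm values.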
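In this route, group_by: ['alertname'] batches alerts that share the same alertname into one notification; group_wait (30s) delays the first notification of a new group so related alerts can join it, group_interval (5m) spaces out updates for a group that changes, and repeat_interval (12h) controls re-sending an unchanged group. The Go sketch below models only the grouping step; groupKey is a hypothetical helper, and Alertmanager's real group keys also encode the matching route. The /api/other path is a made-up example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

type alert struct {
	labels map[string]string
}

// groupKey builds a key from the values of the group_by labels only.
func groupKey(a alert, groupBy []string) string {
	parts := make([]string, 0, len(groupBy))
	for _, l := range groupBy {
		parts = append(parts, l+"="+a.labels[l])
	}
	return "{" + strings.Join(parts, ",") + "}"
}

// group partitions alerts into notification batches keyed by groupKey.
func group(alerts []alert, groupBy []string) map[string][]alert {
	out := map[string][]alert{}
	for _, a := range alerts {
		k := groupKey(a, groupBy)
		out[k] = append(out[k], a)
	}
	return out
}

func main() {
	alerts := []alert{
		{labels: map[string]string{"alertname": "HighRequestLatency", "path": "/api/hello"}},
		{labels: map[string]string{"alertname": "HighRequestLatency", "path": "/api/other"}},
		{labels: map[string]string{"alertname": "Watchdog"}},
	}
	groups := group(alerts, []string{"alertname"})
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	for _, k := range keys {
		fmt.Printf("%s -> %d alert(s) in one notification\n", k, len(groups[k]))
	}
}
```

Grouping only by alertname means latency alerts for different paths arrive as one Slack message; adding path to group_by would split them into separate notifications.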

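Option 2: Grafana-Managed Alerting (9.x+)

Open alerting: Alerting → Alert rules → New alert rule.
Configure the query and condition:
- Query: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[2m])) by (le))
- Expression: B > 0.5 (sets the threshold)
Evaluation behavior: evaluate every 1m with a pending period (For) of 1m.
Add labels and annotations, for example severity: warning.
Set up notifications:
- Create a contact point (for example DingTalk or a Webhook).
- In Notification policies, route alerts to that contact point by matching labels.
- Grafana-managed alerting is configured through a more flexible UI, which suits fast iteration.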

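Verification and Testing

Trigger the alert: put load on the Go app so that latency exceeds 0.5 s. Note that the demo handler sleeps only 0 to 199 ms per request and Go serves requests concurrently, so load alone may not push the p95 above 0.5 s; to watch the alert fire, temporarily raise the simulated delay (for example rand.Intn(1000)) or lower the threshold.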

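# The Service is ClusterIP only, so first forward a local port to it (keep this running)
kubectl port-forward svc/go-demo-app-service 8080:8080

# In a second terminal, send requests in a loop
while true; do curl -s http://localhost:8080/api/hello > /dev/null; done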

Check the alerts:
- Alertmanager UI: http://localhost:9093
- Grafana alerting: Alerting → State history
Check the metrics: in the Prometheus UI (http://localhost:9090), query http_request_duration_seconds_bucket.

Cleanup

kind delete cluster

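With these steps, you have built a complete monitoring loop: Go instrumentation, deployment on Kind, Prometheus scraping, Grafana visualization, and alert notifications. From here you can add more exporters (kube-prometheus-stack already deploys Node Exporter by default) or integrate Loki for log-based alerting.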

http://www.jsqmd.com/news/454364/
