Monitoring the yz-bijini-cosplay Model: Prometheus + Grafana in Practice

news 2026/4/10 10:28:41

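1. Why monitor an AI model service

When you run an AI model service such as yz-bijini-cosplay, the biggest headaches are usually the things you cannot see: the service slows down and you do not notice, requests fail and you do not know why, resources run out and nobody catches it in time. It is like driving without a dashboard: you are going on feel alone, and the risk is high.

A monitoring system is that dashboard. It tells you:

Is the service healthy right now, or has it gone down?
Are requests being processed at a normal speed, or have they slowed down?
How is resource usage? Is there enough memory and GPU?
How many people are using it, and how high is the load?

With Prometheus and Grafana you can track all of this in real time and catch problems early, before they turn into an outage.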

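2. Overall architecture of the monitoring system

Here is what the monitoring system we are going to build looks like:

    yz-bijini-cosplay service → Prometheus metric collection → Grafana visualization
              ↓                              ↓
       Business metrics               System metrics
    (request count, latency, etc.)   (CPU, memory, etc.)

In short: Prometheus collects the data and Grafana displays it. Together they give you a complete monitoring view.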

3. Environment setup and installation

3.1 Install Prometheus

First, download and install Prometheus:

    # Create a dedicated monitoring directory
    mkdir -p ~/monitoring && cd ~/monitoring

    # Download Prometheus
    wget https://github.com/prometheus/prometheus/releases/download/v2.47.0/prometheus-2.47.0.linux-amd64.tar.gz

    # Extract
    tar xvfz prometheus-2.47.0.linux-amd64.tar.gz
    cd prometheus-2.47.0.linux-amd64

    # Start Prometheus (in the background)
    nohup ./prometheus --config.file=prometheus.yml &

Check that it started successfully:

    curl http://localhost:9090
    # If this returns an HTML page, Prometheus is up
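
The same check can be scripted. The sketch below is one possible approach, not part of the original setup: it polls Prometheus's management endpoints /-/healthy and /-/ready and reports whether each answers with HTTP 200. The function names endpoint_ok and prometheus_status are made up for this example, and the base URL assumes the default port used above.

```python
import urllib.error
import urllib.request


def endpoint_ok(base_url: str, path: str, timeout: float = 3.0) -> bool:
    """Return True if GET base_url + path answers with HTTP 200."""
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status all count as "not OK".
        return False


def prometheus_status(base_url: str = "http://localhost:9090") -> dict:
    """Check the liveness and readiness endpoints of a Prometheus server."""
    return {
        "healthy": endpoint_ok(base_url, "/-/healthy"),
        "ready": endpoint_ok(base_url, "/-/ready"),
    }
```

Calling prometheus_status() on the monitoring host returns a dict with two booleans; a False for "ready" right after startup usually just means Prometheus is still loading its data.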

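3.2 Install Grafana

Next, install Grafana:

    # Go back to the monitoring directory so Grafana is not unpacked inside the Prometheus directory
    cd ~/monitoring

    # Download and extract Grafana
    wget https://dl.grafana.com/oss/release/grafana-10.0.0.linux-amd64.tar.gz
    tar xvfz grafana-10.0.0.linux-amd64.tar.gz
    cd grafana-10.0.0

    # Start Grafana (in the background)
    nohup ./bin/grafana-server web &

Grafana listens on port 3000 by default. Open http://<your-server-IP>:3000 in a browser; the default username and password are both admin.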

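4. Configure metric collection for yz-bijini-cosplay

Now Prometheus needs to be able to scrape metrics from the yz-bijini-cosplay service.

4.1 Expose metrics from the model service

The yz-bijini-cosplay service has to expose its monitoring metrics. If it is built on a standard web framework, you can add a monitoring middleware: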

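    # Example: add monitoring to a Flask application
    from flask import Flask
    from prometheus_flask_exporter import PrometheusMetrics

    app = Flask(__name__)
    metrics = PrometheusMetrics(app)  # registers the /metrics endpoint

    # Custom metric: metrics.counter() returns a decorator, so apply it to the
    # view that serves the model. The /generate route is a placeholder.
    # The status label is what the error-rate queries in sections 6 and 7 filter on.
    @app.route('/generate', methods=['POST'])
    @metrics.counter(
        'model_requests_total', 'Total model requests',
        labels={'model': 'yz-bijini-cosplay',
                'status': lambda resp: resp.status_code}
    )
    def generate():
        return 'ok'  # placeholder for the real inference call

With this in place, the service exposes its metrics at the /metrics endpoint.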
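
To get a feel for what Prometheus reads from /metrics, the sketch below parses a small piece of the Prometheus text exposition format. The SAMPLE text is a hypothetical illustration, not captured output: the numbers are made up, and the status label is an assumption. The parser parse_metrics is an illustrative helper that handles only a subset of the format (it does not unescape label values).

```python
import re

# Hypothetical sample of what GET /metrics might return; the values are made up.
SAMPLE = '''\
# HELP model_requests_total Total model requests
# TYPE model_requests_total counter
model_requests_total{model="yz-bijini-cosplay",status="200"} 1027.0
model_requests_total{model="yz-bijini-cosplay",status="500"} 3.0
'''

# metric_name, optional {labels}, then the sample value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse a subset of the text format into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if not match:
            continue  # ignore lines this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ''))
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

Each sample is one time series: the metric name plus its label set identifies the series, and the number is the current counter value that Prometheus stores on every scrape.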

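4.2 Configure Prometheus scraping

Edit the Prometheus configuration file prometheus.yml:

    scrape_configs:
      - job_name: 'yz-bijini-cosplay'
        static_configs:
          - targets: ['localhost:5000']  # address of your model service
        metrics_path: '/metrics'
        scrape_interval: 15s  # scrape every 15 seconds

Restart Prometheus so the configuration takes effect:

    cd ~/monitoring/prometheus-2.47.0.linux-amd64
    pkill prometheus
    nohup ./prometheus --config.file=prometheus.yml &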
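
After the restart it is worth confirming that the new job is actually being scraped. One way, sketched below under the assumption that Prometheus runs on localhost:9090, is to read the /api/v1/targets endpoint of the Prometheus HTTP API and group targets by job. The helper names fetch_targets and summarize_targets are made up for this example.

```python
import json
import urllib.request


def summarize_targets(payload: dict) -> dict:
    """Map each scrape job to a list of (scrape URL, health, last error) tuples."""
    summary = {}
    for target in payload.get("data", {}).get("activeTargets", []):
        job = target.get("labels", {}).get("job", "<unknown>")
        summary.setdefault(job, []).append(
            (target.get("scrapeUrl"), target.get("health"), target.get("lastError", ""))
        )
    return summary


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Download the current target list from the Prometheus HTTP API."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/api/v1/targets", timeout=5) as resp:
        return json.load(resp)
```

Running summarize_targets(fetch_targets()) should list the yz-bijini-cosplay job with health "up"; a health of "down" together with a non-empty last error usually points to a wrong address or metrics path.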

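5. Key metrics in detail

When monitoring the yz-bijini-cosplay service, focus on the following categories of metrics:

5.1 Business performance metrics

Request volume: how many requests are handled per second (QPS)
Response time: how long each request takes to process
Error rate: what share of requests fail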
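
All three business metrics can be derived from counters that only ever increase. The sketch below shows the arithmetic on two hypothetical scrapes taken 60 seconds apart; the dictionary keys and the numbers are illustrative assumptions. It is a simplification of what PromQL's rate() does, since the real function also extrapolates to the edges of the time window.

```python
def per_second_rate(prev: float, curr: float, seconds: float) -> float:
    """Average per-second increase of a counter between two scrapes."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # A counter that went down has been reset (for example by a restart),
    # so the current value is taken as the whole increase.
    increase = curr - prev if curr >= prev else curr
    return increase / seconds


def business_metrics(prev: dict, curr: dict, seconds: float) -> dict:
    """Derive QPS, average latency and error ratio from two counter snapshots."""
    qps = per_second_rate(prev["requests"], curr["requests"], seconds)
    errors = per_second_rate(prev["errors"], curr["errors"], seconds)
    busy = per_second_rate(prev["duration_sum"], curr["duration_sum"], seconds)
    return {
        "qps": qps,
        "avg_latency_s": busy / qps if qps else 0.0,
        "error_ratio": errors / qps if qps else 0.0,
    }


# Hypothetical counter values from two scrapes, 60 seconds apart.
earlier = {"requests": 1000, "errors": 10, "duration_sum": 500.0}
later = {"requests": 1060, "errors": 13, "duration_sum": 530.0}
print(business_metrics(earlier, later, 60))
```

In this example 60 new requests in 60 seconds give 1 request per second, 30 seconds of accumulated processing time give an average of 0.5 seconds per request, and 3 failures out of 60 give an error ratio of 5%.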

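5.2 Resource usage metrics

GPU utilization: model inference runs mainly on the GPU
Memory usage: make sure the service does not run out of memory
CPU usage: inference is GPU-bound, but the CPU still matters

5.3 Service quality metrics

Service availability: whether the service is responding normally
Concurrent connections: how many clients are using the service at the same time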

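6. Grafana dashboard configuration

Now let's build a monitoring dashboard.

6.1 Add the data source

In the Grafana UI:

Click the gear icon in the left sidebar → Data Sources (depending on the Grafana version, this entry may instead sit under Connections → Data sources)
Select Prometheus
Set the URL to http://localhost:9090
Click Save & Test; a green success message should appear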

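6.2 Create the monitoring dashboard

Create a new Dashboard and add the following panels:

Request volume panel:

PromQL query: rate(model_requests_total[1m])
Visualization type: Graph (named Time series in recent Grafana versions)
Title: Requests per second (QPS)

Response time panel:

PromQL query: rate(model_request_duration_seconds_sum[1m]) / rate(model_request_duration_seconds_count[1m])
Visualization type: Stat
Title: Average response time

This query assumes the service exports a histogram or summary named model_request_duration_seconds, which the Flask snippet in section 4.1 does not define. If you rely on prometheus_flask_exporter's default metrics, its built-in flask_http_request_duration_seconds histogram can be substituted.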

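Error rate panel:

PromQL query: sum(rate(model_requests_total{status="500"}[1m])) / sum(rate(model_requests_total[1m]))
Visualization type: Gauge
Title: Error rate

Both sides are wrapped in sum() so that the status label is aggregated away before dividing; without it, the division only matches series with identical labels and compares the status="500" series with itself. The query also requires model_requests_total to carry a status label.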
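
Before wiring a query into a panel, it can help to run it directly against the Prometheus HTTP API and look at the raw number. The sketch below sends an instant query to /api/v1/query. It assumes Prometheus on localhost:9090 and a model_requests_total counter with a status label; the helper names build_query_url, first_value and run_query are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Error ratio over the last minute; assumes the counter carries a status label.
ERROR_RATIO = (
    'sum(rate(model_requests_total{status="500"}[1m]))'
    ' / sum(rate(model_requests_total[1m]))'
)


def build_query_url(expr: str, base_url: str = "http://localhost:9090") -> str:
    """URL for an instant query against the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def first_value(payload: dict):
    """Return the first sample of an instant-vector result, or None if it is empty."""
    result = payload.get("data", {}).get("result", [])
    if payload.get("status") != "success" or not result:
        return None
    # Each vector element looks like {"metric": {...}, "value": [timestamp, "string"]}.
    return float(result[0]["value"][1])


def run_query(expr: str, base_url: str = "http://localhost:9090"):
    """Execute the query and return its first value as a float (or None)."""
    with urllib.request.urlopen(build_query_url(expr, base_url), timeout=5) as resp:
        return first_value(json.load(resp))
```

run_query(ERROR_RATIO) returns None while there is no traffic, because the query result is then an empty vector; the Grafana panel shows "No data" in the same situation.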

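7. Set up alerting rules

Monitoring cannot depend on someone watching the dashboards; alerts need to fire automatically.

7.1 Prometheus alert configuration

Add the alerting rule file to the Prometheus configuration:

    rule_files:
      - alerts.yml

Create alerts.yml: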

    groups:
      - name: model-alerts
        rules:
          - alert: HighErrorRate
            expr: sum(rate(model_requests_total{status="500"}[5m])) / sum(rate(model_requests_total[5m])) > 0.05
            for: 2m
            labels:
              severity: warning
            annotations:
              summary: "Error rate too high"
              description: "yz-bijini-cosplay service error rate is above 5%"
          - alert: HighResponseTime
            expr: rate(model_request_duration_seconds_sum[5m]) / rate(model_request_duration_seconds_count[5m]) > 2
            for: 3m
            labels:
              severity: warning
            annotations:
              summary: "Response time too long"
              description: "Average response time is above 2 seconds"
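
The for: field means an alert does not fire on the first bad evaluation: the condition has to stay true for the whole duration, and a single good evaluation resets the timer. The sketch below simulates that behavior on a made-up series of error ratios. It is a simplified model for illustration, not Prometheus's implementation, and the function name alert_states is hypothetical.

```python
from typing import Iterable, List, Tuple


def alert_states(samples: Iterable[Tuple[float, float]],
                 threshold: float, for_seconds: float) -> List[str]:
    """Return 'inactive', 'pending' or 'firing' for each (timestamp, value) evaluation."""
    states = []
    active_since = None  # timestamp at which the condition first became true
    for ts, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = ts
            held = ts - active_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
    return states


# Hypothetical error ratios evaluated once per minute, checked against the
# HighErrorRate rule: threshold 0.05, for: 2m (120 seconds).
evaluations = [(0, 0.01), (60, 0.08), (120, 0.09), (180, 0.07), (240, 0.02), (300, 0.06)]
print(alert_states(evaluations, threshold=0.05, for_seconds=120))
```

In this run the ratio first exceeds 5% at t=60, so the alert is pending at t=60 and t=120 and fires at t=180; the dip at t=240 returns it to inactive, and the next spike starts a new pending period.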