
Tinkering Notes [40]: Running the qwen3-30b-a3b Model on an Ancient A100 GPU

Summary

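Running the qwen3-30b-a3b model through ollama on an ancient A100-SXM4-40GB GPU. "The 30B-Q8 quantized model, running on the GPU, answers a one-sentence self-introduction prompt by generating 267 tokens in 28 s, at an average power draw of 55 W and a total energy of 0.44 Wh; the electricity cost per token is under 1/30,000 yuan, and the energy efficiency is about 6 J/token."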

Key information

  • Image: ollama/ollama:0.6.6-rc2
  • GPU: A100-SXM4-40GB
  • GPU driver: NVIDIA-SMI 460.106.00 Driver Version: 460.106.00 CUDA Version: 11.2
  • docker: Docker version 24.0.4, build 3713ee1
  • Model: modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf
  • Host OS: Linux Tesla 5.10.0-60.18.0.50.oe2203.x86_64 #1 SMP Wed Mar 30 03:12:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Implementation

1. Set up ollama in docker (GPU driver already configured)

docker pull ollama/ollama:0.6.6-rc2
docker run --restart=always --name ollama -v /lvm-group1/qsbye/ollama:/root/.ollama -p 11435:11434 -e "OLLAMA_HOST=0.0.0.0" -d ollama/ollama:0.6.6-rc2

2. Change ollama's default directory (to keep the system disk from filling up)

## One-step update of the system ollama (this simply reinstalls the latest version)
curl -fsSL https://ollama.com/install.sh | sh
## Verify after updating
ollama --version
## Create a new ollama data directory on the data disk
sudo mkdir -p /lvm-group1/qsbye/ollama
sudo chmod 777 -R /lvm-group1/qsbye/ollama
sudo cp -r /usr/share/ollama/.ollama/models /lvm-group1/qsbye/ollama
## Change ollama's default directory
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo vim /etc/systemd/system/ollama.service.d/override.conf

Contents:

[Service]
Environment="OLLAMA_MODELS=/lvm-group1/qsbye/ollama/models"
User=ollama
Group=ollama

Then:

sudo systemctl daemon-reload
sudo systemctl restart ollama
sudo systemctl status ollama

3. Download the model

# Use a mirror in China (ModelScope community)
ollama pull modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf
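
After the pull, it can be useful to confirm that the server actually lists the model. The following is a minimal sketch that reads Ollama's `/api/tags` endpoint; the helper names (`installed_models`, `is_installed`, `fetch_tags`) are hypothetical, and the base URL is assumed to be the host and port used elsewhere in this post. The exact name string stored by the server may differ from the one passed to `ollama pull`, so treat a negative result as a prompt to inspect the listing rather than as proof.

```python
import json
import urllib.request

# Assumed values taken from this post; adjust for your own host and port.
BASE_URL = "http://10.8.8.130:11435"
MODEL = "modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf"


def installed_models(tags: dict) -> list:
    """Extract model names from a decoded /api/tags response."""
    return [m.get("name", "") for m in tags.get("models", [])]


def is_installed(tags: dict, model: str) -> bool:
    """Return True if `model` appears in the /api/tags listing."""
    return model in installed_models(tags)


def fetch_tags(base_url: str = BASE_URL) -> dict:
    """GET /api/tags from the Ollama server and decode the JSON body."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=10) as resp:
        return json.load(resp)
```

Calling `is_installed(fetch_tags(), MODEL)` against a running server returns a boolean; printing `installed_models(fetch_tags())` shows the names as the server stores them.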

3. Run the model

docker exec -d -e OLLAMA_GPU_LAYERS=999 -e OLLAMA_KEEP_ALIVE=-1 -e CUDA_VISIBLE_DEVICES=0 ollama bash -c "ollama run modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf"
# Watch GPU memory in another terminal
watch -n1 nvidia-smi

Output:

Every 1.0s: nvidia-smi                           Tesla: Sun Jan 18 08:59:50 2026

Sun Jan 18 08:59:51 2026
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.106.00   Driver Version: 460.106.00   CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB	   Off  | 00000000:82:00.0 Off |                    0 |
| N/A   39C    P0    43W / 400W |  36742MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   3751529	 C   /usr/bin/ollama                 36737MiB |
+-----------------------------------------------------------------------------+
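
The table shows 36742 MiB of 40536 MiB in use, so the Q8 model nearly fills the card. As a rough sketch, the same numbers can be read programmatically with `nvidia-smi --query-gpu=memory.used,memory.total`; the function names below are hypothetical, and the parser assumes the `csv,noheader,nounits` format, which prints one `used, total` line per GPU.

```python
import subprocess


def parse_memory(csv_line: str) -> tuple:
    """Parse one 'used, total' line (values in MiB) from nvidia-smi CSV output."""
    used, total = (int(field.strip()) for field in csv_line.split(","))
    return used, total


def memory_fraction(csv_line: str) -> float:
    """Return the fraction of GPU memory in use for one CSV line."""
    used, total = parse_memory(csv_line)
    return used / total


def query_memory(gpu_index: int = 0) -> str:
    """Run nvidia-smi for one GPU and return its 'used, total' CSV line."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            f"--id={gpu_index}",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return out.strip().splitlines()[0]
```

With the values from the table, `memory_fraction("36742, 40536")` is about 0.906, which leaves under 4 GiB of headroom for the KV cache and anything else on the card.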

4. Thinking Q&A test

python -c "import requests,json,sys;[sys.stdout.write(json.loads(l)['response']) for l in requests.post('http://10.8.8.130:11435/api/generate',json={'model':'modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf','prompt':'一行python代码打印hello ollama;','stream':True},stream=True).iter_lines(decode_unicode=True) if l]"

Output:

<think>
好的,用户让我用一行Python代码打印“hello ollama”。这看起来挺简单的,但我要仔细想想有没有什么需要注意的地方。首先,Python中打印字符串的基本语法是print("内容")。所以最直接的方式就是print("hello ollama")。不过用户可能有其他需求吗?比如是否需要考虑大小写?不过例子中的“hello ollama”是小写的,所以应该没问题。有没有可能用户想用其他方法?比如使用变量或者转义字符?不过题目明确说是一行代码,所以应该直接使用print函数。另外,是否需要考虑Python版本?比如Python 2和3的区别,但现在的环境大多数是Python 3,所以没问题。还有可能用户想用更复杂的表达式,比如拼接字符串?比如print("hello" + " ollama"),但这样反而更复杂,不如直接写字符串简单。不过用户可能只是想确认基本用法,所以直接写最简单的形式最好。另外,检查是否有拼写错误,比如“ollama”是否正确?用户可能打错了,但按照问题描述,应该按照给出的字符串来处理。所以正确的代码应该是print("hello ollama")。有没有其他可能?比如使用格式化字符串,比如print(f"hello ollama"),但同样,这和直接写字符串没有区别,而且更复杂。所以还是直接使用print("hello ollama")最简洁。总结一下,用户的需求明确,只需要一行代码,所以直接使用print函数输出字符串即可。没有其他隐藏的要求,所以答案应该是这个。
</think>

```python
print("hello ollama")
```
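
Because the response mixes the `<think>` reasoning with the final answer, a client usually wants to separate the two. The following is a small sketch of one way to do that on the fully accumulated response text; `split_thinking` is a hypothetical helper, and it assumes at most one `<think>…</think>` block, as in the output above.

```python
import re

# Non-greedy match across newlines for a single <think>...</think> block.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple:
    """Return (reasoning, answer) from a response that may contain a <think> block."""
    match = THINK_RE.search(text)
    if match is None:
        # No reasoning block: the whole text is the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

Applied to the output above, the second element would hold only the fenced `print("hello ollama")` answer.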

5. Print the token rate

python -c "
import requests, json, sys

url = 'http://10.8.8.130:11435/api/generate'
payload = {
    'model': 'modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf',
    'prompt': '一行python代码打印hello ollama;',
    'stream': True,
}

try:
    r = requests.post(url, json=payload, stream=True, timeout=30)
    for line in r.iter_lines(decode_unicode=True):
        if not line:
            continue
        chunk = json.loads(line)
        sys.stdout.write(chunk.get('response', ''))
        sys.stdout.flush()
        # token/s: eval_count and eval_duration are only present in the final chunk
        cnt = chunk.get('eval_count', 0)
        dur_ns = chunk.get('eval_duration', 0)
        if dur_ns:
            rate = cnt / (dur_ns / 1e9)
            sys.stdout.write('\r[%.1f token/s]     ' % rate)
            sys.stdout.flush()
except Exception as e:
    print('\nError:', e, file=sys.stderr)
"

Output:

[16.0 token/s]
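
The rate above is simply `eval_count` divided by `eval_duration`, which Ollama reports in nanoseconds in the final streamed record. As a hedged sketch, the same record also carries `prompt_eval_count` and `prompt_eval_duration`, so prompt processing and generation can be reported separately; the function names and the sample numbers in the usage note are illustrative, not measured values.

```python
def tokens_per_second(count: int, duration_ns: int) -> float:
    """Convert a token count and a duration in nanoseconds to tokens per second."""
    if duration_ns <= 0:
        return 0.0
    return count / (duration_ns / 1e9)


def summarize(final_chunk: dict) -> dict:
    """Compute prompt and generation rates from the final streamed record."""
    return {
        "prompt_tps": tokens_per_second(
            final_chunk.get("prompt_eval_count", 0),
            final_chunk.get("prompt_eval_duration", 0),
        ),
        "eval_tps": tokens_per_second(
            final_chunk.get("eval_count", 0),
            final_chunk.get("eval_duration", 0),
        ),
    }
```

For example, a made-up final record with `eval_count=320` and `eval_duration=20_000_000_000` gives an `eval_tps` of 16.0.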

6. Keep ollama's GPU memory from being reclaimed

  1. Set the environment variable OLLAMA_KEEP_ALIVE=-1
  2. Call the model once every 3 minutes (heartbeat)
## If the Go compiler is not installed
pip install go-bin -i https://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com
## Go code
vim ollama_heartbeat.go
go build ollama_heartbeat.go
chmod +x ollama_heartbeat
nohup ./ollama_heartbeat &
## View the output
tail nohup.out

Code:

// ollama_heartbeat.go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

const defaultHost = "http://127.0.0.1:11435"
const defaultModel = "modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf"

func once() {
	host := os.Getenv("OLLAMA_HOST")
	if host == "" {
		host = defaultHost
	}
	model := os.Getenv("OLLAMA_MODEL")
	if model == "" {
		model = defaultModel
	}
	body, _ := json.Marshal(map[string]interface{}{
		"model":  model,
		"prompt": "你好",
		"stream": true,
	})
	req, err := http.NewRequest("POST", host+"/api/generate", bytes.NewReader(body))
	if err != nil {
		fmt.Printf("[%s] heartbeat fail: %v\n", time.Now().Format("01-02 15:04:05"), err)
		return
	}
	req.Header.Set("Content-Type", "application/json")
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Printf("[%s] heartbeat fail: %v\n", time.Now().Format("01-02 15:04:05"), err)
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		fmt.Printf("[%s] heartbeat fail: status=%d\n", time.Now().Format("01-02 15:04:05"), resp.StatusCode)
		return
	}
	// Read the stream line by line and accumulate the byte count
	reader := bufio.NewReader(resp.Body)
	total := 0
	for {
		line, err := reader.ReadBytes('\n')
		if err == io.EOF {
			break
		}
		if err != nil {
			fmt.Printf("[%s] heartbeat fail while reading: %v\n", time.Now().Format("01-02 15:04:05"), err)
			return
		}
		total += len(line)
	}
	fmt.Printf("[%s] heartbeat ok, %d bytes\n", time.Now().Format("01-02 15:04:05"), total)
}

func main() {
	for {
		once()
		time.Sleep(3 * time.Minute)
	}
}

Output:

[01-18 10:17:50] heartbeat ok, 17346 bytes
[01-18 10:20:59] heartbeat ok, 18120 bytes
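
The heartbeat generates a full reply every 3 minutes just to keep the model resident. A lighter alternative, sketched here under the assumption that the server honors the per-request `keep_alive` field of `/api/generate`, is to send a request with no prompt: Ollama then loads the model without generating text, and `keep_alive: -1` asks it to stay loaded indefinitely. The function names are hypothetical, and this is not what the Go program above does.

```python
import json
import urllib.request


def keepalive_payload(model: str, keep_alive=-1) -> dict:
    """Build a /api/generate body that loads the model without generating text."""
    return {"model": model, "keep_alive": keep_alive}


def send_keepalive(base_url: str, model: str, timeout: float = 30.0) -> int:
    """POST the keep-alive request and return the HTTP status code."""
    data = json.dumps(keepalive_payload(model)).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

A call such as `send_keepalive("http://127.0.0.1:11435", model_name)` should return 200 once the model is loaded; whether it fully replaces the periodic heartbeat on this setup would need to be checked with `nvidia-smi`.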

7. Observe the power fluctuation during a Q&A run, plus the total tokens and energy consumed per run

Code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Sample GPU power in real time during Ollama inference, tally the total
tokens and energy, save the data as CSV, then plot it to a JPG.
"""

import subprocess
import csv
import time
import datetime as dt
import requests
import sys
from PIL import Image, ImageDraw, ImageFont

# -------------------- Parameters --------------------
URL = "http://10.8.8.130:11435/api/generate"
MODEL = "modelscope.cn/Qwen/Qwen3-30B-A3B-GGUF:Qwen3-30B-A3B-Q8_0.gguf"
PROMPT = "Please introduce yourself in one sentence."

# Timestamp
ts = dt.datetime.now().strftime("%Y%m%d_%H%M%S")
csv_file = f"ollama_statistics_{ts}.csv"
jpg_file = f"ollama_statistics_{ts}.jpg"

# -------------------- Power sampling --------------------
def get_gpu_power():
    """Return the current GPU power draw (W)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        text=True,
    )
    return float(out.strip())

# -------------------- Inference + sampling --------------------
power_samples = []  # [(timestamp, power_W), ...]
total_tokens = 0

payload = {
    "model": MODEL,
    "prompt": PROMPT,
    "stream": True,
}

print("Starting inference and power sampling...")
start_time = dt.datetime.now()

# Take 50 samples before inference
for _ in range(50):
    power_samples.append((dt.datetime.now().isoformat(timespec="milliseconds"), get_gpu_power()))
    time.sleep(0.1)

# Streaming inference
try:
    resp = requests.post(URL, json=payload, stream=True, timeout=60)
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            continue
        chunk = line.strip()
        # Crude token count: one per streamed line
        total_tokens += 1
        # Sample the power once per streamed line
        power_samples.append((dt.datetime.now().isoformat(timespec="milliseconds"), get_gpu_power()))
        time.sleep(0.01)
except Exception as e:
    print("Inference error:", e, file=sys.stderr)

# Take another 50 samples after inference
for _ in range(50):
    power_samples.append((dt.datetime.now().isoformat(timespec="milliseconds"), get_gpu_power()))
    time.sleep(0.1)

elapsed = (dt.datetime.now() - start_time).total_seconds()
avg_power = sum(p[1] for p in power_samples) / len(power_samples)
energy_wh = avg_power * elapsed / 3600  # Wh

# -------------------- Save CSV --------------------
with open(csv_file, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "power_W"])
    writer.writerows(power_samples)
    writer.writerow([])
    writer.writerow(["total_tokens", total_tokens])
    writer.writerow(["elapsed_s", elapsed])
    writer.writerow(["avg_power_W", avg_power])
    writer.writerow(["energy_Wh", energy_wh])
print(f"Saved {csv_file}")

# -------------------- Plot --------------------
W, H = 800, 400
img = Image.new("RGB", (W, H), "white")
draw = ImageDraw.Draw(img)

powers = [p[1] for p in power_samples]
times  = [p[0] for p in power_samples]

# Axis range
margin = 60
x0, y0 = margin, margin
x1, y1 = W - margin, H - margin

min_p, max_p = min(powers), max(powers)
pad = (max_p - min_p) * 0.1
min_p, max_p = min_p - pad, max_p + pad

# Polyline coordinates
coords = [
    (
        x0 + (i / (len(powers) - 1)) * (x1 - x0),
        y1 - ((p - min_p) / (max_p - min_p)) * (y1 - y0),
    )
    for i, p in enumerate(powers)
]

# Border
draw.rectangle([x0, y0, x1, y1], outline="black")

# Polyline
for i in range(len(coords) - 1):
    draw.line([coords[i], coords[i + 1]], fill="blue", width=2)

# Title
title = f"Ollama Power Sampling  tokens={total_tokens}  energy={energy_wh:.2f} Wh"
draw.text((W // 2, 10), title, fill="black", anchor="mt")

# Axis labels
draw.text((x0, y0 - 10), f"{max_p:.1f} W", fill="black", anchor="lt")
draw.text((x0, y1 + 5), f"{min_p:.1f} W", fill="black", anchor="lt")
draw.text((x0 - 5, y1), times[0][-8:], fill="black", anchor="rt")
draw.text((x1, y1), times[-1][-8:], fill="black", anchor="lt")

img.save(jpg_file)
print(f"Plotted {jpg_file}")
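
The script estimates energy as the plain mean of all samples times the elapsed time. Since the samples are not evenly spaced (about 0.1 s apart while idle, closer together during streaming), a time-weighted integral is a reasonable cross-check. The following is a sketch using the trapezoidal rule over `(timestamp, power_W)` pairs in the same format as `power_samples`; `energy_joules` and `energy_wh_trapezoid` are hypothetical names, not part of the original script.

```python
import datetime as dt


def energy_joules(samples: list) -> float:
    """Integrate power (W) over time (s) with the trapezoidal rule.

    `samples` is a list of (ISO-8601 timestamp string, power in watts).
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        seconds = (
            dt.datetime.fromisoformat(t1) - dt.datetime.fromisoformat(t0)
        ).total_seconds()
        total += 0.5 * (p0 + p1) * seconds
    return total


def energy_wh_trapezoid(samples: list) -> float:
    """Same integral expressed in watt-hours (1 Wh = 3600 J)."""
    return energy_joules(samples) / 3600.0
```

Passing `power_samples` to `energy_wh_trapezoid` gives a figure that can be compared with `energy_wh`; the two should be close, with the trapezoidal value giving less weight to the densely sampled inference window.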

View the data:

python -m http.server 8888
sudo firewall-cmd --permanent --add-port=8888/tcp
sudo firewall-cmd --reload
sudo firewall-cmd --list-all

Visit: [http://10.8.8.130:8888].

Data:

Power
ollama_statistics_20260118_104449
timestamp,power_W
2026-01-18T10:44:49.482,48.52
2026-01-18T10:44:49.594,50.4
2026-01-18T10:44:49.713,48.52
2026-01-18T10:44:49.823,48.52
2026-01-18T10:44:49.934,48.52
2026-01-18T10:44:50.044,48.52
2026-01-18T10:44:50.157,48.52
2026-01-18T10:44:50.267,48.52
2026-01-18T10:44:50.378,48.52
2026-01-18T10:44:50.489,48.52
2026-01-18T10:44:50.600,48.52
2026-01-18T10:44:50.710,50.37
2026-01-18T10:44:50.834,48.52
2026-01-18T10:44:50.944,48.52
2026-01-18T10:44:51.055,48.52
2026-01-18T10:44:51.167,48.52
2026-01-18T10:44:51.277,48.52
2026-01-18T10:44:51.388,48.52
2026-01-18T10:44:51.499,48.52
2026-01-18T10:44:51.610,48.52
2026-01-18T10:44:51.720,48.52
2026-01-18T10:44:51.832,50.37
2026-01-18T10:44:51.957,48.52
2026-01-18T10:44:52.068,48.52
2026-01-18T10:44:52.178,48.52
2026-01-18T10:44:52.289,48.52
2026-01-18T10:44:52.399,48.52
2026-01-18T10:44:52.509,48.52
2026-01-18T10:44:52.620,48.52
2026-01-18T10:44:52.730,48.52
2026-01-18T10:44:52.842,48.52
2026-01-18T10:44:52.952,50.37
2026-01-18T10:44:53.078,49.03
2026-01-18T10:44:53.189,48.52
2026-01-18T10:44:53.299,48.52
2026-01-18T10:44:53.409,48.52
2026-01-18T10:44:53.519,48.52
2026-01-18T10:44:53.630,48.52
2026-01-18T10:44:53.740,48.52
2026-01-18T10:44:53.849,48.52
2026-01-18T10:44:53.959,48.52
2026-01-18T10:44:54.070,50.4
2026-01-18T10:44:54.198,49.03
2026-01-18T10:44:54.308,48.52
2026-01-18T10:44:54.420,48.52
2026-01-18T10:44:54.530,48.52
2026-01-18T10:44:54.640,48.52
2026-01-18T10:44:54.750,48.52
2026-01-18T10:44:54.859,48.52
2026-01-18T10:44:54.969,48.52
2026-01-18T10:44:55.185,50.37
2026-01-18T10:44:55.247,65.36
2026-01-18T10:44:55.308,65.36
2026-01-18T10:44:55.368,57.65
2026-01-18T10:44:55.427,51.72
2026-01-18T10:44:55.488,51.72
2026-01-18T10:44:55.548,62.63
2026-01-18T10:44:55.608,62.63
2026-01-18T10:44:55.668,60.33
2026-01-18T10:44:55.728,51.3
2026-01-18T10:44:55.788,51.3
2026-01-18T10:44:55.848,62.21
2026-01-18T10:44:55.908,62.21
2026-01-18T10:44:55.975,61.27
2026-01-18T10:44:56.035,52.63
2026-01-18T10:44:56.095,52.63
2026-01-18T10:44:56.164,66.71
2026-01-18T10:44:56.225,50.79
2026-01-18T10:44:56.315,50.37
2026-01-18T10:44:56.406,50.37
2026-01-18T10:44:56.478,57.65
2026-01-18T10:44:56.544,52.63
2026-01-18T10:44:56.606,52.63
2026-01-18T10:44:56.666,66.29
2026-01-18T10:44:56.726,66.29
2026-01-18T10:44:56.786,50.79
2026-01-18T10:44:56.848,54.42
2026-01-18T10:44:56.908,54.42
2026-01-18T10:44:56.968,68.59
2026-01-18T10:44:57.028,68.59
2026-01-18T10:44:57.088,53.05
2026-01-18T10:44:57.148,54.42
2026-01-18T10:44:57.206,54.42
2026-01-18T10:44:57.277,69.48
2026-01-18T10:44:57.338,51.72
2026-01-18T10:44:57.399,59.91
2026-01-18T10:44:57.490,59.91
2026-01-18T10:44:57.521,59.91
2026-01-18T10:44:57.589,58.07
2026-01-18T10:44:57.651,53.48
2026-01-18T10:44:57.711,53.48
2026-01-18T10:44:57.777,69.01
2026-01-18T10:44:57.843,51.3
2026-01-18T10:44:57.912,51.3
2026-01-18T10:44:57.973,68.59
2026-01-18T10:44:58.033,68.59
2026-01-18T10:44:58.108,50.79
2026-01-18T10:44:58.184,68.08
2026-01-18T10:44:58.244,51.3
2026-01-18T10:44:58.306,51.3
2026-01-18T10:44:58.375,59.44
2026-01-18T10:44:58.439,59.44
2026-01-18T10:44:58.500,55.35
2026-01-18T10:44:58.620,55.35
2026-01-18T10:44:58.642,55.35
2026-01-18T10:44:58.681,67.23
2026-01-18T10:44:58.743,67.23
2026-01-18T10:44:58.804,53.98
2026-01-18T10:44:58.867,53.05
2026-01-18T10:44:58.948,53.05
2026-01-18T10:44:59.027,49.0
2026-01-18T10:44:59.102,65.87
2026-01-18T10:44:59.170,53.09
2026-01-18T10:44:59.242,53.09
2026-01-18T10:44:59.333,54.93
2026-01-18T10:44:59.404,64.42
2026-01-18T10:44:59.468,52.16
2026-01-18T10:44:59.537,52.16
2026-01-18T10:44:59.606,63.57
2026-01-18T10:44:59.669,53.05
2026-01-18T10:44:59.754,53.05
2026-01-18T10:44:59.791,63.57
2026-01-18T10:44:59.853,63.57
2026-01-18T10:44:59.922,49.42
2026-01-18T10:44:59.983,56.28
2026-01-18T10:45:00.046,56.28
2026-01-18T10:45:00.107,63.57
2026-01-18T10:45:00.170,50.79
2026-01-18T10:45:00.231,50.79
2026-01-18T10:45:00.291,60.37
2026-01-18T10:45:00.351,60.37
2026-01-18T10:45:00.412,62.63
2026-01-18T10:45:00.471,51.3
2026-01-18T10:45:00.533,51.3
2026-01-18T10:45:00.593,59.95
2026-01-18T10:45:00.653,59.95
2026-01-18T10:45:00.714,62.63
2026-01-18T10:45:00.774,52.63
2026-01-18T10:45:00.877,62.67
2026-01-18T10:45:00.901,62.67
2026-01-18T10:45:00.960,62.67
2026-01-18T10:45:01.020,60.77
2026-01-18T10:45:01.081,51.72
2026-01-18T10:45:01.157,51.72
2026-01-18T10:45:01.221,61.27
2026-01-18T10:45:01.283,51.72
2026-01-18T10:45:01.343,51.72
2026-01-18T10:45:01.402,61.74
2026-01-18T10:45:01.469,61.74
2026-01-18T10:45:01.545,49.84
2026-01-18T10:45:01.607,62.25
2026-01-18T10:45:01.669,62.25
2026-01-18T10:45:01.729,57.65
2026-01-18T10:45:01.789,52.16
2026-01-18T10:45:01.849,52.16
2026-01-18T10:45:01.910,63.57
2026-01-18T10:45:02.002,60.33
2026-01-18T10:45:02.032,60.33
2026-01-18T10:45:02.092,52.16
2026-01-18T10:45:02.151,52.16
2026-01-18T10:45:02.212,62.25
2026-01-18T10:45:02.273,62.25
2026-01-18T10:45:02.333,60.37
2026-01-18T10:45:02.393,51.3
2026-01-18T10:45:02.453,51.3
2026-01-18T10:45:02.513,61.31
2026-01-18T10:45:02.575,61.31
2026-01-18T10:45:02.638,60.37
2026-01-18T10:45:02.700,52.67
2026-01-18T10:45:02.767,52.67
2026-01-18T10:45:02.836,62.21
2026-01-18T10:45:02.917,61.74
2026-01-18T10:45:02.997,50.79
2026-01-18T10:45:03.128,66.71
2026-01-18T10:45:03.153,66.71
2026-01-18T10:45:03.203,49.84
2026-01-18T10:45:03.267,49.84
2026-01-18T10:45:03.338,66.71
2026-01-18T10:45:03.409,53.05
2026-01-18T10:45:03.481,53.05
2026-01-18T10:45:03.553,50.79
2026-01-18T10:45:03.619,57.14
2026-01-18T10:45:03.690,57.14
2026-01-18T10:45:03.760,49.42
2026-01-18T10:45:03.825,59.44
2026-01-18T10:45:03.889,59.44
2026-01-18T10:45:03.970,49.42
2026-01-18T10:45:04.031,64.42
2026-01-18T10:45:04.102,50.37
2026-01-18T10:45:04.202,50.79
2026-01-18T10:45:04.267,50.79
2026-01-18T10:45:04.330,59.02
2026-01-18T10:45:04.390,59.02
2026-01-18T10:45:04.451,63.06
2026-01-18T10:45:04.512,50.79
2026-01-18T10:45:04.573,50.79
2026-01-18T10:45:04.634,61.31
2026-01-18T10:45:04.695,61.31
2026-01-18T10:45:04.757,60.81
2026-01-18T10:45:04.817,51.72
2026-01-18T10:45:04.878,51.72
2026-01-18T10:45:04.941,65.36
2026-01-18T10:45:05.002,65.36
2026-01-18T10:45:05.063,57.14
2026-01-18T10:45:05.125,53.09
2026-01-18T10:45:05.190,53.09
2026-01-18T10:45:05.254,64.94
2026-01-18T10:45:05.340,52.16
2026-01-18T10:45:05.437,56.28
2026-01-18T10:45:05.503,56.28
2026-01-18T10:45:05.565,60.37
2026-01-18T10:45:05.639,53.98
2026-01-18T10:45:05.703,53.98
2026-01-18T10:45:05.765,62.21
2026-01-18T10:45:05.836,51.72
2026-01-18T10:45:05.897,51.72
2026-01-18T10:45:05.957,67.66
2026-01-18T10:45:06.022,49.84
2026-01-18T10:45:06.083,49.84
2026-01-18T10:45:06.153,57.14
2026-01-18T10:45:06.214,57.14
2026-01-18T10:45:06.274,58.07
2026-01-18T10:45:06.337,53.05
2026-01-18T10:45:06.398,68.08
2026-01-18T10:45:06.524,68.08
2026-01-18T10:45:06.546,56.72
2026-01-18T10:45:06.581,56.72
2026-01-18T10:45:06.643,53.48
2026-01-18T10:45:06.705,53.48
2026-01-18T10:45:06.766,67.66
2026-01-18T10:45:06.827,67.66
2026-01-18T10:45:06.889,50.37
2026-01-18T10:45:06.953,56.28
2026-01-18T10:45:07.015,56.28
2026-01-18T10:45:07.086,53.48
2026-01-18T10:45:07.146,53.48
2026-01-18T10:45:07.207,53.48
2026-01-18T10:45:07.264,68.08
2026-01-18T10:45:07.333,68.08
2026-01-18T10:45:07.390,52.16
2026-01-18T10:45:07.459,54.42
2026-01-18T10:45:07.518,54.42
2026-01-18T10:45:07.579,52.16
2026-01-18T10:45:07.668,52.16
2026-01-18T10:45:07.715,52.16
2026-01-18T10:45:07.776,67.23
2026-01-18T10:45:07.838,67.23
2026-01-18T10:45:07.899,50.37
2026-01-18T10:45:07.960,55.35
2026-01-18T10:45:08.021,55.35
2026-01-18T10:45:08.083,69.01
2026-01-18T10:45:08.145,52.67
2026-01-18T10:45:08.206,52.67
2026-01-18T10:45:08.267,60.81
2026-01-18T10:45:08.328,60.81
2026-01-18T10:45:08.388,67.23
2026-01-18T10:45:08.466,55.35
2026-01-18T10:45:08.527,55.35
2026-01-18T10:45:08.588,67.66
2026-01-18T10:45:08.648,52.67
2026-01-18T10:45:08.710,53.05
2026-01-18T10:45:08.830,53.05
2026-01-18T10:45:08.890,69.53
2026-01-18T10:45:08.958,52.67
2026-01-18T10:45:09.014,52.67
2026-01-18T10:45:09.071,58.59
2026-01-18T10:45:09.137,58.59
2026-01-18T10:45:09.221,52.12
2026-01-18T10:45:09.283,67.23
2026-01-18T10:45:09.348,67.23
2026-01-18T10:45:09.411,54.93
2026-01-18T10:45:09.476,58.07
2026-01-18T10:45:09.549,58.07
2026-01-18T10:45:09.615,52.16
2026-01-18T10:45:09.676,58.07
2026-01-18T10:45:09.736,58.07
2026-01-18T10:45:09.797,51.72
2026-01-18T10:45:09.935,51.72
2026-01-18T10:45:09.957,51.72
2026-01-18T10:45:09.984,58.07
2026-01-18T10:45:10.048,58.07
2026-01-18T10:45:10.110,62.25
2026-01-18T10:45:10.171,51.72
2026-01-18T10:45:10.242,51.72
2026-01-18T10:45:10.302,67.23
2026-01-18T10:45:10.364,50.37
2026-01-18T10:45:10.424,50.37
2026-01-18T10:45:10.486,56.28
2026-01-18T10:45:10.547,56.28
2026-01-18T10:45:10.613,64.94
2026-01-18T10:45:10.679,51.72
2026-01-18T10:45:10.743,51.72
2026-01-18T10:45:10.804,67.66
2026-01-18T10:45:10.878,50.37
2026-01-18T10:45:10.941,50.37
2026-01-18T10:45:11.070,69.01
2026-01-18T10:45:11.130,49.42
2026-01-18T10:45:11.192,55.35
2026-01-18T10:45:11.263,55.35
2026-01-18T10:45:11.325,56.72
2026-01-18T10:45:11.386,53.09
2026-01-18T10:45:11.448,53.09
2026-01-18T10:45:11.510,68.08
2026-01-18T10:45:11.573,68.08
2026-01-18T10:45:11.635,50.37
2026-01-18T10:45:11.722,63.57
2026-01-18T10:45:11.784,51.3
2026-01-18T10:45:11.846,51.3
2026-01-18T10:45:11.908,64.94
2026-01-18T10:45:11.973,64.94
2026-01-18T10:45:12.034,53.98
2026-01-18T10:45:12.097,55.35
2026-01-18T10:45:12.203,67.23
2026-01-18T10:45:12.232,67.23
2026-01-18T10:45:12.284,50.79
2026-01-18T10:45:12.346,50.79
2026-01-18T10:45:12.416,59.44
2026-01-18T10:45:12.437,59.44
2026-01-18T10:45:12.547,49.03
2026-01-18T10:45:12.658,49.03
2026-01-18T10:45:12.768,49.03
2026-01-18T10:45:12.878,49.03
2026-01-18T10:45:12.988,49.03
2026-01-18T10:45:13.099,49.03
2026-01-18T10:45:13.209,50.82
2026-01-18T10:45:13.420,49.03
2026-01-18T10:45:13.532,49.03
2026-01-18T10:45:13.641,48.52
2026-01-18T10:45:13.752,48.52
2026-01-18T10:45:13.861,48.52
2026-01-18T10:45:13.971,48.52
2026-01-18T10:45:14.082,48.52
2026-01-18T10:45:14.193,48.52
2026-01-18T10:45:14.303,49.03
2026-01-18T10:45:14.413,50.4
2026-01-18T10:45:14.543,48.52
2026-01-18T10:45:14.653,48.52
2026-01-18T10:45:14.763,48.52
2026-01-18T10:45:14.874,48.52
2026-01-18T10:45:14.985,48.52
2026-01-18T10:45:15.095,49.03
2026-01-18T10:45:15.205,49.03
2026-01-18T10:45:15.315,48.52
2026-01-18T10:45:15.425,48.52
2026-01-18T10:45:15.534,50.82
2026-01-18T10:45:15.665,48.52
2026-01-18T10:45:15.775,48.52
2026-01-18T10:45:15.885,48.52
2026-01-18T10:45:15.995,48.52
2026-01-18T10:45:16.106,48.52
2026-01-18T10:45:16.217,49.03
2026-01-18T10:45:16.327,48.52
2026-01-18T10:45:16.437,48.52
2026-01-18T10:45:16.546,48.52
2026-01-18T10:45:16.657,50.4
2026-01-18T10:45:16.788,48.52
2026-01-18T10:45:16.898,48.52
2026-01-18T10:45:17.009,48.52
2026-01-18T10:45:17.119,48.52
2026-01-18T10:45:17.229,49.03
2026-01-18T10:45:17.339,48.52
2026-01-18T10:45:17.449,48.52
2026-01-18T10:45:17.558,48.52
2026-01-18T10:45:17.670,48.52
2026-01-18T10:45:17.781,50.37
2026-01-18T10:45:17.908,49.03
2026-01-18T10:45:18.017,48.52

total_tokens,267
elapsed_s,28.644992
avg_power_W,55.060354223433244
energy_Wh,0.4381120572909476

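8. Analysis

The following conclusions can be drawn from this data (all values are for a single Q&A run):

  1. Time

    • Total elapsed time ≈ 28.6 s. This covers the whole script run, including the two idle sampling windows (50 samples each, about 5.6 s apiece) before and after inference; the streaming itself spans roughly 17 s (10:44:55 to 10:45:12).
    • 367 samples in total (50 before, 267 during, 50 after) → about 12.8 samples/s on average, fully covering the three phases before, during, and after inference.
  2. Token efficiency

    • 267 tokens in total, counted as one per streamed response line, so this is output only and includes the final done record.
    • Throughput ≈ 267 / 28.6 ≈ 9.3 token/s over the whole run; over the roughly 17 s streaming window it is about 15 token/s, in line with the 16.0 token/s measured in step 5.
    • Per-token latency ≈ 107 ms over the whole run (about 65 ms within the streaming window)
  3. Power

    • Idle baseline 48–50 W
    • Peak during inference 69.5 W; mean over all samples 55.1 W (roughly 57 W within the streaming window, about 9 W above idle)
    • Dynamic range 21 W (48 W → 69 W)
  4. Energy

    • Total energy 0.438 Wh (≈ 1.58 kJ), computed as average power × total elapsed time, idle windows included
    • Energy per token 0.438 Wh / 267 ≈ 1.64 mWh
    • At an estimated 0.6 ¥/kWh, the electricity cost of the run ≈ 0.00026 ¥ (0.026 fen)
  5. Energy efficiency

    • 9.3 token/s ÷ 55 W ≈ 0.17 token/J
    • Or 6 J/token, equivalent to lighting a 6 W LED bulb for 1 second.
  6. Comparison

    • Quantized models of the same size running purely on GPU typically reach 10–20 token/s. The 9.3 token/s here looks slightly low mainly because the elapsed time includes the idle sampling windows; network/API overhead or CPU preprocessing may also contribute.
    • 1.64 mWh per token is consistent with the 1–3 mWh reported in the literature for 30B-class quantized models, which is a normal level.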
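
The unit conversions behind these figures are easy to get wrong by a factor of 1000 or 3600, so here is a short sketch that recomputes them from the four summary values at the end of the CSV. The function name `derived_metrics` is hypothetical, and the 0.6 ¥/kWh tariff is the estimate used in the analysis, not a measured value.

```python
def derived_metrics(energy_wh: float, tokens: int, elapsed_s: float,
                    price_yuan_per_kwh: float = 0.6) -> dict:
    """Derive the headline figures from the CSV summary values."""
    energy_j = energy_wh * 3600.0  # 1 Wh = 3600 J
    return {
        "energy_j": energy_j,
        "mwh_per_token": energy_wh * 1000.0 / tokens,
        "joules_per_token": energy_j / tokens,
        "tokens_per_s": tokens / elapsed_s,
        "cost_yuan": energy_wh / 1000.0 * price_yuan_per_kwh,
    }


# Summary values copied from the CSV: energy_Wh, total_tokens, elapsed_s.
metrics = derived_metrics(0.4381120572909476, 267, 28.644992)
for name, value in metrics.items():
    print(f"{name}: {value:.6g}")
```

The results reproduce the numbers quoted in the analysis: about 1577 J in total, 1.64 mWh or 5.9 J per token, 9.3 token/s over the whole run, and a cost of about 0.00026 ¥.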

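One-sentence summary
"The 30B-Q8 quantized model, running on an ancient GPU, answers a one-sentence self-introduction prompt by generating 267 tokens in 28 s, at an average power draw of 55 W and a total energy of 0.44 Wh; the electricity cost per token is under 1/30,000 yuan, and the energy efficiency is about 6 J/token."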

