
Estimating which large models your machine can run with llmfit

Official website: https://www.llmfit.org/zh-cn
Method 1: pull the Docker image

C:\d>wsl
root@DESKTOP-59T6U68:/mnt/c/d# docker pull ghcr.io/alexsjones/llmfit
Trying to pull ghcr.io/alexsjones/llmfit:latest...
Getting image source signatures
Copying blob ff86ea2e5edc skipped: already exists
Copying blob ae62bed2e6dd done
Copying blob 1ee47fd61fcb done
Copying blob f9cfedbd3651 done
Copying config 1b8032be6f done
Writing manifest to image destination
Storing signatures
1b8032be6f4f332fc871d3391dd2a112a60516ea2781bfde09822c522bf1d43d
root@DESKTOP-59T6U68:/mnt/c/d# docker run -itd -v /mnt/c/d:/par --network host --name llmfit ghcr.io/alexsjones/llmfit
00e39be8b38c2411788c066cdd212fbdfe5f39eb344f849f0ee930abee4ba6f6
root@DESKTOP-59T6U68:/mnt/c/d# docker exec -it llmfit
Error: must provide a non-empty command to start an exec session: invalid argument
root@DESKTOP-59T6U68:/mnt/c/d# docker exec -it llmfit bash
Error: can only create exec sessions on running containers: container state improper

The container cannot be logged into because it has already exited (hence the docker exec errors above): it runs once and prints its result as JSON, which can be read from the container logs.

root@DESKTOP-59T6U68:/mnt/c/d# docker logs llmfit
{
  "models": [
    { "best_quant": "Q4_K_M", "capabilities": [ "Tool Use" ], "capability_ids": [ "tool_use" ], "category": "Coding", "context_length": 262144, "disk_size_gb": 6.86, "effective_context_length": 8192, "estimated_tps": 32.8, "fit_level": "Marginal", "gguf_sources": [], "installed": false, "is_moe": true, "license": null, "memory_available_gb": 9.05, "memory_required_gb": 6.6, "moe_offloaded_gb": null, "name": "Intel/Qwen3-Coder-Next-int4-AutoRound", "notes": [ "Context capped at 8192 tokens for estimation (model supports up to 262144; use --max-context to override)", "CPU-only: model loaded into system RAM", "MoE architecture, but expert offloading requires a GPU", "No GPU -- inference will be slow", "Baseline estimated speed: 32.8 tok/s" ], "parameter_count": "11.8B", "params_b": 11.82, "provider": "intel", "release_date": null, "run_mode": "CPU", "runtime": "llama.cpp", "runtime_label": "llama.cpp", "score": 88.9, "score_components": { "context": 100.0, "fit": 100.0, "quality": 85.0, "speed": 81.9 }, "total_memory_gb": 6.6, "use_case": "Code generation and completion", "utilization_pct": 72.9 },
    { "best_quant": "Q8_0", "capabilities": [], "capability_ids": [], "category": "General", "context_length": 8192, "disk_size_gb": 3.5, "effective_context_length": 8192, "estimated_tps": 39.8, "fit_level": "Marginal", "gguf_sources": [ { "provider": "ggml-org", "repo": "ggml-org/DeepSeek-OCR-GGUF" } ], "installed": false, "is_moe": true, "license": null, "memory_available_gb": 9.05, "memory_required_gb": 1.9, "moe_offloaded_gb": null, "name": "deepseek-ai/DeepSeek-OCR", "notes": [ "CPU-only: model loaded into system RAM", "MoE architecture, but expert offloading requires a GPU", "No GPU -- inference will be slow", "Best quantization for hardware: Q8_0 (model default: Q4_K_M)", "Baseline estimated speed: 39.8 tok/s" ], "parameter_count": "3.3B", "params_b": 3.34, "provider": "DeepSeek", "release_date": null, "run_mode": "CPU", "runtime": "llama.cpp", "runtime_label": "llama.cpp", "score": 79.7, "score_components": { "context": 100.0, "fit": 76.8, "quality": 63.0, "speed": 99.6 }, "total_memory_gb": 1.9, "use_case": "General purpose", "utilization_pct": 21.0 },
    { "best_quant": "Q8_0", "capabilities": [], "capability_ids": [], "category": "General", "context_length": 8192, "disk_size_gb": 3.56, "effective_context_length": 8192, "estimated_tps": 39.2, "fit_level": "Marginal", "gguf_sources": [], "installed": false, "is_moe": true, "license": null, "memory_available_gb": 9.05, "memory_required_gb": 1.9, "moe_offloaded_gb": null, "name": "deepseek-ai/DeepSeek-OCR-2", "notes": [ "CPU-only: model loaded into system RAM", "MoE architecture, but expert offloading requires a GPU", "No GPU -- inference will be slow", "Best quantization for hardware: Q8_0 (model default: Q4_K_M)", "Baseline estimated speed: 39.2 tok/s" ], "parameter_count": "3.4B", "params_b": 3.39, "provider": "DeepSeek", "release_date": null, "run_mode": "CPU", "runtime": "llama.cpp", "runtime_label": "llama.cpp", "score": 79.3, "score_components": { "context": 100.0, "fit": 76.8, "quality": 63.0, "speed": 98.0 }, "total_memory_gb": 1.9, "use_case": "General purpose", "utilization_pct": 21.0 },
    { "best_quant": "Q8_0", "capabilities": [], "capability_ids": [], "category": "General", "context_length": 262144, "disk_size_gb": 3.51, "effective_context_length": 8192, "estimated_tps": 50.6, "fit_level": "Marginal", "gguf_sources": [], "installed": false, "is_moe": true, "license": null, "memory_available_gb": 9.05, "memory_required_gb": 1.9, "moe_offloaded_gb": null, "name": "dealignai/Gemma-4-26B-A4B-JANG_2L-CRACK", "notes": [ "Context capped at 8192 tokens for estimation (model supports up to 262144; use --max-context to override)", "CPU-only: model loaded into system RAM", "MoE architecture, but expert offloading requires a GPU", "No GPU -- inference will be slow", "Best quantization for hardware: Q8_0 (model default: Q4_K_M)", "Baseline estimated speed: 50.6 tok/s" ], "parameter_count": "3.3B", "params_b": 3.34, "provider": "dealignai", "release_date": null, "run_mode": "CPU", "runtime": "llama.cpp", "runtime_label": "llama.cpp", "score": 79.0, "score_components": { "context": 100.0, "fit": 76.8, "quality": 61.0, "speed": 100.0 }, "total_memory_gb": 1.9, "use_case": "General purpose", "utilization_pct": 21.0 },
    { "best_quant": "Q8_0", "capabilities": [], "capability_ids": [], "category": "General", "context_length": 262144, "disk_size_gb": 4.96, "effective_context_length": 8192, "estimated_tps": 35.8, "fit_level": "Marginal", "gguf_sources": [], "installed": false, "is_moe": true, "license": null, "memory_available_gb": 9.05, "memory_required_gb": 2.6, "moe_offloaded_gb": null, "name": "dealignai/Gemma-4-26B-A4B-JANG_4M-CRACK", "notes": [ "Context capped at 8192 tokens for estimation (model supports up to 262144; use --max-context to override)", "CPU-only: model loaded into system RAM", "MoE architecture, but expert offloading requires a GPU", "No GPU -- inference will be slow", "Best quantization for hardware: Q8_0 (model default: Q4_K_M)", "Baseline estimated speed: 35.8 tok/s" ], "parameter_count": "4.7B", "params_b": 4.72, "provider": "dealignai", "release_date": null, "run_mode": "CPU", "runtime": "llama.cpp", "runtime_label": "llama.cpp", "score": 76.7, "score_components": { "context": 100.0, "fit": 83.0, "quality": 61.0, "speed": 89.4 }, "total_memory_gb": 2.6, "use_case": "General purpose", "utilization_pct": 28.7 }
  ],
  "system": { "available_ram_gb": 9.05, "backend": "CPU (x86)", "cpu_cores": 16, "cpu_name": "AMD Ryzen 7 8845H w/ Radeon 780M Graphics", "gpu_count": 0, "gpu_name": null, "gpu_vram_gb": null, "gpus": [], "has_gpu": false, "total_ram_gb": 9.72, "unified_memory": false }
}
root@DESKTOP-59T6U68:/mnt/c/d#
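Because the report is plain JSON, it is easy to post-process. A minimal Python sketch for ranking the entries (the field names come straight from the log above; saving the logs to a file such as llmfit-report.json first is an assumption shown only in the usage comment):

```python
import json

def rank_models(report):
    """Sort the report's model entries by llmfit's overall score, best first."""
    return sorted(report["models"], key=lambda m: m["score"], reverse=True)

def summarize(report):
    """One line per model: score, name, best quant, estimated speed, fit level."""
    return [
        f"{m['score']:5.1f}  {m['name']}  {m['best_quant']}  "
        f"{m['estimated_tps']} tok/s  {m['fit_level']}"
        for m in rank_models(report)
    ]

# Typical usage, after `docker logs llmfit > llmfit-report.json`:
#   report = json.load(open("llmfit-report.json"))
#   print("\n".join(summarize(report)))
```

The same filtering could of course be done with jq directly on the docker logs output.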

Method 2: download the binary release

C:\d>wget https://kkgithub.com/AlexsJones/llmfit/releases/download/v0.9.18/llmfit-v0.9.18-x86_64-pc-windows-msvc.zip
llmfit-v0.9.18-x86_64-pc-wind 100%[=================================================>] 2.93M 5.66MB/s in 0.5s
2026-05-03 10:51:31 (5.66 MB/s) - 'llmfit-v0.9.18-x86_64-pc-windows-msvc.zip' saved [3068451/3068451]

C:\d>wget https://kkgithub.com/AlexsJones/llmfit/releases/download/v0.9.18/llmfit-v0.9.18-x86_64-unknown-linux-gnu.tar.gz
llmfit-v0.9.18-x86_64-unknown 100%[=================================================>] 3.58M 6.69MB/s in 0.5s
2026-05-03 10:51:51 (6.69 MB/s) - 'llmfit-v0.9.18-x86_64-unknown-linux-gnu.tar.gz' saved [3758629/3758629]

C:\d>wget https://kkgithub.com/AlexsJones/llmfit/releases/download/v0.9.18/llmfit-v0.9.18-x86_64-unknown-linux-musl.tar.gz
llmfit-v0.9.18-x86_64-unknown 100%[=================================================>] 3.68M 7.03MB/s in 0.5s
2026-05-03 10:52:07 (7.03 MB/s) - 'llmfit-v0.9.18-x86_64-unknown-linux-musl.tar.gz' saved [3863166/3863166]

Run it without arguments to get a TUI (text user interface) that you can browse with the up/down arrow keys.

C:\d>llmfit

Run it with arguments to list the top 10 best-fitting models:

C:\d>llmfit fit --perfect -n 10

=== System Specifications ===
CPU: AMD Ryzen 7 8845H w/ Radeon 780M Graphics (16 cores)
Total RAM: 12.80 GB
Available RAM: 5.46 GB
Backend: Vulkan
GPU: AMD Radeon 780M Graphics (3.00 GB VRAM, Vulkan)
(197 models hidden — incompatible backend)

=== Model Compatibility Analysis ===
Found 10 compatible model(s)
╭────────────┬────────────────────────────────────────────┬─────────────┬──────┬───────┬────────────┬───────┬───────────┬──────┬───────┬─────────┬─────────────╮
│ Status     │ Model                                      │ Provider    │ Size │ Score │ tok/s est. │ Quant │ Runtime   │ Mode │ Mem % │ Context │ Added to HF │
├────────────┼────────────────────────────────────────────┼─────────────┼──────┼───────┼────────────┼───────┼───────────┼──────┼───────┼─────────┼─────────────┤
│ 🟢 Perfect │ meta-llama/Llama-3.2-1B-Instruct           │ Meta        │ 1.2B │ 79    │ 106.8      │ Q8_0  │ llama.cpp │ GPU  │ 61.3% │ 4k      │ —           │
│ 🟢 Perfect │ cazzz307/Abliterated-Llama-3.2-1B-Instruct │ cazzz307    │ 1.2B │ 79    │ 106.8      │ Q8_0  │ llama.cpp │ GPU  │ 61.3% │ 4k      │ —           │
│ 🟢 Perfect │ Vikhrmodels/Vikhr-Llama-3.2-1B-Instruct    │ vikhrmodels │ 1.2B │ 79    │ 106.8      │ Q8_0  │ llama.cpp │ GPU  │ 62.6% │ 4194k   │ —           │
│ 🟢 Perfect │ RedHatAI/Llama-3.2-1B-Instruct-FP8         │ redhatai    │ 1.5B │ 79    │ 88.1       │ Q8_0  │ llama.cpp │ GPU  │ 72.4% │ 4194k   │ —           │
│ 🟢 Perfect │ RedHatAI/Llama-3.2-1B-Instruct-FP8-dynamic │ redhatai    │ 1.5B │ 79    │ 88.1       │ Q8_0  │ llama.cpp │ GPU  │ 72.4% │ 4194k   │ —           │
│ 🟢 Perfect │ Qwen/Qwen2.5-1.5B-Instruct                 │ Alibaba     │ 1.5B │ 79    │ 85.5       │ Q8_0  │ llama.cpp │ GPU  │ 74.1% │ 32k     │ —           │
│ 🟢 Perfect │ Qwen/Qwen2-1.5B-Instruct                   │ Alibaba     │ 1.5B │ 79    │ 85.5       │ Q8_0  │ llama.cpp │ GPU  │ 74.1% │ 32k     │ —           │
│ 🟢 Perfect │ Qwen/Qwen2.5-Math-1.5B-Instruct            │ Alibaba     │ 1.5B │ 79    │ 85.5       │ Q8_0  │ llama.cpp │ GPU  │ 72.4% │ 4k      │ —           │
│ 🟢 Perfect │ RedHatAI/Qwen2-1.5B-Instruct-FP8           │ redhatai    │ 1.5B │ 79    │ 85.5       │ Q8_0  │ llama.cpp │ GPU  │ 74.1% │ 32k     │ —           │
│ 🟢 Perfect │ LiquidAI/LFM2.5-1.2B-Instruct              │ Liquid AI   │ 1.2B │ 78    │ 112.8      │ Q8_0  │ llama.cpp │ GPU  │ 66.0% │ 128k    │ 2026-01-06  │
╰────────────┴────────────────────────────────────────────┴─────────────┴──────┴───────┴────────────┴───────┴───────────┴──────┴───────┴─────────┴─────────────╯
Note: tok/s values are baseline estimates; real runtime depends on engine/runtime.
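The memory figures in the table can be sanity-checked with a back-of-the-envelope formula: weight bytes ≈ parameter count × bits per weight ÷ 8, plus some overhead for the KV cache and runtime buffers. A rough Python sketch; the bits-per-weight values and overhead factor are my own assumptions, not llmfit's actual estimator:

```python
# Approximate effective bits per weight for common GGUF quantizations
# (assumed values, including quantization metadata).
QUANT_BITS = {"Q4_K_M": 4.5, "Q8_0": 8.5, "F16": 16.0}

def estimate_memory_gb(params_b, quant="Q8_0", overhead=1.15):
    """Rough memory estimate in GB for a quantized model.

    params_b: parameter count in billions.
    overhead: assumed multiplier for KV cache and runtime buffers.
    """
    bytes_per_param = QUANT_BITS[quant] / 8
    return params_b * bytes_per_param * overhead
```

For example, a 1.2B-parameter model at Q8_0 comes out on the order of 1.5 GB, consistent with the 1B-class models above fitting in 3 GB of VRAM, and an 11.8B model at Q4_K_M lands in the same ballpark as the 6.6 GB llmfit reported in the CPU-only run.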
