
Estimating which large models your machine can run with llmfit

Official website: https://www.llmfit.org/zh-cn
Method 1: pull the Docker image

C:\d>wsl
root@DESKTOP-59T6U68:/mnt/c/d# docker pull ghcr.io/alexsjones/llmfit
Trying to pull ghcr.io/alexsjones/llmfit:latest...
Getting image source signatures
Copying blob ff86ea2e5edc skipped: already exists
Copying blob ae62bed2e6dd done
Copying blob 1ee47fd61fcb done
Copying blob f9cfedbd3651 done
Copying config 1b8032be6f done
Writing manifest to image destination
Storing signatures
1b8032be6f4f332fc871d3391dd2a112a60516ea2781bfde09822c522bf1d43d
root@DESKTOP-59T6U68:/mnt/c/d# docker run -itd -v /mnt/c/d:/par --network host --name llmfit ghcr.io/alexsjones/llmfit
00e39be8b38c2411788c066cdd212fbdfe5f39eb344f849f0ee930abee4ba6f6
root@DESKTOP-59T6U68:/mnt/c/d# docker exec -it llmfit
Error: must provide a non-empty command to start an exec session: invalid argument
root@DESKTOP-59T6U68:/mnt/c/d# docker exec -it llmfit bash
Error: can only create exec sessions on running containers: container state improper

The container cannot be logged into because it has already exited (hence the docker exec errors above): it runs once and prints its result as JSON, which can be read from the container logs.

root@DESKTOP-59T6U68:/mnt/c/d# docker logs llmfit
{
  "models": [
    { "best_quant": "Q4_K_M", "capabilities": [ "Tool Use" ], "capability_ids": [ "tool_use" ], "category": "Coding", "context_length": 262144, "disk_size_gb": 6.86, "effective_context_length": 8192, "estimated_tps": 32.8, "fit_level": "Marginal", "gguf_sources": [], "installed": false, "is_moe": true, "license": null, "memory_available_gb": 9.05, "memory_required_gb": 6.6, "moe_offloaded_gb": null, "name": "Intel/Qwen3-Coder-Next-int4-AutoRound", "notes": [ "Context capped at 8192 tokens for estimation (model supports up to 262144; use --max-context to override)", "CPU-only: model loaded into system RAM", "MoE architecture, but expert offloading requires a GPU", "No GPU -- inference will be slow", "Baseline estimated speed: 32.8 tok/s" ], "parameter_count": "11.8B", "params_b": 11.82, "provider": "intel", "release_date": null, "run_mode": "CPU", "runtime": "llama.cpp", "runtime_label": "llama.cpp", "score": 88.9, "score_components": { "context": 100.0, "fit": 100.0, "quality": 85.0, "speed": 81.9 }, "total_memory_gb": 6.6, "use_case": "Code generation and completion", "utilization_pct": 72.9 },
    { "best_quant": "Q8_0", "capabilities": [], "capability_ids": [], "category": "General", "context_length": 8192, "disk_size_gb": 3.5, "effective_context_length": 8192, "estimated_tps": 39.8, "fit_level": "Marginal", "gguf_sources": [ { "provider": "ggml-org", "repo": "ggml-org/DeepSeek-OCR-GGUF" } ], "installed": false, "is_moe": true, "license": null, "memory_available_gb": 9.05, "memory_required_gb": 1.9, "moe_offloaded_gb": null, "name": "deepseek-ai/DeepSeek-OCR", "notes": [ "CPU-only: model loaded into system RAM", "MoE architecture, but expert offloading requires a GPU", "No GPU -- inference will be slow", "Best quantization for hardware: Q8_0 (model default: Q4_K_M)", "Baseline estimated speed: 39.8 tok/s" ], "parameter_count": "3.3B", "params_b": 3.34, "provider": "DeepSeek", "release_date": null, "run_mode": "CPU", "runtime": "llama.cpp", "runtime_label": "llama.cpp", "score": 79.7, "score_components": { "context": 100.0, "fit": 76.8, "quality": 63.0, "speed": 99.6 }, "total_memory_gb": 1.9, "use_case": "General purpose", "utilization_pct": 21.0 },
    { "best_quant": "Q8_0", "capabilities": [], "capability_ids": [], "category": "General", "context_length": 8192, "disk_size_gb": 3.56, "effective_context_length": 8192, "estimated_tps": 39.2, "fit_level": "Marginal", "gguf_sources": [], "installed": false, "is_moe": true, "license": null, "memory_available_gb": 9.05, "memory_required_gb": 1.9, "moe_offloaded_gb": null, "name": "deepseek-ai/DeepSeek-OCR-2", "notes": [ "CPU-only: model loaded into system RAM", "MoE architecture, but expert offloading requires a GPU", "No GPU -- inference will be slow", "Best quantization for hardware: Q8_0 (model default: Q4_K_M)", "Baseline estimated speed: 39.2 tok/s" ], "parameter_count": "3.4B", "params_b": 3.39, "provider": "DeepSeek", "release_date": null, "run_mode": "CPU", "runtime": "llama.cpp", "runtime_label": "llama.cpp", "score": 79.3, "score_components": { "context": 100.0, "fit": 76.8, "quality": 63.0, "speed": 98.0 }, "total_memory_gb": 1.9, "use_case": "General purpose", "utilization_pct": 21.0 },
    { "best_quant": "Q8_0", "capabilities": [], "capability_ids": [], "category": "General", "context_length": 262144, "disk_size_gb": 3.51, "effective_context_length": 8192, "estimated_tps": 50.6, "fit_level": "Marginal", "gguf_sources": [], "installed": false, "is_moe": true, "license": null, "memory_available_gb": 9.05, "memory_required_gb": 1.9, "moe_offloaded_gb": null, "name": "dealignai/Gemma-4-26B-A4B-JANG_2L-CRACK", "notes": [ "Context capped at 8192 tokens for estimation (model supports up to 262144; use --max-context to override)", "CPU-only: model loaded into system RAM", "MoE architecture, but expert offloading requires a GPU", "No GPU -- inference will be slow", "Best quantization for hardware: Q8_0 (model default: Q4_K_M)", "Baseline estimated speed: 50.6 tok/s" ], "parameter_count": "3.3B", "params_b": 3.34, "provider": "dealignai", "release_date": null, "run_mode": "CPU", "runtime": "llama.cpp", "runtime_label": "llama.cpp", "score": 79.0, "score_components": { "context": 100.0, "fit": 76.8, "quality": 61.0, "speed": 100.0 }, "total_memory_gb": 1.9, "use_case": "General purpose", "utilization_pct": 21.0 },
    { "best_quant": "Q8_0", "capabilities": [], "capability_ids": [], "category": "General", "context_length": 262144, "disk_size_gb": 4.96, "effective_context_length": 8192, "estimated_tps": 35.8, "fit_level": "Marginal", "gguf_sources": [], "installed": false, "is_moe": true, "license": null, "memory_available_gb": 9.05, "memory_required_gb": 2.6, "moe_offloaded_gb": null, "name": "dealignai/Gemma-4-26B-A4B-JANG_4M-CRACK", "notes": [ "Context capped at 8192 tokens for estimation (model supports up to 262144; use --max-context to override)", "CPU-only: model loaded into system RAM", "MoE architecture, but expert offloading requires a GPU", "No GPU -- inference will be slow", "Best quantization for hardware: Q8_0 (model default: Q4_K_M)", "Baseline estimated speed: 35.8 tok/s" ], "parameter_count": "4.7B", "params_b": 4.72, "provider": "dealignai", "release_date": null, "run_mode": "CPU", "runtime": "llama.cpp", "runtime_label": "llama.cpp", "score": 76.7, "score_components": { "context": 100.0, "fit": 83.0, "quality": 61.0, "speed": 89.4 }, "total_memory_gb": 2.6, "use_case": "General purpose", "utilization_pct": 28.7 }
  ],
  "system": { "available_ram_gb": 9.05, "backend": "CPU (x86)", "cpu_cores": 16, "cpu_name": "AMD Ryzen 7 8845H w/ Radeon 780M Graphics", "gpu_count": 0, "gpu_name": null, "gpu_vram_gb": null, "gpus": [], "has_gpu": false, "total_ram_gb": 9.72, "unified_memory": false }
}
root@DESKTOP-59T6U68:/mnt/c/d#
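Because the report is plain JSON, it is easy to post-process. A minimal Python sketch for ranking the entries (the field names come straight from the log above; saving the logs to a file such as llmfit-report.json first is an assumption shown only in the usage comment):

```python
import json

def rank_models(report):
    """Sort the report's model entries by llmfit's overall score, best first."""
    return sorted(report["models"], key=lambda m: m["score"], reverse=True)

def summarize(report):
    """One line per model: score, name, best quant, estimated speed, fit level."""
    return [
        f"{m['score']:5.1f}  {m['name']}  {m['best_quant']}  "
        f"{m['estimated_tps']} tok/s  {m['fit_level']}"
        for m in rank_models(report)
    ]

# Typical usage, after `docker logs llmfit > llmfit-report.json`:
#   report = json.load(open("llmfit-report.json"))
#   print("\n".join(summarize(report)))
```

The same filtering could of course be done with jq directly on the docker logs output.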

Method 2: download the binary release

C:\d>wget https://kkgithub.com/AlexsJones/llmfit/releases/download/v0.9.18/llmfit-v0.9.18-x86_64-pc-windows-msvc.zip
llmfit-v0.9.18-x86_64-pc-wind 100%[=================================================>] 2.93M 5.66MB/s in 0.5s
2026-05-03 10:51:31 (5.66 MB/s) - 'llmfit-v0.9.18-x86_64-pc-windows-msvc.zip' saved [3068451/3068451]

C:\d>wget https://kkgithub.com/AlexsJones/llmfit/releases/download/v0.9.18/llmfit-v0.9.18-x86_64-unknown-linux-gnu.tar.gz
llmfit-v0.9.18-x86_64-unknown 100%[=================================================>] 3.58M 6.69MB/s in 0.5s
2026-05-03 10:51:51 (6.69 MB/s) - 'llmfit-v0.9.18-x86_64-unknown-linux-gnu.tar.gz' saved [3758629/3758629]

C:\d>wget https://kkgithub.com/AlexsJones/llmfit/releases/download/v0.9.18/llmfit-v0.9.18-x86_64-unknown-linux-musl.tar.gz
llmfit-v0.9.18-x86_64-unknown 100%[=================================================>] 3.68M 7.03MB/s in 0.5s
2026-05-03 10:52:07 (7.03 MB/s) - 'llmfit-v0.9.18-x86_64-unknown-linux-musl.tar.gz' saved [3863166/3863166]

Run it without arguments to get a TUI (text user interface) that you can browse with the up/down arrow keys.

C:\d>llmfit

Run it with arguments to list the top 10 best-fitting models:

C:\d>llmfit fit --perfect -n 10

=== System Specifications ===
CPU: AMD Ryzen 7 8845H w/ Radeon 780M Graphics (16 cores)
Total RAM: 12.80 GB
Available RAM: 5.46 GB
Backend: Vulkan
GPU: AMD Radeon 780M Graphics (3.00 GB VRAM, Vulkan)
(197 models hidden — incompatible backend)

=== Model Compatibility Analysis ===
Found 10 compatible model(s)
╭────────────┬────────────────────────────────────────────┬─────────────┬──────┬───────┬────────────┬───────┬───────────┬──────┬───────┬─────────┬─────────────╮
│ Status     │ Model                                      │ Provider    │ Size │ Score │ tok/s est. │ Quant │ Runtime   │ Mode │ Mem % │ Context │ Added to HF │
├────────────┼────────────────────────────────────────────┼─────────────┼──────┼───────┼────────────┼───────┼───────────┼──────┼───────┼─────────┼─────────────┤
│ 🟢 Perfect │ meta-llama/Llama-3.2-1B-Instruct           │ Meta        │ 1.2B │ 79    │ 106.8      │ Q8_0  │ llama.cpp │ GPU  │ 61.3% │ 4k      │ —           │
│ 🟢 Perfect │ cazzz307/Abliterated-Llama-3.2-1B-Instruct │ cazzz307    │ 1.2B │ 79    │ 106.8      │ Q8_0  │ llama.cpp │ GPU  │ 61.3% │ 4k      │ —           │
│ 🟢 Perfect │ Vikhrmodels/Vikhr-Llama-3.2-1B-Instruct    │ vikhrmodels │ 1.2B │ 79    │ 106.8      │ Q8_0  │ llama.cpp │ GPU  │ 62.6% │ 4194k   │ —           │
│ 🟢 Perfect │ RedHatAI/Llama-3.2-1B-Instruct-FP8         │ redhatai    │ 1.5B │ 79    │ 88.1       │ Q8_0  │ llama.cpp │ GPU  │ 72.4% │ 4194k   │ —           │
│ 🟢 Perfect │ RedHatAI/Llama-3.2-1B-Instruct-FP8-dynamic │ redhatai    │ 1.5B │ 79    │ 88.1       │ Q8_0  │ llama.cpp │ GPU  │ 72.4% │ 4194k   │ —           │
│ 🟢 Perfect │ Qwen/Qwen2.5-1.5B-Instruct                 │ Alibaba     │ 1.5B │ 79    │ 85.5       │ Q8_0  │ llama.cpp │ GPU  │ 74.1% │ 32k     │ —           │
│ 🟢 Perfect │ Qwen/Qwen2-1.5B-Instruct                   │ Alibaba     │ 1.5B │ 79    │ 85.5       │ Q8_0  │ llama.cpp │ GPU  │ 74.1% │ 32k     │ —           │
│ 🟢 Perfect │ Qwen/Qwen2.5-Math-1.5B-Instruct            │ Alibaba     │ 1.5B │ 79    │ 85.5       │ Q8_0  │ llama.cpp │ GPU  │ 72.4% │ 4k      │ —           │
│ 🟢 Perfect │ RedHatAI/Qwen2-1.5B-Instruct-FP8           │ redhatai    │ 1.5B │ 79    │ 85.5       │ Q8_0  │ llama.cpp │ GPU  │ 74.1% │ 32k     │ —           │
│ 🟢 Perfect │ LiquidAI/LFM2.5-1.2B-Instruct              │ Liquid AI   │ 1.2B │ 78    │ 112.8      │ Q8_0  │ llama.cpp │ GPU  │ 66.0% │ 128k    │ 2026-01-06  │
╰────────────┴────────────────────────────────────────────┴─────────────┴──────┴───────┴────────────┴───────┴───────────┴──────┴───────┴─────────┴─────────────╯
Note: tok/s values are baseline estimates; real runtime depends on engine/runtime.
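The memory figures in the table can be sanity-checked with a back-of-the-envelope formula: weight bytes ≈ parameter count × bits per weight ÷ 8, plus some overhead for the KV cache and runtime buffers. A rough Python sketch; the bits-per-weight values and overhead factor are my own assumptions, not llmfit's actual estimator:

```python
# Approximate effective bits per weight for common GGUF quantizations
# (assumed values, including quantization metadata).
QUANT_BITS = {"Q4_K_M": 4.5, "Q8_0": 8.5, "F16": 16.0}

def estimate_memory_gb(params_b, quant="Q8_0", overhead=1.15):
    """Rough memory estimate in GB for a quantized model.

    params_b: parameter count in billions.
    overhead: assumed multiplier for KV cache and runtime buffers.
    """
    bytes_per_param = QUANT_BITS[quant] / 8
    return params_b * bytes_per_param * overhead
```

For example, a 1.2B-parameter model at Q8_0 comes out on the order of 1.5 GB, consistent with the 1B-class models above fitting in 3 GB of VRAM, and an 11.8B model at Q4_K_M lands in the same ballpark as the 6.6 GB llmfit reported in the CPU-only run.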
