当前位置：首页 > news >正文

找到一种方法：用LM Studio 和 llmster 可以把qwen3.5改成nothinking版本装载来提高响应速度

news 2026/7/26 5:56:45

废话不多说直接拿Qwen3.5-9B-Q4_K_M.gguf模型举例，先用get命令下载这个模型，可以正常使用后。
找到模型目录，如：用户目录~/.lmstudio/models/lmstudio-community/Qwen3.5-9B-GGUF

[root@localhost ~]# cd .lmstudio/models/lmstudio-community
[root@localhost lmstudio-community]# ls -al
总用量 0
drwxr-xr-x. 5 root root 90 3月 25 17:12 .
drwxr-xr-x. 3 root root 32 3月 19 16:39 ..
drwxr-xr-x. 2 root root 71 3月 25 16:34 Qwen3.5-9B-GGUF
然后新建一个同样的目录，带上后缀如：
[root@localhost lmstudio-community]# mkdir Qwen3.5-9B-GGUF-no-thinking
[root@localhost lmstudio-community]# ls -al
总用量 0
drwxr-xr-x. 5 root root 90 3月 25 17:12 .
drwxr-xr-x. 3 root root 32 3月 19 16:39 ..
drwxr-xr-x. 2 root root 71 3月 25 16:34 Qwen3.5-9B-GGUF
drwxr-xr-x. 2 root root 24 3月 25 17:43 Qwen3.5-9B-GGUF-no-thinking
进入这个新建录目并建立一个model.yaml文件
[root@localhost lmstudio-community]# cd Qwen3.5-9B-GGUF-no-thinking/
[root@localhost Qwen3.5-9B-GGUF-no-thinking]# vim model.yaml
# 将如下内容存进去。注意缩进格式要一样，每层都是靠两个空格
model: lmstudio-community/Qwen3.5-9B-GGUF-no-thinking
base: lmstudio-community/Qwen3.5-9B-GGUF/Qwen3.5-9B-Q4_K_M.gguf
metadataOverrides:
reasoning: false
customFields:
- key: enableThinking
displayName: "Enable Thinking"
description: "Whether to allow thinking output before the final answer"
type: boolean
defaultValue: false
effects:
- type: setJinjaVariable
variable: enable_thinking
完了后，你的模型列表就会多一个模型出来，执行命令lms ls
这时候通过命令行lms load 还可能装载不进去(llmster此处还有bug)。要通过界面进行装载。
回到windows的 LM Studio界面上（因为已经通过LM Link互联上了），按CTRL + L，弹出窗口中应该已经有了这个模型，如果没有会有错误提示，你再修改model.yaml文件。
打开下面的手工调整模型参数开关，点击选中模型，显示参数窗口，托动条调整上下文长度16k左右（不要一下子调到200k，要一点一点的向大里试），和卸载到内存层数32，然后装载模型。

装载成功后，回到命令行试一下：
lms chat
/model
选这个no-thinking模型，聊几句看看正常否。
/exit退出

启动接口服务：
lms server start --help 查一下帮助
lms server start --bind 0.0.0.0 --port 1234 --cors(允许跨域)
在防火墙上开端口
firewall-cmd --add-port 1234/tcp

这时候就可以在你项目里配上本地地址了：不管是openclaw还是openwebui，以及anythingllm，n8，同时也支持clade code，url和open ai的不一样多个messages：http://192.168.0.121:1234/v1/messages，可以等等。
open ai格式url :http://192.168.0.121:1234/v1
key:lmstudio(随便输一个，不能空）
模型：Qwen3.5-9B-GGUF-no-thinking

同样的方法也适用别的带深度思考的模型，只需改改model.yaml文件的前两行就行了。

查看全文

http://www.jsqmd.com/news/588268/