
llama.cpp: increasing the scan depth of the models directory (to match LM Studio's model directory layout)

news 2026/6/24 9:12:03

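While using llama.cpp (b9038) recently, I found that with Qwen3.6 35B its output speed is nearly twice that of Ollama (llama.cpp 79 t/s vs. Ollama 44 t/s). Until now, though, I had always loaded a model by specifying it directly.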

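Talking with a few people online recently, I learned that llama.cpp already supports model routing (switching between multiple models): the --models-dir parameter is all it takes to make several models available.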

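I have always used LM Studio as my model downloader and for first experiments, so I tried pointing llama.cpp at its model directory. It turns out that llama.cpp only finds models down to a limited directory depth; anything one level deeper is not recognized.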

The command for loading the models is:

./llama-server --models-dir ~/.lmstudio/models/lmstudio-community/ --ctx-size 32768 --n-gpu-layers 9999 --main-gpu 0 --split-mode none --host 0.0.0.0 --port 8080

This loads the models found under ~/.lmstudio/models/lmstudio-community/.

The problem is that LM Studio sorts models into separate directories by source. Besides the directory above, for example, I also have a HauhauCS directory.

If I pass ~/.lmstudio/models/ itself as the path, the models are not loaded; I can only pick one of its subdirectories.
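To make the depth problem concrete, here is a minimal sketch that counts how many directories sit between a models root and a .gguf file. It works purely on path strings and does not touch the disk. The helper name dirs_between and the model names used with it are hypothetical and are not part of llama.cpp or LM Studio.

```cpp
#include <filesystem>
#include <iterator>
#include <string>

namespace fs = std::filesystem;

// Number of directories that sit between `root` and `file`.
// 0 means the file lies directly inside root; -1 means it is not under root.
// Hypothetical helper for illustration only; purely lexical, no disk access.
int dirs_between(const fs::path & root, const fs::path & file) {
    fs::path base = root.lexically_normal();
    if (!base.has_filename()) {
        base = base.parent_path(); // drop a trailing slash such as "models/"
    }
    const fs::path rel = file.lexically_normal().lexically_relative(base);
    if (rel.empty() || rel == "." || *rel.begin() == "..") {
        return -1;
    }
    // every component except the last one (the file name) is a directory
    return static_cast<int>(std::distance(rel.begin(), rel.end())) - 1;
}
```

For a made-up file some-model-GGUF/some-model.gguf, the helper returns 1 when the root is ~/.lmstudio/models/lmstudio-community/ and 2 when the root is ~/.lmstudio/models/. Judging by the behavior described above, the stock scan handles the first case but not the second.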

Creating a shortcut to HauhauCS inside lmstudio-community also gets the models loaded, but then LM Studio shows duplicate models, which is hard for a perfectionist like me to accept.

To solve this properly, the llama.cpp source has to be changed.

Modifying the load_from_models_dir method in common/preset.cpp does the job.

Modified code:

common_presets common_preset_context::load_from_models_dir(const std::string & models_dir) const {
    if (!std::filesystem::exists(models_dir) || !std::filesystem::is_directory(models_dir)) {
        throw std::runtime_error(string_format("error: '%s' does not exist or is not a directory\n", models_dir.c_str()));
    }

    std::vector<local_model> models;
    // collect the model (and optional mmproj) found directly inside one directory
    auto scan_subdir = [&models](const std::string & subdir_path, const std::string & name) {
        auto files = fs_list(subdir_path, false);
        common_file_info model_file;
        common_file_info first_shard_file;
        common_file_info mmproj_file;
        for (const auto & file : files) {
            if (string_ends_with(file.name, ".gguf")) {
                if (file.name.find("mmproj") != std::string::npos) {
                    mmproj_file = file;
                } else if (file.name.find("-00001-of-") != std::string::npos) {
                    first_shard_file = file;
                } else {
                    model_file = file;
                }
            }
        }
        // prefer the first shard of a split model, otherwise use the single-file model
        local_model model{
            /* name        */ name,
            /* path        */ first_shard_file.path.empty() ? model_file.path : first_shard_file.path,
            /* path_mmproj */ mmproj_file.path // can be empty
        };
        if (!model.path.empty()) {
            models.push_back(model);
        }
    };

    auto files = fs_list(models_dir, true);
    bool has_subdir = false;
    for (const auto & file : files) {
        if (file.is_dir) {
            has_subdir = true;
            // first level: models_dir/<dir>/*.gguf
            scan_subdir(file.path, file.name);
            // second level (added): models_dir/<dir>/<model dir>/*.gguf
            auto infiles = fs_list(file.path, true);
            for (const auto & infile : infiles) {
                if (infile.is_dir) {
                    scan_subdir(infile.path, infile.name);
                }
            }
        }
    }
    if (!has_subdir) {
        // if there is no subdir, treat the main dir as the model dir
        scan_subdir(models_dir, std::filesystem::path(models_dir).filename().string());
    }

    // convert local models to presets
    common_presets out;
    for (const auto & model : models) {
        common_preset preset;
        preset.name = model.name;
        preset.set_option(*this, "LLAMA_ARG_MODEL", model.path);
        if (!model.path_mmproj.empty()) {
            preset.set_option(*this, "LLAMA_ARG_MMPROJ", model.path_mmproj);
        }
        out[preset.name] = preset;
    }

    return out;
}
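The file-selection rule inside scan_subdir can be tried out in isolation. The following is a minimal standalone sketch of that rule, operating on plain file names instead of llama.cpp's common_file_info. The names picked_files, ends_with and pick_gguf_files are hypothetical stand-ins and not llama.cpp APIs.

```cpp
#include <string>
#include <vector>

// Result of scanning one model directory (hypothetical type for illustration).
struct picked_files {
    std::string model;  // first shard if the model is split, otherwise the plain .gguf
    std::string mmproj; // multimodal projector file, may be empty
};

static bool ends_with(const std::string & s, const std::string & suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Mirrors the selection rule shown above: ignore non-.gguf files, treat names
// containing "mmproj" as the projector, and prefer the "-00001-of-" shard.
picked_files pick_gguf_files(const std::vector<std::string> & names) {
    std::string plain, first_shard, mmproj;
    for (const auto & n : names) {
        if (!ends_with(n, ".gguf")) {
            continue;
        }
        if (n.find("mmproj") != std::string::npos) {
            mmproj = n;
        } else if (n.find("-00001-of-") != std::string::npos) {
            first_shard = n;
        } else {
            plain = n; // a later match overwrites an earlier one
        }
    }
    return { first_shard.empty() ? plain : first_shard, mmproj };
}
```

One consequence of this rule: if a directory holds several non-sharded .gguf files, such as multiple quantizations, only the last one listed is kept.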

After the change, rebuild and try loading again:

./llama-server --models-dir ~/.lmstudio/models/ --ctx-size 32768 --n-gpu-layers 9999 --main-gpu 0 --split-mode none --host 0.0.0.0 --port 8080

Now all models under the models directory can be loaded.
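One caveat follows from the patch: presets are keyed by the model directory name alone, so two source directories that contain a model directory with the same name would end up under the same key. The sketch below illustrates that effect with a plain std::map. It assumes the preset container behaves like a map, which the out[preset.name] = preset assignment suggests. The function index_by_dir_name and the directory names used with it are hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Input: (source directory, model directory) pairs found by a two-level scan.
// Output: model directory name -> "source/model", keyed the same way as the patch.
std::map<std::string, std::string> index_by_dir_name(
        const std::vector<std::pair<std::string, std::string>> & found) {
    std::map<std::string, std::string> out;
    for (const auto & [source, model_dir] : found) {
        // same key twice: the later entry silently replaces the earlier one
        out[model_dir] = source + "/" + model_dir;
    }
    return out;
}
```

If this ever matters in practice, one option would be to use the source directory plus the model directory as the preset name. That is a design choice I have not tested.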