1. Installing the local model

Download Ollama: https://ollama.com/download

Pull the model:

```
ollama pull deepseek-coder:6.7b-instruct-q4_K_M
```

Run the model:

```
# Start an interactive session with the model
ollama run deepseek-coder:6.7b-instruct-q4_K_M
```

Start chatting with it:

```
>>> Write a Python function that checks whether a string is a palindrome (ignoring case and spaces)
>>> Explain the time complexity of this code
```

Show the built-in help:

```
>>> /?
Available Commands:
  /set            Set session variables
  /show           Show model information
  /load <model>   Load a session or model
  /save <model>   Save your current session
  /clear          Clear session context
  /bye            Exit
  /?, /help       Help for a command
  /? shortcuts    Help for keyboard shortcuts

Use """ to begin a multi-line message.

>>> /bye
```

2. VS Code + Cline + the local model

Start the Ollama server. If the error below appears, the port is already taken, usually because ollama.exe or the desktop client is still running. Killing only the PID that holds the port is not enough here, because another ollama.exe process re-binds it; terminating the whole ollama.exe process tree finally lets `ollama serve` start:

```
PS C:\Users\Admin> ollama serve
Error: listen tcp 127.0.0.1:11434: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted.
PS C:\Users\Admin> netstat -ano | findstr :11434
  TCP    127.0.0.1:11434    0.0.0.0:0    LISTENING    23032
PS C:\Users\Admin> tasklist /FI "PID eq 23032"

映像名称                       PID 会话名              会话#       内存使用
========================= ======== ================ =========== ============
ollama.exe                   23032 Console                    1     59,180 K
PS C:\Users\Admin> taskkill /PID 23032 /F
成功: 已终止 PID 为 23032 的进程。
PS C:\Users\Admin> ollama serve
Error: listen tcp 127.0.0.1:11434: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted.
PS C:\Users\Admin> taskkill /F /IM ollama.exe /T
成功: 已终止 PID 12304 (属于 PID 22616 子进程) 的进程。
PS C:\Users\Admin> ollama serve
time=2025-07-14T22:25:55.157+08:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\Admin\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-07-14T22:25:55.171+08:00 level=INFO source=images.go:476 msg="total blobs: 5"
time=2025-07-14T22:25:55.173+08:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
time=2025-07-14T22:25:55.174+08:00 level=INFO source=routes.go:1288 msg="Listening on 127.0.0.1:11434 (version 0.9.6)"
time=2025-07-14T22:25:55.174+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-07-14T22:25:55.174+08:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-07-14T22:25:55.174+08:00 level=INFO source=gpu_windows.go:183 msg="efficiency cores detected" maxEfficiencyClass=1
time=2025-07-14T22:25:55.174+08:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=10 efficiency=4 threads=16
time=2025-07-14T22:25:55.306+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-c34967ed-d57d-9254-842e-328176c8fff7 library=cuda variant=v12 compute=8.9 driver=12.8 name="NVIDIA GeForce RTX 4060" total="8.0 GiB" available="6.9 GiB"
```
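With the server listening on 127.0.0.1:11434, the model can also be exercised over Ollama's HTTP API instead of the interactive REPL. The sketch below is a minimal sanity check, not part of the original setup; it assumes the `requests` package is installed, the default server address from the log above, and the `deepseek-coder:6.7b-instruct-q4_K_M` tag pulled in section 1.

```python
# verify_ollama.py: minimal sanity check against the local Ollama server.
# Assumptions: `pip install requests`, server on the default 127.0.0.1:11434,
# and the deepseek-coder tag pulled earlier.
import requests

BASE = "http://127.0.0.1:11434"
MODEL = "deepseek-coder:6.7b-instruct-q4_K_M"

# List the locally available models (same information as `ollama list`).
tags = requests.get(f"{BASE}/api/tags", timeout=10).json()
print("Installed models:", [m["name"] for m in tags.get("models", [])])

# Send one non-streaming prompt, the same one typed into the REPL above.
payload = {
    "model": MODEL,
    "prompt": ("Write a Python function that checks whether a string is a "
               "palindrome, ignoring case and spaces."),
    "stream": False,
}
reply = requests.post(f"{BASE}/api/generate", json=payload, timeout=300)
reply.raise_for_status()
print(reply.json()["response"])
```

The first request may take a while because the model is loaded into VRAM on demand. Cline will talk to this same local server, so if these two calls succeed, any remaining problems are on the Cline side rather than in the Ollama installation.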
Click the icon in the lower-right corner to open the AI features, then click the settings icon and fill in the Cline parameters. Cline then reports the error below, and the same error also appears in the client:

```
llama_context: CPU output buffer size = 0.14 MiB
llama_kv_cache_unified: kv_size = 16384, type_k = 'f16', type_v = 'f16', n_layer = 32, can_shift = 1, padding = 32
llama_kv_cache_unified: CUDA0 KV buffer size = 3072.00 MiB
ggml_backend_cpu_buffer_type_alloc_buffer: failed to allocate buffer of size 5368709120
alloc_tensor_range: failed to allocate CPU buffer of size 5368709120
llama_init_from_model: failed to initialize the context: failed to allocate buffer for kv cache
panic: unable to create llama context
```

The first suspicion was a missing CUDA installation; the toolkit can be downloaded from https://developer.nvidia.com/cuda-toolkit-archive. After installing CUDA the problem remained, however: the real cause is the context limit, so the context length Ollama uses at runtime has to be configured (a request-level sketch is given at the end of this section).

Cline also behaves differently with the DeepSeek API than with the local model. With the DeepSeek API it automatically reads the currently active file, whereas with the local model it cannot perceive local files at all and seems to need extra configuration. The possible causes are discussed below.
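On the context-length point, the window does not have to be changed globally: the Ollama API accepts a `num_ctx` option per request, the request-level counterpart of the `OLLAMA_CONTEXT_LENGTH:4096` default visible in the serve log. The sketch below is only an illustration under the same assumptions as the previous one (requests installed, default address, same model tag); the concrete window sizes are examples, not recommendations for this GPU.

```python
# context_probe.py: set the context window per request instead of relying on
# the server default (OLLAMA_CONTEXT_LENGTH:4096 in the serve log above).
import requests

BASE = "http://127.0.0.1:11434"
MODEL = "deepseek-coder:6.7b-instruct-q4_K_M"

def ask(prompt: str, num_ctx: int) -> str:
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,
        # "options" overrides model/runtime defaults for this request only.
        "options": {"num_ctx": num_ctx},
    }
    r = requests.post(f"{BASE}/api/generate", json=payload, timeout=300)
    r.raise_for_status()
    return r.json()["response"]

# The 4k window matches the server default; the 16k window matches the kv_size
# of the failing Cline request above. If the second call fails with the same
# kv-cache message while the first succeeds, the problem is the context length
# (and the memory the KV cache needs), not the CUDA installation.
print(ask("Explain the time complexity of binary search.", num_ctx=4096)[:400])
print(ask("Explain the time complexity of binary search.", num_ctx=16384)[:400])
```

If the larger request does fail in the same way, the fix lies on the context-length side (a smaller window, or more memory available for the KV cache) rather than on the CUDA side.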