1. Installing the local model

Download Ollama: https://ollama.com/download

Pull the model:

```
ollama pull deepseek-coder:6.7b-instruct-q4_K_M
```

Run the model:

```
# Start an interactive session with the model
ollama run deepseek-coder:6.7b-instruct-q4_K_M
```

Start chatting with it:

```
>>> Write a Python function that checks whether a string is a palindrome (ignoring case and spaces)
>>> Explain the time complexity of this code
```

Show the built-in help:

```
>>> /?
Available Commands:
  /set            Set session variables
  /show           Show model information
  /load <model>   Load a session or model
  /save <model>   Save your current session
  /clear          Clear session context
  /bye            Exit
  /?, /help       Help for a command
  /? shortcuts    Help for keyboard shortcuts

Use """ to begin a multi-line message.

>>> /bye
```

2. VS Code + Cline + the local model

Start the Ollama server. If the error below appears, the port is already taken, usually because ollama.exe or the desktop client is still running. Killing only the PID that holds the port is not enough here, because another ollama.exe process re-binds it; terminating the whole ollama.exe process tree finally lets `ollama serve` start:

```
PS C:\Users\Admin> ollama serve
Error: listen tcp 127.0.0.1:11434: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted.
PS C:\Users\Admin> netstat -ano | findstr :11434
  TCP    127.0.0.1:11434    0.0.0.0:0    LISTENING    23032
PS C:\Users\Admin> tasklist /FI "PID eq 23032"

映像名称                       PID 会话名              会话#       内存使用
========================= ======== ================ =========== ============
ollama.exe                   23032 Console                    1     59,180 K
PS C:\Users\Admin> taskkill /PID 23032 /F
成功: 已终止 PID 为 23032 的进程。
PS C:\Users\Admin> ollama serve
Error: listen tcp 127.0.0.1:11434: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted.
PS C:\Users\Admin> taskkill /F /IM ollama.exe /T
成功: 已终止 PID 12304 (属于 PID 22616 子进程) 的进程。
PS C:\Users\Admin> ollama serve
time=2025-07-14T22:25:55.157+08:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\Admin\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-07-14T22:25:55.171+08:00 level=INFO source=images.go:476 msg="total blobs: 5"
time=2025-07-14T22:25:55.173+08:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
time=2025-07-14T22:25:55.174+08:00 level=INFO source=routes.go:1288 msg="Listening on 127.0.0.1:11434 (version 0.9.6)"
time=2025-07-14T22:25:55.174+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-07-14T22:25:55.174+08:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-07-14T22:25:55.174+08:00 level=INFO source=gpu_windows.go:183 msg="efficiency cores detected" maxEfficiencyClass=1
time=2025-07-14T22:25:55.174+08:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=10 efficiency=4 threads=16
time=2025-07-14T22:25:55.306+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-c34967ed-d57d-9254-842e-328176c8fff7 library=cuda variant=v12 compute=8.9 driver=12.8 name="NVIDIA GeForce RTX 4060" total="8.0 GiB" available="6.9 GiB"
```
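With the server listening on 127.0.0.1:11434, the model can also be exercised over Ollama's HTTP API instead of the interactive REPL. The sketch below is a minimal sanity check, not part of the original setup; it assumes the `requests` package is installed, the default server address from the log above, and the `deepseek-coder:6.7b-instruct-q4_K_M` tag pulled in section 1.

```python
# verify_ollama.py: minimal sanity check against the local Ollama server.
# Assumptions: `pip install requests`, server on the default 127.0.0.1:11434,
# and the deepseek-coder tag pulled earlier.
import requests

BASE = "http://127.0.0.1:11434"
MODEL = "deepseek-coder:6.7b-instruct-q4_K_M"

# List the locally available models (same information as `ollama list`).
tags = requests.get(f"{BASE}/api/tags", timeout=10).json()
print("Installed models:", [m["name"] for m in tags.get("models", [])])

# Send one non-streaming prompt, the same one typed into the REPL above.
payload = {
    "model": MODEL,
    "prompt": ("Write a Python function that checks whether a string is a "
               "palindrome, ignoring case and spaces."),
    "stream": False,
}
reply = requests.post(f"{BASE}/api/generate", json=payload, timeout=300)
reply.raise_for_status()
print(reply.json()["response"])
```

The first request may take a while because the model is loaded into VRAM on demand. Cline will talk to this same local server, so if these two calls succeed, any remaining problems are on the Cline side rather than in the Ollama installation.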
Click the icon in the lower-right corner to open the AI features, then click the settings icon and fill in the Cline parameters. Cline then reports the error below, and the same error also appears in the client:

```
llama_context: CPU output buffer size = 0.14 MiB
llama_kv_cache_unified: kv_size = 16384, type_k = 'f16', type_v = 'f16', n_layer = 32, can_shift = 1, padding = 32
llama_kv_cache_unified: CUDA0 KV buffer size = 3072.00 MiB
ggml_backend_cpu_buffer_type_alloc_buffer: failed to allocate buffer of size 5368709120
alloc_tensor_range: failed to allocate CPU buffer of size 5368709120
llama_init_from_model: failed to initialize the context: failed to allocate buffer for kv cache
panic: unable to create llama context
```

The first suspicion was a missing CUDA installation; the toolkit can be downloaded from https://developer.nvidia.com/cuda-toolkit-archive. After installing CUDA the problem remained, however: the real cause is the context limit, so the context length Ollama uses at runtime has to be configured (a request-level sketch is given at the end of this section).

Cline also behaves differently with the DeepSeek API than with the local model. With the DeepSeek API it automatically reads the currently active file, whereas with the local model it cannot perceive local files at all and seems to need extra configuration. The possible causes are discussed below.
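On the context-length point, the window does not have to be changed globally: the Ollama API accepts a `num_ctx` option per request, the request-level counterpart of the `OLLAMA_CONTEXT_LENGTH:4096` default visible in the serve log. The sketch below is only an illustration under the same assumptions as the previous one (requests installed, default address, same model tag); the concrete window sizes are examples, not recommendations for this GPU.

```python
# context_probe.py: set the context window per request instead of relying on
# the server default (OLLAMA_CONTEXT_LENGTH:4096 in the serve log above).
import requests

BASE = "http://127.0.0.1:11434"
MODEL = "deepseek-coder:6.7b-instruct-q4_K_M"

def ask(prompt: str, num_ctx: int) -> str:
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,
        # "options" overrides model/runtime defaults for this request only.
        "options": {"num_ctx": num_ctx},
    }
    r = requests.post(f"{BASE}/api/generate", json=payload, timeout=300)
    r.raise_for_status()
    return r.json()["response"]

# The 4k window matches the server default; the 16k window matches the kv_size
# of the failing Cline request above. If the second call fails with the same
# kv-cache message while the first succeeds, the problem is the context length
# (and the memory the KV cache needs), not the CUDA installation.
print(ask("Explain the time complexity of binary search.", num_ctx=4096)[:400])
print(ask("Explain the time complexity of binary search.", num_ctx=16384)[:400])
```

If the larger request does fail in the same way, the fix lies on the context-length side (a smaller window, or more memory available for the KV cache) rather than on the CUDA side.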