Running LLM inference with RKLLM on RK3588/RK3576 and serving it through an OpenAI-compatible API
infer-rkllm-openai
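- Gitee repository: https://gitee.com/Vanishi/infer-rkllm-openai
- GitHub repository: https://github.com/beixiaocai/infer-rkllm-openai
A vision-language model inference service built on Rockchip RKLLM. It exposes an HTTP interface that is fully compatible with the OpenAI API format.
- Note: Intel CPU/GPU users should see https://gitee.com/Vanishi/infer-openvino-openai
- Models prepared by the author can be downloaded from: https://pan.quark.cn/s/d2b152fbea26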
Hardware requirements
- Board: Rockchip RK3576 or RK3588
- Runtime libraries: /usr/local/lib/librkllmrt.so and /usr/local/lib/librknnrt.so
- Memory: 4 GB or more recommended
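Before launching the service, it can help to confirm these prerequisites. The sketch below is a hypothetical preflight helper, not part of the repository: `missing_libs` and `total_mem_gb` are illustrative names, the library paths are the ones listed above, and reading `/proc/meminfo` assumes a Linux system.

```python
import os

# Paths taken from the hardware requirements; adjust if your libraries live elsewhere.
REQUIRED_LIBS = [
    "/usr/local/lib/librkllmrt.so",
    "/usr/local/lib/librknnrt.so",
]


def missing_libs(paths=REQUIRED_LIBS):
    """Return the subset of paths that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]


def total_mem_gb(meminfo_text):
    """Parse MemTotal (reported in kB) from /proc/meminfo text; return GiB or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kb = int(line.split()[1])
            return kb / (1024 * 1024)
    return None


def preflight():
    """Print any missing library and the total RAM, for manual comparison."""
    for lib in missing_libs():
        print(f"missing runtime library: {lib}")
    try:
        with open("/proc/meminfo") as f:
            mem = total_mem_gb(f.read())
    except OSError:
        mem = None
    if mem is not None:
        print(f"total RAM: {mem:.1f} GiB (4 GB or more is recommended)")
```

Calling `preflight()` on the board prints one line per missing library plus the detected memory. Boards sold as 4 GB usually report slightly less than 4 GiB, so the memory line is advisory only.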
Quick start
1. Install dependencies
pip install -r requirements.txt
2. Start the service
python infer-rkllm-openai.py \
  --model-path /path/to/model.rkllm \
  --vision-model /path/to/vision.rknn \
  --platform rk3576 \
  --host 0.0.0.0 --port 9696
Common options:
- --model-path: RKLLM model path (required)
- --vision-model: vision model path (optional)
- --platform: platform type, rk3576 or rk3588
- --host: service address, default 0.0.0.0
- --port: service port, default 9696
- --rknn-cores: number of NPU cores (1/2/3), default 2
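For readers who want to wrap or script the launcher, the documented options can be mirrored with `argparse`. This is a hypothetical reconstruction based only on the option list above, not the project's actual parser: `build_parser` is an illustrative name, and because no default is documented for `--platform`, none is assumed.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options; not the project's code."""
    p = argparse.ArgumentParser(prog="infer-rkllm-openai.py")
    p.add_argument("--model-path", required=True, help="RKLLM model path")
    p.add_argument("--vision-model", default=None, help="vision model path (optional)")
    # No default is documented for --platform, so none is assumed here.
    p.add_argument("--platform", choices=["rk3576", "rk3588"], help="platform type")
    p.add_argument("--host", default="0.0.0.0", help="service address")
    p.add_argument("--port", type=int, default=9696, help="service port")
    p.add_argument("--rknn-cores", type=int, choices=[1, 2, 3], default=2,
                   help="number of NPU cores")
    return p


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(
    ["--model-path", "/path/to/model.rkllm", "--platform", "rk3576"]
)
print(vars(args))
```

Restricting `--platform` and `--rknn-cores` with `choices` makes a mistyped value fail at startup rather than at model load time.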
3. Access the service
- Home page: http://localhost:9696/
- Admin dashboard: http://localhost:9696/dashboard
- API: http://localhost:9696/v1/chat/completions
API usage
Python SDK
import base64

from openai import OpenAI

client = OpenAI(
    api_key="sk-rebucca",
    base_url="http://localhost:9696/v1",
)

# Text chat
response = client.chat.completions.create(
    model="qwen3-vl-4b",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

# Image understanding: send the image as a base64 data URL
with open("demo.jpeg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="qwen3-vl-4b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)

# Streaming output
stream = client.chat.completions.create(
    model="qwen3-vl-4b",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    # Skip chunks that carry no choices or an empty delta
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
curl
curl http://localhost:9696/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-rebucca" \
  -d '{"model":"qwen3-vl-4b","messages":[{"role":"user","content":"Hello"}]}'
Running the tests
python tests.py
# Or point the tests at a specific address
python tests.py --base-url http://192.168.1.15:9696/v1
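The contents of tests.py are not reproduced here. As a rough illustration of what a smoke test against a `--base-url` could look like, the sketch below uses only the standard library to send one chat request and check the shape of the reply. The helper names (`build_request`, `extract_text`, `main`) are hypothetical; the model name and API key are the ones used in the examples above.

```python
import argparse
import json
import urllib.request


def build_request(base_url, api_key="sk-rebucca", model="qwen3-vl-4b"):
    """Build a POST request for {base_url}/chat/completions with a one-message chat."""
    body = {"model": model, "messages": [{"role": "user", "content": "Hello"}]}
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def extract_text(payload):
    """Return the assistant text from an OpenAI-style chat completion dict."""
    return payload["choices"][0]["message"]["content"]


def main(argv=None):
    """Send one request to the server and fail loudly if the reply is empty."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--base-url", default="http://localhost:9696/v1")
    args = parser.parse_args(argv)
    # Generation on an embedded board can be slow, so allow a generous timeout.
    with urllib.request.urlopen(build_request(args.base_url), timeout=120) as resp:
        text = extract_text(json.load(resp))
    assert isinstance(text, str) and text, "empty reply"
    print("ok:", text[:80])


# Usage (requires a running server):
# main(["--base-url", "http://192.168.1.15:9696/v1"])
```

Keeping request construction and response parsing in separate functions lets both be checked without a board attached.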
Environment variables
export RKLLM_LIB=/opt/rkllm/librkllmrt.so
export RKNN_LIB=/opt/rknn/librknnrt.so
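The variable names suggest that they point the service at runtime libraries installed outside /usr/local/lib. The following is a minimal sketch of that lookup pattern, assuming the paths from the hardware requirements act as fallbacks and that the libraries are loaded with `ctypes`. Both are assumptions made for illustration, and `resolve_lib` and `load_lib` are hypothetical names rather than functions from the project.

```python
import ctypes
import os

# Assumed fallbacks: the library paths listed under the hardware requirements.
DEFAULTS = {
    "RKLLM_LIB": "/usr/local/lib/librkllmrt.so",
    "RKNN_LIB": "/usr/local/lib/librknnrt.so",
}


def resolve_lib(var, env=os.environ):
    """Return the path set in the environment variable, or the assumed default."""
    return env.get(var) or DEFAULTS[var]


def load_lib(var):
    """Load the shared library with ctypes; raises OSError if it cannot be loaded."""
    return ctypes.CDLL(resolve_lib(var))
```

With `RKLLM_LIB` exported as shown above, `resolve_lib("RKLLM_LIB")` returns `/opt/rkllm/librkllmrt.so`. With the variable unset, it falls back to the path under /usr/local/lib.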
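Notes
- The service must run on a Rockchip RK3576/RK3588 device
- Models must be in .rkllm format (language model) and .rknn format (vision model)
- A W4A16 quantized model uses about 2-3 GB of memory
- Requests are processed on a single thread, so concurrent requests are queued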
Versions
- Python: 3.8+
- Flask: >=3.0.0
- Supported platforms: RK3576, RK3588
