Rockchip (EASY EAI) RV1126B AI Model Conversion
1. AI Model Conversion
This chapter explains how to convert a large language model (LLM) in Hugging Face format into an RKLLM model. Currently supported model families include DeepSeek, LLaMA, Qwen, Qwen2, Phi-2, Phi-3, ChatGLM3, Gemma, InternLM2, and MiniCPM. DeepSeek-R1 is used as the example throughout this chapter.
1.1 Model Download
This section provides two sets of model files: the original Hugging Face model and the already-converted NPU model.
Download link: https://pan.baidu.com/s/1u05E5qZcilbxCWMW0Dl6ag?pwd=1234 (extraction code: 1234).
1.2 Model Conversion
After downloading, place the model files and the conversion script in the same directory.
In the RKLLM-Toolkit environment, run the conversion script (test.py) to perform the conversion.
When the conversion completes successfully, it produces deepseek_r1_rv1126b_w4a16.rkllm, the NPU-ready model file.
The test.py conversion script, which converts the DeepSeek-R1-Distill-Qwen-1.5B model, is shown below:
```python
from rkllm.api import RKLLM
from datasets import load_dataset
from transformers import AutoTokenizer
from tqdm import tqdm
import torch
from torch import nn
import os

# os.environ['CUDA_VISIBLE_DEVICES'] = '1'

modelpath = '/home/developer/RKLLM-Toolkit/DeepSeek-R1-Distill-Qwen-1.5B'
llm = RKLLM()

# Load model
# Use 'export CUDA_VISIBLE_DEVICES=2' to specify the GPU device
# options: ['cpu', 'cuda']
ret = llm.load_huggingface(model=modelpath, model_lora=None, device='cpu')
# ret = llm.load_gguf(model=modelpath)
if ret != 0:
    print('Load model failed!')
    exit(ret)

# Build model
dataset = "./data_quant.json"
# JSON file format; note that the prompt must be included in the input, like this:
# [{"input": "Human: 你好!\nAssistant: ", "target": "你好!我是人工智能助手KK!"}, ...]
qparams = None
# qparams = 'gdq.qparams'  # Use extra_qparams
ret = llm.build(do_quantization=True, optimization_level=1, quantized_dtype='w4a16',
                quantized_algorithm='normal', target_platform='rv1126b',
                num_npu_core=1, extra_qparams=qparams, dataset=None)
if ret != 0:
    print('Build model failed!')
    exit(ret)

# Chat with model
messages = "<|im_start|>system You are a helpful assistant.<|im_end|><|im_start|>user你好!\n<|im_end|><|im_start|>assistant"
kwargs = {"max_length": 128, "top_k": 1, "top_p": 0.8, "temperature": 0.8,
          "do_sample": True, "repetition_penalty": 1.1}
# print(llm.chat_model(messages, kwargs))

# Export rkllm model
ret = llm.export_rkllm("./deepseek_r1_rv1126b_w4a16.rkllm")
if ret != 0:
    print('Export model failed!')
    exit(ret)
```
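Note that the script above passes dataset=None to llm.build(), so quantization runs without a calibration dataset. To calibrate, point the dataset argument at a JSON file in the format shown in the script's comment. Below is a minimal sketch that writes such a data_quant.json; the sample entries are illustrative placeholders, not real calibration data:

```python
import json

# Calibration samples for quantization. As noted in test.py, the "input"
# field must contain the full prompt, including the "Human: ... Assistant: "
# framing. These two entries are illustrative placeholders only.
samples = [
    {"input": "Human: 你好!\nAssistant: ", "target": "你好!我是人工智能助手!"},
    {"input": "Human: 今天天气怎么样?\nAssistant: ", "target": "抱歉,我无法获取实时天气信息。"},
]

with open("data_quant.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)

# Sanity check: the file parses back into a list of input/target pairs
with open("data_quant.json", encoding="utf-8") as f:
    loaded = json.load(f)
assert all({"input", "target"} <= set(item) for item in loaded)
print(f"Wrote {len(loaded)} calibration samples")
```

With the file in place, change the build call to pass dataset=dataset instead of dataset=None.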