
Deploying the Qwen3 Reranker Model Family with vLLM

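For the SGLang version, see the companion article: Deploying the Qwen3 Reranker Model Family with SGLang.
In my tests, the vLLM deployment gave faster inference and higher QPS.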

Installing vLLM

Install vLLM by following the official procedure (vLLM official documentation, Qwen official vLLM installation guide).

conda create -n myenv python=3.10 -y
conda activate myenv
pip install vllm
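After installation, it can help to confirm that the package is visible in the active environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of vLLM.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("vllm")
    if version is None:
        print("vllm is not installed in this environment")
    else:
        print(f"vllm version: {version}")
```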

Deploying Qwen3 Reranker (0.6B/4B/8B) Models with vLLM

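If you follow the official tutorial for deploying reranker models, serving a Qwen3 Reranker model with vLLM fails with an error saying the API is unsupported (The model does not support Score API). The conclusion first: vLLM can serve the Qwen3 Reranker models, but the model needs a conversion step.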

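First, Qwen3-Reranker is a model with the Qwen3ForCausalLM architecture. In other words, it is essentially a generative model, and the official vLLM documentation lists this kind of model as supported.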


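In practice, however, deploying with the following command

vllm serve {model_path}

produces a startup log showing that, once deployment finishes, vLLM treats the architecture as a generative model by default. Only the chat-template APIs are supported (the area marked in red in the original screenshot); the APIs in the area marked in white are unavailable.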


Building a client as described in the official tutorial and calling one of the white-marked APIs then returns the following error:

{'error':{'message':'The model does not support Score API','type':'BadRequestError','param': None,'code': 400}}
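A client can detect this failure mode explicitly instead of failing later on a missing `data` key. The following is a minimal sketch, assuming the error payload has the shape shown above; the helper name `is_score_api_unsupported` is hypothetical.

```python
from typing import Any, Dict


def is_score_api_unsupported(response: Dict[str, Any]) -> bool:
    """Return True if `response` is the 'does not support Score API' error."""
    error = response.get("error")
    if not isinstance(error, dict):
        return False  # Not an error payload at all
    message = str(error.get("message", ""))
    return error.get("code") == 400 and "does not support Score API" in message


if __name__ == "__main__":
    payload = {
        "error": {
            "message": "The model does not support Score API",
            "type": "BadRequestError",
            "param": None,
            "code": 400,
        }
    }
    if is_score_api_unsupported(payload):
        print("The served model was loaded as a generative model; convert it first.")
```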

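The reason is that vLLM currently cannot have a single architecture serve both Embedding and Reranker. A workable approach is to extract the lm_head rows for token_false_id = 2152 and token_true_id = 9693 into a two-class task, instead of the current 151669-class task, and then run inference through vLLM's score API. Put differently, the two-class head is collapsed into a single-logit classifier, and the original Qwen3ForCausalLM architecture is converted to Qwen3ForSequenceClassification. The following code does this (code source).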

import torch
from transformers import Qwen3ForCausalLM, Qwen3ForSequenceClassification, AutoTokenizer


def convert_model(model_path, save_path):
    # --- Step 1: Load the Causal LM and extract lm_head weights ---
    print(f"1. Loading Causal LM: {model_path}")
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    causal_lm = Qwen3ForCausalLM.from_pretrained(model_path)

    # The lm_head is the final linear layer that maps hidden states to vocabulary logits
    lm_head_weights = causal_lm.lm_head.weight
    print(f"   lm_head weight shape: {lm_head_weights.shape}")  # (vocab_size, hidden_size)

    # --- Step 2: Get the token IDs for "yes" and "no" ---
    print("\n2. Finding token IDs for 'yes' and 'no'")
    yes_token_id = tokenizer.convert_tokens_to_ids("yes")
    no_token_id = tokenizer.convert_tokens_to_ids("no")
    print(f"   ID for 'yes': {yes_token_id}, ID for 'no': {no_token_id}")

    # --- Step 3: Create the classifier vector ---
    print("\n3. Creating the classifier vector from lm_head weights")
    # Extract the specific rows (weight vectors) for our target tokens
    yes_vector = lm_head_weights[yes_token_id]
    no_vector = lm_head_weights[no_token_id]

    # The new classifier is the difference between the 'yes' and 'no' vectors
    classifier_vector = yes_vector - no_vector
    print(f"   Shape of the new classifier vector: {classifier_vector.shape}")

    # --- Step 4: Load the model as a Sequence Classifier ---
    print(f"\n4. Loading Sequence Classification model with num_labels=1")
    # num_labels=1 is key for binary classification represented by a single logit
    seq_cls_model = Qwen3ForSequenceClassification.from_pretrained(
        model_path, num_labels=1, ignore_mismatched_sizes=True
    )

    # --- Step 5: Replace the classifier's weights ---
    print("\n5. Replacing the randomly initialized classifier weights")
    # The classification head in Qwen is named 'score'. It's a torch.nn.Linear layer.
    # Its weight matrix has shape (num_labels, hidden_size), which is (1, hidden_size) here.
    with torch.no_grad():
        # We need to add a dimension to our vector to match the (1, hidden_size) shape
        seq_cls_model.score.weight.copy_(classifier_vector.unsqueeze(0))
        # It's good practice to zero out the bias for a clean transfer
        if seq_cls_model.score.bias is not None:
            seq_cls_model.score.bias.zero_()
    print("   Classifier head replaced successfully.")

    # --- Verification: Prove that the logic works ---
    print("\n--- VERIFICATION ---")
    text = "Is this a good example?"
    inputs = tokenizer(text, return_tensors="pt")

    # A. Get logits from the original Causal LM
    with torch.no_grad():
        outputs_causal = causal_lm(**inputs)
    last_token_logits = outputs_causal.logits[0, -1, :]
    manual_logit_diff = last_token_logits[yes_token_id] - last_token_logits[no_token_id]
    # Compute probs (yes/no) and extract 'yes' prob
    concat_logits = torch.stack([last_token_logits[yes_token_id], last_token_logits[no_token_id]])
    causal_prob = torch.softmax(concat_logits, dim=-1)[0]

    # B. Get the single logit from our new Sequence Classification model
    with torch.no_grad():
        outputs_seq_cls = seq_cls_model(**inputs)
    # Shape is (1, 1), squeeze to scalar
    model_logit = outputs_seq_cls.logits.squeeze()
    # Compute 'yes' prob
    classification_prob = torch.sigmoid(model_logit)

    print(f"Input text: '{text}'")
    print(f"\nManual logit difference ('yes' - 'no'): {manual_logit_diff.item():.4f}")
    print(f"Sequence Classification model output: {model_logit.item():.4f}")
    print(f"Are they almost identical? {torch.allclose(manual_logit_diff, model_logit)}")

    # Probs
    print(f"\nCausal prob (2 classes): {causal_prob.item():.4f}")
    print(f"Classification prob (1 class): {classification_prob.item():.4f}")
    print(f"Are they almost identical? {torch.allclose(causal_prob, classification_prob)}")

    seq_cls_model.save_pretrained(save_path)
    tokenizer.save_pretrained(save_path)
    print(f"Save model to: {save_path}")


if __name__ == "__main__":
    model_path = "/home/Qwen/Qwen3-Reranker-0.6B"
    save_path = "/home/Qwen/Qwen3-Reranker-0.6B-seqcls-converted"
    convert_model(model_path, save_path)

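After replacing model_path and save_path, the code above can be used as is. The converted model produces the same results as the original, which the script's verification step prints.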
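The equivalence rests on two identities. For a hidden state h, the logit difference h·w_yes − h·w_no equals h·(w_yes − w_no), so a single linear head holding the weight difference reproduces the difference of the two logits. A two-way softmax over (yes, no) then equals the sigmoid of that difference. The second identity can be checked with plain Python; this is an illustrative sketch with made-up logit values, not output from the model.

```python
import math


def yes_prob_softmax(yes_logit: float, no_logit: float) -> float:
    """P(yes) from a softmax restricted to the two logits (numerically stable)."""
    m = max(yes_logit, no_logit)
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def yes_prob_single_logit(yes_logit: float, no_logit: float) -> float:
    """P(yes) from the single-logit classifier: sigmoid of the logit difference."""
    return sigmoid(yes_logit - no_logit)


if __name__ == "__main__":
    # Made-up logits, chosen only to exercise both signs of the difference
    for yes_logit, no_logit in [(3.2, -1.5), (-4.0, 2.0), (0.0, 0.0)]:
        a = yes_prob_softmax(yes_logit, no_logit)
        b = yes_prob_single_logit(yes_logit, no_logit)
        print(f"softmax={a:.6f} sigmoid={b:.6f} equal={math.isclose(a, b)}")
```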


Deploy with vLLM:

vllm serve /home/Qwen/Qwen3-Reranker-0.6B-seqcls-converted \
    --hf_overrides '{"architectures": ["Qwen3ForSequenceClassification"], "classifier_from_token": ["no", "yes"], "is_original_qwen3_reranker": true}'

Deploying directly often runs out of GPU memory; adding the --gpu-memory-utilization 0.6 argument is recommended.
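When the server is launched from a script, building the argument list in Python avoids shell-quoting mistakes in the JSON override. The sketch below only assembles and prints the command; the function name `build_vllm_serve_command` is hypothetical, and the flags and override keys are copied from the command in this article.

```python
import json
import shlex
from typing import List


def build_vllm_serve_command(model_path: str, gpu_memory_utilization: float = 0.6) -> List[str]:
    """Assemble the `vllm serve` argument list used in this article."""
    hf_overrides = {
        "architectures": ["Qwen3ForSequenceClassification"],
        "classifier_from_token": ["no", "yes"],
        "is_original_qwen3_reranker": True,
    }
    return [
        "vllm", "serve", model_path,
        "--hf_overrides", json.dumps(hf_overrides),  # serialized once, no manual quoting
        "--gpu-memory-utilization", str(gpu_memory_utilization),
    ]


if __name__ == "__main__":
    cmd = build_vllm_serve_command("/home/Qwen/Qwen3-Reranker-0.6B-seqcls-converted")
    # shlex.join produces a copy-pasteable shell command with correct quoting
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.Popen` unchanged, which bypasses the shell entirely.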

The client below is based on the official Qwen3 documentation.

import requests

url = "http://127.0.0.1:8000/score"
MODEL_NAME = "Qwen3-Reranker-0.6B-seqcls-converted"

# Chat-template prefix and suffix expected by the Qwen3 reranker
prefix = '<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n'
suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"

query_template = "{prefix}<Instruct>: {instruction}\n<Query>: {query}\n"
document_template = "<Document>: {doc}{suffix}"

instruction = (
    "Given a web search query, retrieve relevant passages that answer the query"
)

queries = [
    "What is the capital of China?",
    "Explain gravity",
]

documents = [
    "I want yo eat an apple.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Wrap each query and document in the template before sending
queries = [
    query_template.format(prefix=prefix, instruction=instruction, query=query)
    for query in queries
]
documents = [
    document_template.format(doc=doc, suffix=suffix) for doc in documents
]

response = requests.post(
    url,
    json={
        "text_1": queries,
        "text_2": documents,
        "truncate_prompt_tokens": -1,
    },
).json()

print(response)

The final output is shown below. The results are as expected: the converted model behaves the same as the model before conversion.

{
    'id': 'score-a918997f9ba1424f',
    'object': 'list',
    'created': 1765251739,
    'model': '/home/Qwen/Qwen3-Reranker-0.6B-seqcls-converted',
    'data': [
        {'index': 0, 'object': 'score', 'score': 0.0001038978953147307},
        {'index': 1, 'object': 'score', 'score': 0.993419349193573}
    ],
    'usage': {'prompt_tokens': 188, 'total_tokens': 188, 'completion_tokens': 0, 'prompt_tokens_details': None}
}
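To use the scores downstream, each entry in `data` can be joined back to the pair it was computed for. The sketch below assumes that the list-to-list request scores pairs element-wise, so that `index` i refers to the i-th query and the i-th document, as the two-entry output above suggests. The helper name `pair_scores` is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def pair_scores(
    response: Dict[str, Any], queries: List[str], documents: List[str]
) -> List[Tuple[str, str, float]]:
    """Attach each returned score to its (query, document) pair, ordered by index."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [
        (queries[item["index"]], documents[item["index"]], item["score"])
        for item in items
    ]


if __name__ == "__main__":
    # Response trimmed to the fields this helper reads; scores are from the output above
    sample = {
        "data": [
            {"index": 0, "object": "score", "score": 0.0001038978953147307},
            {"index": 1, "object": "score", "score": 0.993419349193573},
        ]
    }
    raw_queries = ["What is the capital of China?", "Explain gravity"]
    raw_documents = ["I want yo eat an apple.", "Gravity is a force that attracts two bodies towards each other."]
    for query, doc, score in pair_scores(sample, raw_queries, raw_documents):
        print(f"{score:.4f}  {query!r} -> {doc[:40]!r}")
```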

References

  • Deploying Qwen3-Reranker with vLLM: https://github.com/vllm-project/vllm/pull/19260
  • Qwen3-Reranker model conversion: https://huggingface.co/Qwen/Qwen3-Reranker-0.6B/discussions/3