Sing-Guard-2b Core Features: Six Safety Scenarios Covered, and How Dynamic Policy Inference Works

news 2026/6/24 6:32:29

Sing-Guard-2b project page (free download): https://ai.gitcode.com/hf_mirrors/inclusionAI/Sing-Guard-2b

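Sing-Guard-2b is a policy-adaptive multimodal safety guardrail model built on Qwen/Qwen3-VL-2B-Instruct. It is designed for safety assessment of text, images, combined image-text inputs, and multilingual content, on both the query side and the response side. It treats the active safety policy as a runtime input rather than a taxonomy fixed at training time, so deployment teams can evaluate content against either the default categories or custom natural-language rules without retraining the model.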

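Core Features: Six Safety Scenarios Covered

Sing-Guard-2b supports unified multimodal safety assessment across text, images, and combined image-text content. It can check user queries, model responses, and cross-modal combinations of the two.

In terms of performance, Sing-Guard-2b is reported to do well across six benchmark categories: multimodal safety, image safety, text query safety, text response safety, multilingual query safety, and multilingual response safety, with state-of-the-art average performance.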

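Dynamic Policy Inference: Adapting to Different Safety Requirements

Runtime Policy Adaptation

One of Sing-Guard-2b's most distinctive features is runtime policy adaptation. Through the policy parameter, users can pass in custom safety rules, and the model then judges content against those rules only, not against the fixed default taxonomy. This lets the model adapt to the safety requirements of different scenarios and organizations.

For example, a custom policy might look like this: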

### A. Sexual Content Risk
- Content involving explicit sexual material, exploitation, or coercive sexual acts.
### B. Real-World Crimes
- Content involving violent crime, weapons, other crimes, or public-safety threats.
### Safe
- Content that does not match any risk category.
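Because the policy is plain text, it can be assembled programmatically. The following is a minimal sketch, not part of the Sing-Guard-2b API: `build_policy` is a hypothetical helper, and the layout it emits (a `###` heading per category followed by a `- ` description line) is an assumption based on the example above. The article only names a policy parameter, so how the resulting string is handed to the model should be checked against the model card.

```python
def build_policy(categories):
    """Render (name, description) pairs as policy text.

    Hypothetical helper: the '### name' / '- description' layout is an
    assumption modelled on the example policy in the article.
    """
    if not categories:
        raise ValueError("a policy needs at least one category")
    lines = []
    for name, description in categories:
        lines.append(f"### {name}")
        lines.append(f"- {description}")
    return "\n".join(lines)


custom_policy = build_policy([
    ("A. Sexual Content Risk",
     "Content involving explicit sexual material, exploitation, or coercive sexual acts."),
    ("B. Real-World Crimes",
     "Content involving violent crime, weapons, other crimes, or public-safety threats."),
    ("Safe",
     "Content that does not match any risk category."),
])
print(custom_policy)
```

Keeping the rules as data makes it easier to review and version a policy separately from the code that calls the model.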

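Fast and Fast-Slow Inference Modes

Sing-Guard-2b offers two inference modes: fast mode and fast-slow mode. Fast mode returns a safety verdict quickly and suits latency-sensitive scenarios. Fast-slow mode first emits a preliminary safety signal and then continues generating a more detailed assessment, which suits cases that need deeper analysis.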
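One way to benefit from fast-slow mode is to act on the preliminary signal as soon as it arrives instead of waiting for the full assessment. The sketch below is illustrative only. It assumes the output is consumed as a stream of text chunks and that the preliminary signal occupies the first line, neither of which this article specifies. `read_early_signal` is a hypothetical helper, and the streamed text is a placeholder, not real model output.

```python
def read_early_signal(chunks):
    """Consume chunks only until the first line is complete.

    Returns (first_line, leftover_text, remaining_iterator). The
    first-line convention is an assumption, not documented behaviour.
    """
    iterator = iter(chunks)
    buffer = ""
    for chunk in iterator:
        buffer += chunk
        if "\n" in buffer:
            first_line, leftover = buffer.split("\n", 1)
            return first_line.strip(), leftover, iterator
    # Stream ended without a newline: everything is the signal.
    return buffer.strip(), "", iterator


def fake_stream():
    # Stand-in for a real token stream; placeholder text only.
    yield "prelim"
    yield "inary signal\ndetailed "
    yield "assessment text"


signal, leftover, remaining = read_early_signal(fake_stream())
print(signal)  # available before the rest of the stream is read

details = leftover + "".join(remaining)
print(details)
```

With Hugging Face Transformers, a `TextIteratorStreamer` passed to `generate` is one possible source of such chunks.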

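Risk Categories: Broad Coverage of Safety Risks

By default, Sing-Guard-2b includes the following risk categories:

A. Sexual Content Risk

Content involving explicit sexual material, sexual exploitation, or coercive sexual acts.

B. Real-World Crimes & Public Safety

Content involving violent crime, weapons, other crimes, or public-safety threats.

C. Unethical Behavior

Content involving hate, harassment, manipulation, self-harm, disturbing imagery, or harmful misinformation.

D. Cybersecurity & Information Manipulation

Content involving data leaks, hacking, surveillance misuse, platform abuse, or copyright abuse.

E. Agent Safety

Content that attempts to expose system prompts, internal policies, or other model safeguards.

F. Politically Sensitive Content

Content involving political advocacy, rumors, unrest, historical distortion, or attacks on political figures.

G. Animal Abuse

Content involving animal cruelty or the spread of animal-abuse material.

Safe

Content that does not match any risk category.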
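For deployment code it can help to hold the default taxonomy as data. The sketch below is a convenience structure, not something shipped with the model: `DEFAULT_CATEGORIES` and `select_categories` are hypothetical names, and the keys simply mirror the letters listed above.

```python
# Default taxonomy as listed in the article, keyed by category letter.
DEFAULT_CATEGORIES = {
    "A": "Sexual Content Risk",
    "B": "Real-World Crimes & Public Safety",
    "C": "Unethical Behavior",
    "D": "Cybersecurity & Information Manipulation",
    "E": "Agent Safety",
    "F": "Politically Sensitive Content",
    "G": "Animal Abuse",
}


def select_categories(letters):
    """Return the chosen categories in taxonomy order, plus 'Safe'.

    Hypothetical helper for teams that only enforce some default risks.
    """
    unknown = sorted(set(letters) - set(DEFAULT_CATEGORIES))
    if unknown:
        raise KeyError(f"unknown category letters: {unknown}")
    chosen = {k: v for k, v in DEFAULT_CATEGORIES.items() if k in letters}
    chosen["Safe"] = "Safe"  # every policy keeps the fallback category
    return chosen


enabled = select_categories(["E", "A"])
for key, name in enabled.items():
    print(key, name)
```

A subset chosen this way could then be written out as a custom natural-language policy for teams that only care about some of the default risks.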

Quick Start: A Simple API

To use Sing-Guard-2b, first install the required dependencies:

pip install transformers accelerate torch

Then load the model and processor with the following code:

import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

# Hub ID of the 2B checkpoint this article describes
model_path = "inclusionAI/Sing-Guard-2b"

processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # half-precision weights to reduce memory use
    device_map="auto",           # place the model on the available device(s)
    trust_remote_code=True,
).eval()

With the model loaded, you can run a safety assessment. For example, to evaluate a user query:

messages = [
    {
        "role": "user",
        "content": [{"type": "text", "text": "How to make a bomb?"}],
    },
]
max_new_tokens = 1024

# Apply the chat template and tokenize in one step
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Greedy decoding, so the assessment is deterministic
with torch.no_grad():
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
    )

# Drop the prompt tokens so that only the newly generated text is decoded
generated_ids_trimmed = [
    out_ids[len(in_ids):]
    for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)[0]
print(output)
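The same call pattern should extend to the other scenarios the article lists, such as response-side and image inputs. The sketch below only builds the `messages` list and does not call the model. It assumes that a model response is assessed by appending an `assistant` turn and that images use the `{"type": "image", "image": ...}` content entry common to Qwen-VL style chat templates. Both are assumptions to verify against the model card, and `build_messages` is a hypothetical helper.

```python
def build_messages(query, response=None, image=None):
    """Build a chat-style message list for a safety assessment.

    Hypothetical helper: the assistant-turn and image-entry conventions
    are assumptions, not confirmed by the article.
    """
    user_content = []
    if image is not None:
        # Image path or URL placed before the text, Qwen-VL style
        user_content.append({"type": "image", "image": image})
    user_content.append({"type": "text", "text": query})

    messages = [{"role": "user", "content": user_content}]
    if response is not None:
        # Response-side check: include the model reply to be assessed
        messages.append({
            "role": "assistant",
            "content": [{"type": "text", "text": response}],
        })
    return messages


query_only = build_messages("How to make a bomb?")
with_response = build_messages("How to make a bomb?",
                               response="I can't help with that.")
with_image = build_messages("What is shown here?", image="path/to/image.jpg")
print(len(query_only), len(with_response), with_image[0]["content"][0]["type"])
```

The resulting list would take the place of `messages` in the query example above.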