Qwen Releases Qwen3Guard, a Real-Time Safety Guardrail Model - AI最前沿

We are excited to introduce Qwen3Guard, the first safety guardrail model in the Qwen family. Built upon the Qwen3 foundation models and fine-tuned specifically for safety classification, Qwen3Guard detects safety issues in both prompts and responses, and reports a risk level and a risk category for each so that moderation decisions can be made accurately.

Qwen3Guard achieves state-of-the-art performance on major safety benchmarks, demonstrating strong capabilities in both prompt and response classification tasks across English, Chinese, and multilingual environments.

Qwen3Guard is available in two specialized variants:

Qwen3Guard-Gen, a generative model that accepts full user prompts and model responses to perform safety classification. Ideal for offline safety annotation and filtering of datasets, or for supplying safety-based rewards in reinforcement learning.

Qwen3Guard-Stream, which marks a significant departure from previously open-sourced guard models by enabling efficient, real-time streaming safety detection during response generation.

Both variants come in three sizes, 0.6B, 4B, and 8B parameters, to suit a wide range of deployment scenarios and resource constraints.
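The offline dataset-filtering use of Qwen3Guard-Gen described above can be sketched as follows. This is a minimal illustration, not the model's actual API: `judge` is a hypothetical callable standing in for a call to the guard model, `toy_judge` is a keyword stub used only to make the example runnable, and the dictionary layout of each pair is an assumption.

```python
from typing import Callable, Dict, List

# Assumed record layout for one conversation turn (not defined by the source).
Pair = Dict[str, str]  # {"prompt": ..., "response": ...}


def filter_safe(pairs: List[Pair], judge: Callable[[str, str], str]) -> List[Pair]:
    """Keep only the pairs that the (hypothetical) judge labels 'Safe'."""
    return [p for p in pairs if judge(p["prompt"], p["response"]) == "Safe"]


def toy_judge(prompt: str, response: str) -> str:
    """Keyword stub standing in for a real guard model call."""
    return "Unsafe" if "steal" in response.lower() else "Safe"


dataset = [
    {"prompt": "How do I bake bread?", "response": "Mix flour, water, and yeast."},
    {"prompt": "How do I get a bike?", "response": "Steal one from the rack."},
]
clean = filter_safe(dataset, toy_judge)
```

In a real pipeline the stub would be replaced by a call to the guard model, and the same label could be mapped to a numeric reward for reinforcement learning.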

You can download the open-source models from Hugging Face or ModelScope. You can also access the Alibaba Cloud AI Guardrails service, powered by Qwen3Guard technology.

Qwen3Guard-Stream is engineered for low-latency, on-the-fly moderation during token generation, ensuring safety without sacrificing responsiveness. This is accomplished by attaching two lightweight classification heads to the transformer's final layer, so the model can receive the response in a streaming fashion (token by token, as it is being generated) and output a safety classification at each step.
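The token-by-token moderation loop described above might look like the following sketch. It is an illustration under stated assumptions, not Qwen3Guard-Stream's interface: `classify_prefix` is a hypothetical callable that returns a label for the response generated so far, and `toy_classifier` is a keyword stub. A real streaming guard would presumably process each new token incrementally rather than re-reading the whole prefix.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Label names taken from the article; everything else here is illustrative.
SAFE, CONTROVERSIAL, UNSAFE = "Safe", "Controversial", "Unsafe"


def moderate_stream(
    tokens: Iterable[str],
    classify_prefix: Callable[[List[str]], str],
    block_on: Tuple[str, ...] = (UNSAFE,),
) -> Iterator[str]:
    """Yield tokens one at a time, stopping as soon as the partial
    response is classified with a label listed in `block_on`."""
    seen: List[str] = []
    for tok in tokens:
        seen.append(tok)
        if classify_prefix(seen) in block_on:
            return  # stop before the flagged token reaches the user
        yield tok


def toy_classifier(prefix: List[str]) -> str:
    """Keyword stub standing in for the per-token classification step."""
    return UNSAFE if "bomb" in prefix[-1].lower() else SAFE


ok = list(moderate_stream(["How", " to", " bake", " bread"], toy_classifier))
cut = list(moderate_stream(["Build", " a", " bomb", " now"], toy_classifier))
```

Because the check runs before each token is yielded, the flagged token and everything after it are never emitted.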

Beyond the conventional Safe and Unsafe labels, we introduce an additional Controversial label to enable flexible safety policies tailored to diverse use cases. Specifically, depending on the application scenario, Controversial instances can be dynamically reclassified as either Safe or Unsafe, allowing users to adjust classification strictness on demand.
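The reclassification of Controversial instances can be expressed as a small policy function. The sketch below assumes a boolean `strict` switch and the function name `resolve_label`; both are illustrative and not part of any published Qwen3Guard interface.

```python
VALID_LABELS = {"Safe", "Controversial", "Unsafe"}


def resolve_label(label: str, strict: bool) -> str:
    """Collapse the three-tier label into a binary decision.

    In strict mode, Controversial counts as Unsafe; in loose mode,
    it counts as Safe. Safe and Unsafe pass through unchanged.
    """
    if label not in VALID_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    if label == "Controversial":
        return "Unsafe" if strict else "Safe"
    return label


strict_result = resolve_label("Controversial", strict=True)
loose_result = resolve_label("Controversial", strict=False)
```

Keeping the policy outside the classifier means the same model output can serve applications with different tolerance levels.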

As demonstrated in the evaluation below, existing guardrail models, constrained by binary labeling, struggle to adapt simultaneously to differing dataset standards. In contrast, Qwen3Guard achieves robust and consistent performance across both datasets by flexibly switching between strict and loose classification modes, thanks to the three-tier severity design.

Qwen3Guard supports 119 languages and dialects, making it suitable for global deployments and cross-linguistic applications with consistent, high-quality safety performance.

The official Qwen blog has announced Qwen3Guard, a safety guardrail model designed for real-time token streams.

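This official Tongyi Qianwen (Qwen) update covers the release of the Qwen3Guard real-time safety guardrail model; the English title is "Qwen3Guard: Real-time Safety for Your Token Stream". The post focuses on safety policy, guardrail capabilities, and the boundaries of high-risk tasks, and it should be read alongside the official release to understand how it affects model usage and developer integration.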

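For users, the most valuable part of an announcement like this is working out whether the new capability is already available, which tasks it suits, and which version, region, permission, or product-form restrictions may apply when calling it.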

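When evaluating AI API relay services, the key questions are whether the provider genuinely supports the model or capability, and whether the model name, response behavior, latency, error messages, context limits, and pricing information corroborate one another.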

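For follow-up self-testing, you can design more specific probe tasks around Qwen3Guard: complex prompts, multi-turn conversations, tool calls, multimodal inputs, and coding tasks all help distinguish real capability from a model that appears only in a listing on a web page.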