We present GuardReasoner-Omni, a reasoning-based guardrail model designed to moderate text, image, video, and audio data. We first construct a training corpus of 181k samples spanning these four modalities. We then train the model with a two-stage pipeline that incentivizes it to deliberate before making a decision: (1) supervised fine-tuning (SFT) to cold-start the model with explicit reasoning capabilities and adherence to the required output structure; and (2) reinforcement learning (RL) with a concise correctness reward that preserves accurate reasoning while suppressing redundant generation. We release models at two scales, 3B and 7B parameters. Extensive experiments show that GuardReasoner-Omni outperforms existing state-of-the-art baselines across a range of guardrail benchmarks.
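The abstract does not specify the form of the concise correctness reward. As a rough illustration only, one plausible shape is a reward that is zero unless the output is well-formed and the label is correct, and that is then reduced as the reasoning trace grows past a token budget. Everything in the sketch below is an assumption made for illustration and is not taken from the paper: the `<think>…</think>` / `Answer:` output format, the `harmful`/`unharmful` label set, the whitespace token count, and the `budget_tokens` and `max_penalty` parameters.

```python
import re

# Hypothetical output format (an assumption, not the paper's):
#   <think> free-form reasoning </think> Answer: harmful|unharmful
_PATTERN = re.compile(
    r"^<think>(.*?)</think>\s*Answer:\s*(harmful|unharmful)\s*$",
    re.DOTALL,
)


def concise_correctness_reward(
    response: str,
    gold_label: str,
    budget_tokens: int = 256,   # hypothetical reasoning budget
    max_penalty: float = 0.5,   # hypothetical cap on the length penalty
) -> float:
    """Return a reward in [0, 1] that favors correct and concise outputs."""
    match = _PATTERN.match(response.strip())
    if match is None:
        # Malformed output: no reward, regardless of content.
        return 0.0

    reasoning, predicted = match.group(1), match.group(2)
    if predicted != gold_label:
        # Wrong decision: no reward, however short the reasoning is.
        return 0.0

    # Crude whitespace token count; a real setup would use the model tokenizer.
    n_tokens = len(reasoning.split())
    overflow = max(0, n_tokens - budget_tokens)

    # Linear penalty on tokens beyond the budget, capped at max_penalty.
    penalty = min(max_penalty, max_penalty * overflow / budget_tokens)
    return 1.0 - penalty
```

Under these assumptions, a correct answer with reasoning inside the budget receives the full reward, a correct answer with an overlong trace receives a reduced but still positive reward, and an incorrect or malformed answer receives zero. Because correctness gates the reward, shortening the reasoning can never compensate for a wrong decision.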