We present GuardReasoner-Omni, a reasoning-based guardrail model designed to moderate text, image, and video data. First, we construct a comprehensive training corpus of 148k samples spanning these three modalities. Our training pipeline follows a two-stage paradigm that incentivizes the model to deliberate before making decisions: (1) supervised fine-tuning (SFT) to cold-start the model with explicit reasoning capabilities and structural adherence; and (2) reinforcement learning (RL) with an error-driven exploration reward that encourages deeper reasoning on hard samples. We release a suite of models at the 2B and 4B parameter scales. Extensive experiments demonstrate that GuardReasoner-Omni outperforms existing state-of-the-art baselines across a variety of guardrail benchmarks. Notably, GuardReasoner-Omni (2B) surpasses the runner-up by 5.3% in F1 score.