Omni-modal Large Language Models (OLLMs) that process text, images, videos, and audio introduce new challenges for safety and value guardrails in human-AI interaction. Prior guardrail research largely targets unimodal settings and typically frames safeguarding as binary classification, which limits robustness across diverse modalities and tasks. To address this gap, we propose OmniGuard, the first family of omni-modal guardrails that performs safeguarding across all modalities with deliberate reasoning ability. To support the training of OmniGuard, we curate a large, comprehensive omni-modal safety dataset comprising over 210K diverse samples, with inputs that cover all modalities through both unimodal and cross-modal samples. Each sample is annotated with structured safety labels and carefully curated safety critiques distilled from expert models through targeted distillation. Extensive experiments on 15 benchmarks show that OmniGuard achieves strong effectiveness and generalization across a wide range of multimodal safety scenarios. Importantly, OmniGuard provides a unified framework that enforces policies and mitigates risks across all modalities, paving the way toward more robust and capable omni-modal safeguarding systems.