Hateful videos pose serious risks by amplifying discrimination, inciting violence, and undermining online safety. Existing training-based hateful video detection methods are constrained by limited training data and a lack of interpretability, while directly prompting large vision-language models often fails to deliver reliable hate detection. To address these challenges, this paper introduces MARS, a training-free Multi-stage Adversarial ReaSoning framework that enables reliable and interpretable hateful content detection. MARS begins with an objective description of the video content, establishing a neutral foundation for subsequent analysis. Building on this, it develops evidence-based reasoning that supports potential hateful interpretations, while in parallel incorporating counter-evidence reasoning to capture plausible non-hateful perspectives. Finally, these perspectives are synthesized into a conclusive and explainable decision. Extensive evaluation on two real-world datasets shows that MARS achieves improvements of up to 10% over other training-free approaches under certain backbones and settings, and outperforms state-of-the-art training-based methods on one dataset. In addition, MARS produces human-understandable justifications, thereby supporting compliance oversight and enhancing the transparency of content moderation workflows. The code is available at https://github.com/Multimodal-Intelligence-Lab-MIL/MARS.
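The multi-stage pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the model call `query_vlm`, the function `mars_pipeline`, and all prompt strings are hypothetical placeholders standing in for prompts issued to a large vision-language model at each stage.

```python
def query_vlm(prompt: str, video: str) -> str:
    # Hypothetical stand-in for a vision-language model call; a real system
    # would send the prompt and video frames to a VLM backbone.
    return f"[{prompt.split(':')[0]} for {video}]"

def mars_pipeline(video: str) -> dict:
    # Stage 1: neutral, objective description of the video content.
    description = query_vlm("Describe objectively: what does the video show?", video)
    # Stage 2a: evidence-based reasoning supporting a hateful interpretation.
    pro = query_vlm(f"Argue hateful: {description}", video)
    # Stage 2b (in parallel): counter-evidence reasoning for a plausible
    # non-hateful interpretation.
    con = query_vlm(f"Argue non-hateful: {description}", video)
    # Stage 3: synthesize both perspectives into an explainable verdict.
    verdict = query_vlm(f"Adjudicate: {pro} versus {con}", video)
    return {"description": description, "pro": pro, "con": con, "verdict": verdict}

result = mars_pipeline("clip_001.mp4")
```

Because every stage returns a natural-language rationale alongside the final verdict, the output supports the human-understandable justifications emphasized in the abstract.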