Deepfake detection is a widely researched topic that is crucial for combating the spread of malicious content; existing methods mainly model the problem as classification or spatial localization. Rapid advances in generative models impose new demands on Deepfake detection. In this paper, we propose Multimodal Alignment and Reinforcement for Explainable Deepfake detection via Vision-Language Models (VLMs), termed MARE, which aims to enhance the accuracy and reliability of VLMs in Deepfake detection and reasoning. Specifically, MARE designs comprehensive reward functions, incorporating reinforcement learning from human feedback (RLHF), to incentivize the generation of text-spatially aligned reasoning content that adheres to human preferences. In addition, MARE introduces a forgery disentanglement module to capture intrinsic forgery traces from high-level facial semantics, thereby improving its authenticity detection capability. We conduct a thorough evaluation of the reasoning content generated by MARE. Both quantitative and qualitative experimental results demonstrate that MARE achieves state-of-the-art performance in terms of accuracy and reliability.