Advances in generative modeling have made it increasingly easy to fabricate realistic portrayals of individuals, creating serious risks for security, communication, and public trust. Detecting such person-driven manipulations requires systems that not only distinguish altered content from authentic media but also provide clear and reliable reasoning. In this paper, we introduce TriDF, a comprehensive benchmark for interpretable DeepFake detection. TriDF contains high-quality forgeries from advanced synthesis models, covering 16 DeepFake types across image, video, and audio modalities. The benchmark evaluates three key aspects: Perception, which measures the ability of a model to identify fine-grained manipulation artifacts using human-annotated evidence; Detection, which assesses classification performance across diverse forgery families and generators; and Hallucination, which quantifies the reliability of model-generated explanations. Experiments on state-of-the-art multimodal large language models show that accurate perception is essential for reliable detection, but hallucination can severely disrupt decision-making, revealing the interdependence of these three aspects. TriDF provides a unified framework for understanding the interaction between detection accuracy, evidence identification, and explanation reliability, offering a foundation for building trustworthy systems that address real-world synthetic media threats.