Rapid advances in talking-head deepfake generation, fueled by powerful generative models, have elevated the realism of synthetic videos to a level that poses substantial risks in domains such as media, politics, and finance. However, current benchmarks for talking-head deepfake detection fail to reflect this progress, relying on outdated generators and offering limited insight into model robustness and generalization. We introduce TalkingHeadBench, a comprehensive multi-model, multi-generator benchmark and curated dataset designed to evaluate the performance of state-of-the-art detectors against the most advanced generators. Our dataset includes deepfakes synthesized by leading academic and commercial models and features carefully constructed protocols to assess generalization under distribution shifts in identity and generator characteristics. We benchmark a diverse set of existing detection methods, including CNNs, vision transformers, and temporal models, and analyze their robustness and generalization capabilities. In addition, we provide error analysis using Grad-CAM visualizations to expose common failure modes and detector biases. TalkingHeadBench is hosted at https://huggingface.co/datasets/luchaoqi/TalkingHeadBench with open access to all data splits and protocols. Our benchmark aims to accelerate research toward more robust and generalizable detection models in the face of rapidly evolving generative techniques.
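As a quick-start illustration, the snippet below sketches loading the dataset from the Hugging Face Hub with the `datasets` library. The configuration and split names are not specified in the abstract, so treat everything beyond the repository ID as an assumption and consult the dataset card for the actual layout.

```python
# Minimal sketch: pull TalkingHeadBench from the Hugging Face Hub.
# Only the repo ID "luchaoqi/TalkingHeadBench" comes from the text above;
# available configs/splits are assumptions -- inspect the printed output.
from datasets import load_dataset

ds = load_dataset("luchaoqi/TalkingHeadBench")  # default configuration
print(ds)  # shows the available splits and their features
```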
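For readers unfamiliar with the error-analysis technique mentioned above, the following is a minimal, generic Grad-CAM sketch over a stand-in torchvision ResNet-18 classifier. It is not the paper's detector or preprocessing pipeline, only an illustration of how class-discriminative heatmaps are computed from the gradients of the last convolutional stage.

```python
# Generic Grad-CAM sketch (assumed stand-in model, not the paper's detector).
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(num_classes=2).eval()  # hypothetical real/fake classifier
target_layer = model.layer4             # last convolutional stage

feats, grads = {}, {}
target_layer.register_forward_hook(lambda m, i, o: feats.update(v=o))
target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

x = torch.randn(1, 3, 224, 224)         # placeholder for a video frame
logits = model(x)
logits[0, logits.argmax()].backward()   # gradient of the predicted class

# Channel weights = global-average-pooled gradients; weighted sum of the
# activations, ReLU, upsample to input resolution, normalize to [0, 1].
w = grads["v"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((w * feats["v"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```

Overlaying `cam` on the input frame highlights the regions the detector relied on, which is the kind of evidence used to expose failure modes and biases.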