The rapid advancement of talking-head deepfake generation, fueled by powerful generative models, has elevated the realism of synthetic videos to a level that poses substantial risks in domains such as media, politics, and finance. However, current benchmarks for talking-head deepfake detection fail to reflect this progress, relying on outdated generators and offering limited insight into model robustness and generalization. We introduce TalkingHeadBench, a comprehensive multi-model, multi-generator benchmark and curated dataset designed to evaluate the performance of state-of-the-art detectors on the most advanced generators. Our dataset includes deepfakes synthesized by leading academic and commercial models and features carefully constructed protocols to assess generalization under distribution shifts in identity and generator characteristics. We benchmark a diverse set of existing detection methods, including CNNs, vision transformers, and temporal models, and analyze their robustness and generalization capabilities. In addition, we provide error analysis using Grad-CAM visualizations to expose common failure modes and detector biases. TalkingHeadBench is hosted at https://huggingface.co/datasets/luchaoqi/TalkingHeadBench with open access to all data splits and protocols. Our benchmark aims to accelerate research toward more robust and generalizable detection models in the face of rapidly evolving generative techniques.
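The Grad-CAM error analysis mentioned above localizes which image regions drive a detector's real-vs-fake decision. The sketch below shows the core computation on a toy CNN: capture the last convolutional feature map, backpropagate the target logit, average the gradients per channel to get importance weights, and take a ReLU of the weighted activation sum. The `TinyDetector` model is a hypothetical stand-in, not any detector from the benchmark.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-in for a detector backbone; layer sizes are illustrative.
class TinyDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(16, 2)  # real-vs-fake logits

    def forward(self, x):
        fmap = self.features(x)          # (B, 16, H, W) conv feature map
        pooled = fmap.mean(dim=(2, 3))   # global average pooling
        return self.head(pooled), fmap

def grad_cam(model, x, target_class):
    """Return a (B, H, W) Grad-CAM heatmap for the given class logit."""
    model.zero_grad()
    logits, fmap = model(x)
    fmap.retain_grad()                   # keep gradients on the activations
    logits[:, target_class].sum().backward()
    # Channel weights = gradients averaged over spatial positions.
    weights = fmap.grad.mean(dim=(2, 3), keepdim=True)
    # Weighted activation sum, rectified so only positive evidence remains.
    return F.relu((weights * fmap).sum(dim=1))

model = TinyDetector()
x = torch.randn(1, 3, 32, 32)            # dummy input frame
cam = grad_cam(model, x, target_class=1)
print(cam.shape)                         # heatmap at feature-map resolution
```

In practice the heatmap is upsampled to the input resolution and overlaid on the frame, which is how the benchmark-style visualizations expose where a detector attends (e.g., mouth region vs. background artifacts).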