With the rise of AI-enabled Real-Time Deepfakes (RTDFs), the integrity of online video interactions has become a growing concern. RTDFs have made it feasible to replace an imposter's face with their victim's in live video interactions. This advance in deepfake generation demands that detection rise to the same standard. However, existing deepfake detection techniques are asynchronous and hence ill-suited for RTDFs. To bridge this gap, we propose a challenge-response approach that establishes authenticity in live settings. We focus on talking-head style video interaction and present a taxonomy of challenges that specifically target inherent limitations of RTDF generation pipelines. We evaluate representative examples from the taxonomy by collecting a unique dataset comprising eight challenges, which consistently and visibly degrade the quality of state-of-the-art deepfake generators. These results are corroborated by both human evaluators and a new automated scoring function, which achieve 88.6% and 80.1% AUC, respectively. The findings underscore the potential of challenge-response systems for explainable and scalable real-time deepfake detection in practical scenarios. We provide access to data and code at \url{https://github.com/mittalgovind/GOTCHA-Deepfakes}.