With the rise of AI-enabled Real-Time Deepfakes (RTDFs), the integrity of online video interactions has become a growing concern. RTDFs now make it feasible to replace an imposter's face with their victim's in live video interactions. This advance in deepfake generation demands that detection rise to the same standard. However, existing deepfake detection techniques are asynchronous and hence ill-suited for RTDFs. To bridge this gap, we propose a challenge-response approach that establishes authenticity in live settings. We focus on talking-head style video interaction and present a taxonomy of challenges that specifically target inherent limitations of RTDF generation pipelines. We evaluate representative examples from the taxonomy by collecting a unique dataset comprising eight challenges, which consistently and visibly degrade the quality of state-of-the-art deepfake generators. These results are corroborated both by humans and by a new automated scoring function, leading to 88.6% and 73.2% AUC, respectively. The findings underscore the promising potential of challenge-response systems for explainable and scalable real-time deepfake detection in practical scenarios.