A lip-syncing deepfake is a digitally manipulated video in which a person's lip movements are convincingly generated by AI models to match altered or entirely new audio. Lip-syncing deepfakes are a dangerous type of deepfake because their artifacts are confined to the lip region and are therefore harder to discern. In this paper, we describe a novel approach, LIP-syncing detection based on mouth INConsistency (LIPINC), which detects lip-syncing deepfakes by identifying temporal inconsistencies in the mouth region. These inconsistencies appear both between adjacent frames and across the video as a whole. Our model successfully captures these irregularities and outperforms state-of-the-art methods on several benchmark deepfake datasets.