Deepfakes are AI-generated media in which an image or video has been digitally modified. Advances in deepfake technology have raised privacy and security concerns. Most deepfake detection techniques analyze only a single modality, and existing audio-visual detection methods do not always outperform single-modality analysis. This paper therefore proposes an audio-visual deepfake detection method that integrates fine-grained deepfake identification with binary classification. We categorize samples into four types by combining the labels specific to each single modality. The method improves detection performance in both intra-domain and cross-domain testing.
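The four-type categorization can be illustrated with a minimal sketch. The abstract does not specify the class names, their ordering, or how the fine-grained classes map to the binary label, so `FineLabel`, `fine_grained_label`, and `binary_label` are hypothetical names, and the rule that a sample counts as fake when either modality is fake is an assumption.

```python
from enum import IntEnum


class FineLabel(IntEnum):
    """Hypothetical fine-grained classes; names and ordering are assumed."""
    REAL_VIDEO_REAL_AUDIO = 0
    REAL_VIDEO_FAKE_AUDIO = 1
    FAKE_VIDEO_REAL_AUDIO = 2
    FAKE_VIDEO_FAKE_AUDIO = 3


def fine_grained_label(video_is_fake: bool, audio_is_fake: bool) -> FineLabel:
    """Combine the two single-modality labels into one of four classes."""
    # The video label is the high bit and the audio label is the low bit.
    return FineLabel(2 * int(video_is_fake) + int(audio_is_fake))


def binary_label(label: FineLabel) -> int:
    """Collapse a fine-grained class to real (0) or fake (1).

    Assumption: a sample is fake if at least one modality is fake.
    """
    return int(label != FineLabel.REAL_VIDEO_REAL_AUDIO)


if __name__ == "__main__":
    for video_is_fake in (False, True):
        for audio_is_fake in (False, True):
            fine = fine_grained_label(video_is_fake, audio_is_fake)
            print(fine.name, binary_label(fine))
```

Under these assumptions, a model could be trained on the four-way target while the binary decision is derived from it, so both labels stay consistent by construction.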