Given the rising threat that deepfakes pose to security and privacy, developing robust and reliable detectors is essential. In this paper, we examine the need for high-quality samples in the training datasets of such detectors. We show that deepfake detectors proven to generalize well on multiple research datasets still struggle in real-world scenarios with well-crafted fakes. First, we propose a novel autoencoder for face swapping alongside an advanced face blending technique, which we use to generate 90 high-quality deepfakes. Second, we feed those fakes to a state-of-the-art detector, causing its performance to decrease drastically. Third, we fine-tune the detector on our fakes and demonstrate that they contain useful clues for the detection of manipulations. Overall, our results provide insights into the generalization of deepfake detectors and suggest that their training datasets should be complemented by high-quality fakes, since training on research data alone is insufficient.