Deepfake detection aims to automatically recognize manipulated media by analyzing the differences between manipulated and unaltered videos. It is natural to ask which of the existing deepfake detection approaches perform best, in order to identify promising research directions and provide practical guidance. Unfortunately, it is difficult to conduct a sound benchmark comparison of existing detection approaches from the results reported in the literature, because evaluation conditions are inconsistent across studies. Our objective is to establish a comprehensive and consistent benchmark, develop a repeatable evaluation procedure, and measure the performance of a range of detection approaches so that the results can be compared soundly. We collected a challenging dataset of manipulated samples generated by more than 13 different methods, and implemented and evaluated 11 popular detection approaches (9 algorithms) from the literature using 6 fair and practical evaluation metrics. In total, 92 models were trained and 644 experiments were performed. The results, together with the shared data and evaluation methodology, constitute a benchmark for comparing deepfake detection approaches and measuring progress.
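To make the notion of a consistent evaluation concrete, the sketch below shows how two metrics commonly used for deepfake detection — accuracy at a fixed threshold and ROC AUC — can be computed from a detector's per-video scores. This is a minimal illustration, not the paper's actual evaluation code; the labels and scores are invented, and the paper's 6 metrics are not necessarily these two.

```python
# Hypothetical sketch: scoring a deepfake detector's predictions.
# Convention assumed here: label 1 = fake (manipulated), 0 = real.
# All data below is illustrative, not from the paper's benchmark.

def accuracy(labels, scores, threshold=0.5):
    """Fraction of videos classified correctly at a fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def roc_auc(labels, scores):
    """Threshold-free ROC AUC via the rank-based (Mann-Whitney U) formula."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Group tied scores and assign them their average 1-based rank.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

labels = [1, 1, 1, 0, 0, 0]              # 3 fake clips, 3 real clips
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]  # detector's fake-probability outputs
print(round(accuracy(labels, scores), 3))  # -> 0.667
print(round(roc_auc(labels, scores), 3))   # -> 0.889
```

Reporting a threshold-free metric such as AUC alongside thresholded accuracy is one way a benchmark can keep results comparable across detectors that calibrate their output scores differently.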