The field of visual and audio generation is burgeoning with new state-of-the-art methods. This rapid proliferation of techniques underscores the need for robust methods of detecting synthetic content in videos. Fine-grained, localized manipulations in the visual domain, the audio domain, or both are especially subtle and pose additional challenges for detection algorithms. This paper presents solutions to the problems of deepfake video classification and temporal localization. The methods were submitted to the ACM 1M Deepfakes Detection Challenge, where they achieved the best performance in the temporal localization task and a top-four ranking in the classification task on the TestA split of the evaluation dataset.