Modern deepfakes have evolved into localized and intermittent manipulations that require fine-grained temporal localization. The prohibitive cost of frame-level annotation makes weakly supervised methods, which rely only on video-level labels, a practical necessity. To this end, we propose Reconstruction-based Temporal Deepfake Localization (RT-DeepLoc), a weakly supervised temporal forgery localization framework that identifies forgeries via reconstruction errors. Our framework uses a Masked Autoencoder (MAE) trained exclusively on authentic data to learn the intrinsic spatiotemporal patterns of authentic content; this allows the model to produce significant reconstruction discrepancies for forged segments, effectively providing the missing fine-grained cues for localization. To robustly leverage these indicators, we introduce a novel Asymmetric Intra-video Contrastive Loss (AICL). By focusing on the compactness of authentic features guided by these reconstruction cues, AICL establishes a stable decision boundary that enhances local discrimination while preserving generalization to unseen forgeries. Extensive experiments on large-scale datasets, including LAV-DF, demonstrate that RT-DeepLoc achieves state-of-the-art performance in weakly supervised temporal forgery localization.
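To make the two core ideas concrete, the sketch below illustrates (a) per-frame reconstruction error as a forgery cue and (b) an asymmetric intra-video contrastive objective. This is a minimal illustrative sketch, not the paper's implementation: the loss form, the `k` pseudo-labeling heuristic, and the `margin` parameter are all assumptions introduced here for exposition.

```python
import numpy as np

def frame_scores(frames, reconstructed):
    """Per-frame reconstruction error (MSE) from an MAE trained on
    authentic data; a high error suggests a forged frame.
    frames, reconstructed: arrays of shape (T, D)."""
    return np.mean((frames - reconstructed) ** 2, axis=1)

def asymmetric_intra_video_loss(features, scores, k=4, margin=1.0):
    """Hypothetical sketch of an asymmetric intra-video contrastive loss.

    Guided by reconstruction scores, it pulls the k lowest-error
    (likely authentic) frames toward their centroid (compactness),
    and only pushes the k highest-error frames away up to a margin.
    It is asymmetric in that authentic features are tightened into a
    cluster while suspected forged frames are not clustered at all,
    which is one way to keep the decision boundary stable for
    unseen forgery types.
    features: (T, D) frame features; scores: (T,) reconstruction errors.
    """
    order = np.argsort(scores, kind="stable")
    auth = features[order[:k]]    # pseudo-labeled authentic frames
    fake = features[order[-k:]]   # pseudo-labeled forged frames
    center = auth.mean(axis=0)
    # Compactness term: squared distance of authentic frames to centroid.
    pull = np.mean(np.sum((auth - center) ** 2, axis=1))
    # Margin term: penalize forged frames only when too close to centroid.
    d_fake = np.sqrt(np.sum((fake - center) ** 2, axis=1))
    push = np.mean(np.maximum(0.0, margin - d_fake) ** 2)
    return pull + push
```

Usage: with 8 two-dimensional frame features where the last 4 frames have high reconstruction scores and sit far from the authentic centroid, the loss is zero; if the suspected forged frames collapse onto the authentic centroid, only the margin term activates.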