The rapid evolution of AIGC technology makes it possible to mislead viewers by tampering with only small segments of a video, rendering video-level detection inaccurate and unconvincing. Consequently, temporal forgery localization (TFL), which aims to precisely pinpoint tampered segments, becomes critical. However, existing methods are often constrained to a \emph{local view} and fail to capture global anomalies. To address this, we propose a \underline{d}ual-stream graph learning and \underline{d}isentanglement framework for temporal forgery localization (DDNet). By coordinating a \emph{Temporal Distance Stream} that captures local artifacts with a \emph{Semantic Content Stream} that models long-range connections, DDNet prevents global cues from being drowned out by local smoothness. Furthermore, we introduce Trace Disentanglement and Adaptation (TDA) to isolate generic forgery fingerprints, together with Cross-Level Feature Embedding (CLFE), which builds a robust feature foundation through deep fusion of hierarchical features. Experiments on the ForgeryNet and TVIL benchmarks demonstrate that our method outperforms state-of-the-art approaches by approximately 9\% in AP@0.95, with significant improvements in cross-domain robustness.