Video recognition models remain vulnerable to adversarial attacks, and existing diffusion-based purification methods suffer from inefficient sampling and curved sampling trajectories. Directly regressing clean videos from adversarial inputs often fails to recover faithful content because the perturbations are subtle; the adversarial structure must instead be physically shattered. We therefore propose Flow Matching for Adversarial Video Purification (FMVP). FMVP physically shatters global adversarial structures via a masking strategy and reconstructs clean video dynamics using Conditional Flow Matching (CFM) with an inpainting objective. To further decouple semantic content from adversarial noise, we design a Frequency-Gated Loss (FGL) that explicitly suppresses high-frequency adversarial residuals while preserving low-frequency fidelity. We also design Attack-Aware and Generalist training paradigms to handle known and unknown threats, respectively. Extensive experiments on UCF-101 and HMDB-51 demonstrate that FMVP outperforms state-of-the-art methods (DiffPure, Defense Patterns (DP), Temporal Shuffling (TS), and FlowPure), achieving robust accuracy exceeding 87% against PGD attacks and 89% against CW attacks. Furthermore, FMVP demonstrates superior robustness against adaptive attacks (DiffHammer) and functions as a zero-shot adversarial detector, attaining detection accuracies of 98% for PGD and 79% for highly imperceptible CW attacks.
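The abstract names three ingredients (masking, a CFM inpainting objective, and a frequency-gated loss) without giving their exact form. The NumPy sketch below is one possible reading, not the authors' implementation. Everything specific in it is an assumption made for illustration: the patch-wise random mask, the linear (straight-line) probability path with Gaussian source noise, the radial frequency cutoff, the band weights, and all function and parameter names.

```python
import numpy as np


def random_patch_mask(shape, patch=4, keep_ratio=0.5, rng=None):
    """Binary mask over (T, H, W): 1 = keep, 0 = masked out.

    Assumption: masking is done on square patches; H and W must be
    divisible by `patch`.
    """
    rng = np.random.default_rng() if rng is None else rng
    t, h, w = shape
    grid = rng.random((t, h // patch, w // patch)) < keep_ratio
    mask = np.repeat(np.repeat(grid, patch, axis=1), patch, axis=2)
    return mask.astype(np.float32)


def masked_condition(x_adv, mask):
    """Conditioning signal: the adversarial video with masked regions zeroed."""
    return x_adv * mask


def cfm_pair(x_clean, t, rng=None):
    """Sample a point on a straight path from noise to the clean video.

    Assumption: linear interpolant x_t = (1 - t) * x0 + t * x1 with
    x0 ~ N(0, I), whose target velocity is the constant x1 - x0.
    A network v_theta(x_t, t, condition) would be regressed onto it.
    """
    rng = np.random.default_rng() if rng is None else rng
    x0 = rng.standard_normal(x_clean.shape)
    x_t = (1.0 - t) * x0 + t * x_clean
    v_target = x_clean - x0
    return x_t, v_target


def frequency_gated_loss(pred, target, cutoff=0.25, w_low=1.0, w_high=2.0):
    """Squared error in the spatial frequency domain, weighted per band.

    Assumption: frequencies with radius <= cutoff (cycles per pixel) count
    as "low"; the remaining bins are "high" and receive weight w_high.
    """
    err = np.fft.fft2(pred - target, axes=(-2, -1), norm="ortho")
    fy = np.fft.fftfreq(pred.shape[-2])[:, None]
    fx = np.fft.fftfreq(pred.shape[-1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    gate = np.where(radius <= cutoff, w_low, w_high)
    return float(np.mean(gate * np.abs(err) ** 2))
```

With `w_low == w_high == 1` the loss reduces to ordinary mean squared error (Parseval's theorem under the orthonormal FFT), so the two weights only control how strongly each band is penalized relative to plain MSE. At inference, a purifier of this kind would integrate the learned velocity field from noise toward a clean video while conditioning on `masked_condition(x_adv, mask)`; that sampling loop is omitted here because the abstract does not specify it.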