As deep learning models are increasingly used for video analysis, their vulnerability to adversarial attacks is becoming a pressing concern. Universal Adversarial Perturbations (UAPs) pose a particularly significant threat, because a single perturbation can mislead a deep learning model across an entire dataset. We propose a novel video UAP that is generated using image data and an image model. This lets video attacks draw on the abundant image data and the large body of image-model research. The challenge is that image models have limited ability to analyze the temporal aspects of videos, which are crucial for a successful video attack. To address this challenge, we introduce the Breaking Temporal Consistency (BTC) method, which is the first attempt to incorporate temporal information into video attacks using image models. We aim to generate adversarial videos whose patterns are opposite to those of the original videos. Specifically, BTC-UAP minimizes the feature similarity between neighboring frames in a video. Our approach is simple yet effective at attacking unseen video models. It is also applicable to videos of varying lengths and invariant to temporal shifts. Our approach outperforms existing methods on various datasets, including ImageNet, UCF-101, and Kinetics-400.
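The idea of minimizing feature similarity between neighboring frames can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `neighbor_cosine_similarity`, the choice of cosine similarity as the similarity measure, and the assumption that each frame has already been mapped to a single feature vector by some image model are all illustrative assumptions.

```python
import numpy as np


def neighbor_cosine_similarity(features: np.ndarray, eps: float = 1e-8) -> float:
    """Mean cosine similarity between features of consecutive frames.

    features: array of shape (T, D), one D-dimensional feature vector per
    frame (assumed to come from an image model; hypothetical setup).
    A temporally consistent clip scores close to 1. An attack in the spirit
    of the abstract would try to push this value down.
    """
    if features.ndim != 2 or features.shape[0] < 2:
        raise ValueError("expected shape (T, D) with at least two frames")
    current, following = features[:-1], features[1:]
    dot = np.sum(current * following, axis=1)
    norms = np.linalg.norm(current, axis=1) * np.linalg.norm(following, axis=1)
    return float(np.mean(dot / (norms + eps)))


# Toy feature sequences (made-up numbers, for illustration only).
static_clip = np.tile(np.array([1.0, 2.0, 3.0]), (4, 1))           # identical frames
alternating = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])       # orthogonal neighbors

static_score = neighbor_cosine_similarity(static_clip)
alternating_score = neighbor_cosine_similarity(alternating)
```

Because the score averages over adjacent pairs only, the same function accepts any number of frames T ≥ 2, which fits the abstract's statement that the approach applies to videos of varying lengths.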