Query-Efficient Video Adversarial Attack with Stylized Logo on Service Computing

In service computing, video classification has become fundamental to many intelligent applications. Although Deep Neural Networks (DNNs) perform very well at recognizing video content, recent studies have shown that they are highly vulnerable to adversarial examples. Understanding adversarial attacks therefore helps defenders respond better to emergency situations. To improve attack performance, many style-transfer-based and patch-based attacks have been proposed. However, the global perturbation used by the former produces unnatural colors across the whole frame, while the latter rarely succeed in targeted attacks because their perturbation space is limited. Moreover, compared with the plethora of methods targeting image classifiers, video adversarial attacks remain relatively underexplored. Therefore, to generate adversarial examples with a low query budget and higher verisimilitude, we propose a novel black-box video attack framework called Stylized Logo Attack (SLA). SLA proceeds in three stages. First, a style reference set is built for logos, which not only makes the generated examples look more natural but also lets them carry more target-class features in targeted attacks. Second, reinforcement learning is employed to determine the style reference and the position parameters of the logo within the video, so that the stylized logo is placed with optimal attributes. Finally, the perturbations are optimized step by step to improve the fooling rate. Experimental results indicate that SLA outperforms state-of-the-art methods and remains effective against various defense methods. We believe SLA can raise awareness in the security community about the reliability and security of video classification systems and serve as a reminder of possible attack methods.
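
The query-driven structure of the last two stages can be illustrated with a minimal sketch. It is not the SLA implementation: plain random search stands in for the reinforcement learning agent, a greedy per-pixel search stands in for the step-by-step perturbation optimization, and `toy_target_score` is a hypothetical stand-in for the target-class confidence returned by a black-box video classifier. Videos and logos are assumed to be small grayscale nested lists with values in [0, 1], and all function and class names are illustrative.

```python
import random

def paste_logo(video, logo, top, left):
    """Return a copy of `video` with `logo` pasted at (top, left) in every frame."""
    out = [[row[:] for row in frame] for frame in video]
    h, w = len(logo), len(logo[0])
    for frame in out:
        for i in range(h):
            frame[top + i][left:left + w] = logo[i]
    return out

class QueryCounter:
    """Wraps a black-box scoring function and counts how often it is queried."""
    def __init__(self, score_fn):
        self.score_fn = score_fn
        self.queries = 0

    def __call__(self, video):
        self.queries += 1
        return self.score_fn(video)

def toy_target_score(video):
    """Hypothetical classifier confidence: mean brightness of the top-left quadrant."""
    h, w = len(video[0]) // 2, len(video[0][0]) // 2
    total = sum(frame[i][j] for frame in video for i in range(h) for j in range(w))
    return total / (len(video) * h * w)

def search_logo(video, styles, score, n_trials, rng):
    """Pick a style index and position by random search (stand-in for the RL stage)."""
    height, width = len(video[0]), len(video[0][0])
    best = None
    for _ in range(n_trials):
        k = rng.randrange(len(styles))
        top = rng.randrange(height - len(styles[k]) + 1)
        left = rng.randrange(width - len(styles[k][0]) + 1)
        s = score(paste_logo(video, styles[k], top, left))
        if best is None or s > best[0]:
            best = (s, k, top, left)
    return best

def refine_logo(video, logo, top, left, score, n_steps, step, rng):
    """Greedily perturb one logo pixel per step; keep a change only if the score rises."""
    logo = [row[:] for row in logo]
    best = score(paste_logo(video, logo, top, left))
    for _ in range(n_steps):
        i, j = rng.randrange(len(logo)), rng.randrange(len(logo[0]))
        old = logo[i][j]
        logo[i][j] = min(1.0, max(0.0, old + rng.choice((-step, step))))
        s = score(paste_logo(video, logo, top, left))
        if s > best:
            best = s
        else:
            logo[i][j] = old  # revert a change that did not help
    return logo, best

rng = random.Random(0)
video = [[[0.0] * 8 for _ in range(8)] for _ in range(4)]  # 4 frames of 8x8 pixels
styles = [[[0.5] * 3 for _ in range(3)], [[0.8] * 3 for _ in range(3)]]
score = QueryCounter(toy_target_score)
s, k, top, left = search_logo(video, styles, score, n_trials=20, rng=rng)
logo, final = refine_logo(video, styles[k], top, left, score, n_steps=30, step=0.1, rng=rng)
print("queries used:", score.queries, "score before/after refinement:", s, final)
```

Only the pixels inside the logo region are ever modified, so the rest of each frame stays untouched, and every call to the classifier is counted against the query budget. Under these assumptions the refinement can never lower the score found by the search, because a perturbation is kept only when the queried score improves.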
