No-Reference Video Quality Assessment (NR-VQA) plays an essential role in improving the viewing experience of end-users. Driven by deep learning, recent NR-VQA models based on Convolutional Neural Networks (CNNs) and Transformers have achieved outstanding performance. To build a reliable and practical assessment system, it is essential to evaluate their robustness. However, this issue has received little attention in the academic community. In this paper, we make the first attempt to evaluate the robustness of NR-VQA models against adversarial attacks, and propose a patch-based random search method for black-box attacks. Specifically, considering both the attack effect on the quality score and the visual quality of the adversarial video, the attack problem is formulated as misleading the estimated quality score under the constraint of just-noticeable difference (JND). Building on this formulation, a novel loss function called Score-Reversed Boundary Loss is designed to push the adversarial video's estimated quality score far away from its ground-truth score towards a specific boundary, and the JND constraint is modeled as a strict $L_2$ and $L_\infty$ norm restriction. In this way, both white-box and black-box attacks can be launched in an effective and imperceptible manner. The source code is available at https://github.com/GZHU-DVL/AttackVQA.
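As a rough illustration of the formulation described in the abstract, the Python sketch below combines a boundary-style loss with a patch-based random search. It is not the code from the AttackVQA repository. Several things are assumptions made only for this example: quality scores lie in [0, 100], the target boundary is picked by comparing the ground-truth score with the midpoint of that range, the video is a grayscale array of shape (T, H, W) with values in [0, 1], the quality model is a toy brightness function, and only an $L_\infty$ bound is enforced (the $L_2$ part of the JND constraint is omitted). All function and parameter names are hypothetical.

```python
import numpy as np


def boundary_loss(score, mos, low=0.0, high=100.0):
    """Distance from the predicted score to the boundary opposite the MOS.

    Assumption: videos with a ground-truth score above the midpoint are
    pushed toward `low`, the others toward `high`. Smaller is better for
    the attacker.
    """
    mid = 0.5 * (low + high)
    boundary = low if mos > mid else high
    return abs(score - boundary)


def patch_random_search(video, model, mos, eps=2.0 / 255, patch=4,
                        steps=200, seed=0):
    """Query-only attack: perturb one random patch per step, keep improvements.

    Every pixel of the result stays within `eps` of the original video
    (an L_inf bound standing in for the JND constraint).
    """
    rng = np.random.default_rng(seed)
    adv = video.copy()
    best = boundary_loss(model(adv), mos)
    t, h, w = video.shape
    for _ in range(steps):
        # Pick a random frame and a random patch location inside it.
        f = rng.integers(t)
        y = rng.integers(0, h - patch + 1)
        x = rng.integers(0, w - patch + 1)
        sign = rng.choice([-1.0, 1.0])
        cand = adv.copy()
        # Set the patch to the original pixels shifted by +eps or -eps.
        cand[f, y:y + patch, x:x + patch] = (
            video[f, y:y + patch, x:x + patch] + sign * eps
        )
        cand = np.clip(cand, 0.0, 1.0)  # keep valid pixel range
        loss = boundary_loss(model(cand), mos)
        if loss < best:  # accept only candidates that reduce the loss
            adv, best = cand, loss
    return adv


def toy_model(v):
    """Toy stand-in for an NR-VQA model: mean brightness scaled to [0, 100]."""
    return 100.0 * float(v.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clean = rng.random((4, 16, 16))
    adv = patch_random_search(clean, toy_model, mos=80.0)
    print("clean score:", toy_model(clean))
    print("adversarial score:", toy_model(adv))
    print("max |perturbation|:", np.abs(adv - clean).max())
```

The search only queries the model for scores and never uses gradients, which is what makes it a black-box procedure. A real NR-VQA model would replace `toy_model`, and a faithful version would also need to enforce the $L_2$ bound mentioned in the abstract.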