No-Reference Video Quality Assessment (NR-VQA) plays an essential role in improving the viewing experience of end-users. Driven by deep learning, recent NR-VQA models based on Convolutional Neural Networks (CNNs) and Transformers have achieved outstanding performance. To build a reliable and practical assessment system, their robustness must be evaluated. However, this issue has received little attention in the academic community. In this paper, we make the first attempt to evaluate the robustness of NR-VQA models against adversarial attacks, and propose a patch-based random search method for black-box attacks. Specifically, considering both the attack effect on the quality score and the visual quality of the adversarial video, the attack problem is formulated as misleading the estimated quality score under a just-noticeable difference (JND) constraint. Building on this formulation, a novel loss function called Score-Reversed Boundary Loss is designed to push the adversarial video's estimated quality score far away from its ground-truth score and towards a specific boundary, and the JND constraint is modeled as a strict $L_2$ and $L_\infty$ norm restriction. In this way, both white-box and black-box attacks can be launched effectively and imperceptibly. The source code is available at https://github.com/GZHU-DVL/AttackVQA.
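To make the formulation above concrete, the following is a minimal sketch of a patch-based random search attack under an $L_\infty$ (and optional $L_2$) budget. It is not the authors' implementation. Everything here is an assumption for illustration: the score range $[0, 1]$, the rule that the target boundary is the end of the range opposite the clean score, the absolute-difference form of the loss, the patch size, and the function names `boundary_target`, `score_reversed_boundary_loss`, `project`, and `patch_random_search`. The model is treated as a black box that maps a video array to a scalar score.

```python
import numpy as np

def boundary_target(score, low=0.0, high=1.0):
    """Assumed rule: aim for the boundary on the opposite side of the score range."""
    return low if score > (low + high) / 2.0 else high

def score_reversed_boundary_loss(pred, target):
    """Illustrative loss: distance between the predicted score and the boundary."""
    return abs(pred - target)

def project(adv, clean, eps_inf, eps_2=None):
    """Project adv onto the L_inf ball (and optionally the L_2 ball) around clean."""
    delta = np.clip(adv - clean, -eps_inf, eps_inf)
    if eps_2 is not None:
        norm = np.linalg.norm(delta)
        if norm > eps_2:
            delta = delta * (eps_2 / norm)
    # Keep pixel values valid; this can only shrink the perturbation.
    return np.clip(clean + delta, 0.0, 1.0)

def patch_random_search(video, model, eps_inf=1 / 255, eps_2=None,
                        patch=8, iters=200, seed=0):
    """Query-only attack: perturb one random patch per step, keep it if the loss drops.

    video: float array of shape (frames, height, width, channels) in [0, 1].
    model: callable returning a scalar quality score (black box, queries only).
    """
    rng = np.random.default_rng(seed)
    clean = np.asarray(video, dtype=np.float64)
    target = boundary_target(model(clean))
    adv = clean.copy()
    best = score_reversed_boundary_loss(model(adv), target)
    t, h, w, c = clean.shape
    for _ in range(iters):
        f = rng.integers(0, t)
        y = rng.integers(0, h - patch + 1)
        x = rng.integers(0, w - patch + 1)
        # One random sign per channel, applied uniformly over the patch.
        sign = rng.choice([-1.0, 1.0], size=(1, 1, c))
        cand = adv.copy()
        region = (f, slice(y, y + patch), slice(x, x + patch))
        cand[region] = clean[region] + eps_inf * sign
        cand = project(cand, clean, eps_inf, eps_2)
        loss = score_reversed_boundary_loss(model(cand), target)
        if loss < best:  # greedy acceptance
            adv, best = cand, loss
    return adv, best

if __name__ == "__main__":
    # Toy stand-in for an NR-VQA model: mean brightness as the "quality score".
    toy_model = lambda v: float(v.mean())
    clip = np.random.default_rng(1).uniform(0.6, 0.9, size=(4, 32, 32, 3))
    adv_clip, final_loss = patch_random_search(clip, toy_model)
    print("clean score:", toy_model(clip), "adversarial score:", toy_model(adv_clip))
```

The greedy accept-if-better step is what makes this usable without gradients: each iteration costs one model query, and the projection keeps every candidate inside the perturbation budget, so the loss never increases and the constraint is never violated.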