With the rapid growth of User-Generated Content (UGC) exchanged between users and sharing platforms, the need for video quality assessment in the wild has grown. UGC is mostly acquired with consumer devices and undergoes multiple rounds of compression or transcoding before reaching the end user. Traditional quality metrics that require the original content as a reference therefore cannot be applied. In this paper, we propose ReLaX-VQA, a novel No-Reference Video Quality Assessment (NR-VQA) model that addresses the challenge of assessing diverse video content without access to reference videos. ReLaX-VQA uses fragments of residual frames and optical flow, together with different spatial-feature representations of the sampled frames, to enhance motion and spatial perception. Furthermore, the model improves feature abstraction by stacking layers of deep neural network features (from Residual Networks and Vision Transformers). Extensive testing on four UGC datasets confirms that ReLaX-VQA outperforms existing NR-VQA methods, with an average SRCC of 0.8658 and PLCC of 0.8872. We will open-source the code and trained models to facilitate further research and applications of NR-VQA: https://github.com/xinyiW915/ReLaX-VQA.
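To make two of the ingredients named above concrete, the sketch below illustrates (under assumed array shapes and patch sizes, not the authors' actual implementation) how a residual frame can be computed from consecutive frames and how random patch "fragments" can be assembled into a mosaic for downstream feature extraction:

```python
# Illustrative sketch only: residual frames and patch "fragments",
# two ideas mentioned in the abstract. Frame sizes, patch size, and
# the fragment-sampling scheme here are assumptions for demonstration.
import numpy as np

def residual_frame(prev, curr):
    """Absolute pixel-wise difference between consecutive frames."""
    return np.abs(curr.astype(np.int16) - prev.astype(np.int16)).astype(np.uint8)

def sample_fragments(frame, patch=32, grid=4, seed=0):
    """Tile a grid x grid mosaic of randomly located patches from one frame."""
    rng = np.random.default_rng(seed)
    h, w = frame.shape[:2]
    rows = []
    for _ in range(grid):
        row = []
        for _ in range(grid):
            y = int(rng.integers(0, h - patch + 1))
            x = int(rng.integers(0, w - patch + 1))
            row.append(frame[y:y + patch, x:x + patch])
        rows.append(np.concatenate(row, axis=1))
    return np.concatenate(rows, axis=0)

# Two synthetic 8-bit grayscale frames stand in for sampled video frames.
prev = np.zeros((128, 128), dtype=np.uint8)
curr = np.full((128, 128), 10, dtype=np.uint8)
res = residual_frame(prev, curr)
frag = sample_fragments(res, patch=32, grid=4)
print(res.max(), frag.shape)  # → 10 (128, 128)
```

In the full model, such fragments (and analogous optical-flow inputs) would be fed to pretrained backbones such as ResNet or a Vision Transformer, whose intermediate layer outputs are stacked to form the quality representation.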