Enhancing Blind Video Quality Assessment with Rich Quality-aware Features

Blind video quality assessment (BVQA) is a highly challenging task due to the intrinsic complexity of video content and visual distortions, especially given the high popularity of social media videos, which originate from a wide range of sources and are often processed by various compression and enhancement algorithms. While recent BVQA and blind image quality assessment (BIQA) studies have made remarkable progress, their models typically perform well on the datasets they were trained on but generalize poorly to unseen videos, making them less effective for accurately evaluating the perceptual quality of diverse social media videos. In this paper, we propose Rich Quality-aware features enabled Video Quality Assessment (RQ-VQA), a simple yet effective method that enhances BVQA by leveraging rich quality-aware features extracted from off-the-shelf BIQA and BVQA models. Our approach exploits the expertise of existing quality assessment models within their trained domains to improve generalization. Specifically, we design a multi-source feature framework that integrates: (1) learnable spatial features from a base model fine-tuned on the target VQA dataset to capture domain-specific quality cues; (2) temporal motion features from the fast pathway of SlowFast pre-trained on action recognition datasets to model motion-related distortions; (3) spatial quality-aware features from BIQA models trained on diverse IQA datasets to enhance frame-level distortion representation; and (4) spatiotemporal quality-aware features from a BVQA model trained on large-scale VQA datasets to jointly encode spatial structure and temporal dynamics. These features are concatenated and fed into a multi-layer perceptron (MLP), which regresses them to quality scores. Experimental results demonstrate that our model achieves state-of-the-art performance on three public social media VQA datasets.
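
The fusion step described above (concatenate the four feature groups, then regress a score with an MLP) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature dimensions, the single hidden layer, the random placeholder features, and the helper names `fuse_features`, `make_mlp`, and `mlp_forward` are all assumptions made for illustration, and the weights are untrained.

```python
import random


def fuse_features(*feature_groups):
    """Concatenate per-video feature vectors from several sources into one vector."""
    fused = []
    for group in feature_groups:
        fused.extend(group)
    return fused


def make_mlp(in_dim, hidden_dim, seed=0):
    """Create randomly initialised weights for a one-hidden-layer MLP (hypothetical sizes)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    b1 = [0.0] * hidden_dim
    w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
    b2 = 0.0
    return w1, b1, w2, b2


def mlp_forward(params, x):
    """Map a fused feature vector to a scalar quality score (linear -> ReLU -> linear)."""
    w1, b1, w2, b2 = params
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


# Placeholder features standing in for the four sources; the dimensions are assumed.
rng = random.Random(42)
spatial = [rng.random() for _ in range(8)]         # (1) learnable spatial features
motion = [rng.random() for _ in range(4)]          # (2) temporal motion features
biqa = [rng.random() for _ in range(6)]            # (3) spatial quality-aware features
bvqa = [rng.random() for _ in range(5)]            # (4) spatiotemporal quality-aware features

fused = fuse_features(spatial, motion, biqa, bvqa)
params = make_mlp(in_dim=len(fused), hidden_dim=16)
score = mlp_forward(params, fused)                 # untrained, so the value is arbitrary
```

In practice the MLP weights would be learned by regressing against subjective quality labels; the sketch only shows how features from different sources end up in a single input vector.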
