Highly Efficient No-reference 4K Video Quality Assessment with Full-Pixel Covering Sampling and Training Strategy

Deep Video Quality Assessment (VQA) methods have achieved strong performance. No-reference (NR) VQA methods are particularly important when reference videos are restricted or unavailable. However, as more streaming video is produced in ultra-high definition (e.g., 4K) to enrich the viewing experience, current deep VQA methods incur unacceptable computational costs. Furthermore, the resizing, cropping, and local sampling techniques these methods rely on can discard details and content of the original 4K video, which degrades quality assessment. In this paper, we propose a novel and highly efficient NR 4K VQA method. First, we propose a data sampling and training strategy to tackle the problem of excessive resolution. This strategy allows a Swin Transformer-based VQA model to train and run inference on the full data of 4K videos using standard consumer-grade GPUs, without compromising content or details. Second, we develop a weighting and scoring scheme that mimics human subjective perception by accounting for the distinct impact of each sub-region within a 4K frame on overall perceived quality. Third, we incorporate the frequency-domain information of video frames to better capture the details that affect video quality, which further improves the model's generalizability. To our knowledge, this is the first method for the NR 4K VQA task. Thorough empirical studies demonstrate that it not only significantly outperforms existing methods on a specialized 4K VQA dataset but also achieves state-of-the-art performance across multiple open-source NR video quality datasets.
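
The three ideas above (full-pixel coverage, sub-region weighting, and frequency-domain input) can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the abstract does not specify the patch size, the padding rule, the form of the weights, or how frequency information is computed, so the 224-pixel patch, the edge padding, the softmax weighting, and the function names `tile_frame`, `log_spectrum`, and `weighted_score` are all assumptions made for illustration.

```python
import numpy as np


def tile_frame(frame, patch=224):
    """Split an (H, W, C) frame into non-overlapping patches that cover every pixel.

    Assumption: the frame is edge-padded so H and W become multiples of `patch`;
    nothing is resized or cropped away.
    """
    h, w = frame.shape[:2]
    pad_h = (-h) % patch
    pad_w = (-w) % patch
    padded = np.pad(frame, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")
    gh, gw = padded.shape[0] // patch, padded.shape[1] // patch
    # (gh*patch, gw*patch, C) -> (gh, gw, patch, patch, C)
    return padded.reshape(gh, patch, gw, patch, -1).swapaxes(1, 2)


def log_spectrum(patch):
    """Log-magnitude 2D FFT of a grayscale version of one patch.

    Assumption: a stand-in for the frequency-domain information that
    the abstract mentions, whose exact form is not given.
    """
    gray = patch.astype(np.float64).mean(axis=-1)
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    return np.log1p(np.abs(spectrum))


def weighted_score(scores, logits):
    """Combine per-patch quality scores with softmax-normalised weights.

    Assumption: in a real model both `scores` and `logits` would be
    predicted by the network; here they are plain arrays.
    """
    scores = np.asarray(scores, dtype=np.float64)
    logits = np.asarray(logits, dtype=np.float64)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return float((weights * scores).sum())


# Usage: a 4K frame (2160 x 3840) becomes a 10 x 18 grid of 224 x 224 patches.
frame = np.zeros((2160, 3840, 3), dtype=np.uint8)
patches = tile_frame(frame)
spectrum = log_spectrum(patches[0, 0])
frame_score = weighted_score(np.ones(patches.shape[0] * patches.shape[1]),
                             np.zeros(patches.shape[0] * patches.shape[1]))
```

Because each patch is small, patches can be fed to the model in batches that fit in consumer-grade GPU memory while every pixel of the frame still contributes to the final score, which is the property the abstract emphasises.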
