Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving end users' viewing experience in real-world video-enabled media applications. Because BVQA is an experimental field, improvements in BVQA models have been measured primarily on a few human-rated VQA datasets. It is therefore crucial to understand existing VQA datasets better in order to properly evaluate current progress in BVQA. Towards this goal, we conduct a first-of-its-kind computational analysis of VQA datasets by designing minimalistic BVQA models. By minimalistic, we mean that our family of BVQA models is restricted to build only upon basic blocks: a video preprocessor (for aggressive spatiotemporal downsampling), a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor, all with the simplest possible instantiations. By comparing the quality prediction performance of different model variants on eight VQA datasets with realistic distortions, we find that nearly all datasets suffer from the easy dataset problem to varying degrees, and that some even admit blind image quality assessment (BIQA) solutions. We further support our claims by contrasting model generalizability across these VQA datasets, and by ablating a wide range of BVQA design choices related to the basic building blocks. Our results cast doubt on the current progress in BVQA, and shed light on good practices for constructing next-generation VQA datasets and models.
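To make the four-block structure concrete, the following is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the models evaluated in this work: all function names, the stride values, and the hand-crafted statistics standing in for the spatial and temporal analyzers are hypothetical, and the regressor is a plain linear map whose weights would have to be fitted on human ratings.

```python
import numpy as np

def preprocess(video, frame_step=8, spatial_step=4):
    """Video preprocessor: aggressive spatiotemporal downsampling by striding.
    `video` is a grayscale array of shape (T, H, W). Strides are assumed values."""
    return video[::frame_step, ::spatial_step, ::spatial_step]

def spatial_features(frames):
    """Spatial quality analyzer (stand-in): per-frame mean, standard deviation,
    and mean gradient magnitude. Returns an array of shape (T', 3)."""
    gy, gx = np.gradient(frames, axis=(1, 2))
    grad = np.sqrt(gx ** 2 + gy ** 2)
    return np.stack(
        [frames.mean(axis=(1, 2)), frames.std(axis=(1, 2)), grad.mean(axis=(1, 2))],
        axis=1,
    )

def temporal_features(frames):
    """Optional temporal quality analyzer (stand-in): mean absolute
    difference between consecutive sampled frames. Returns shape (1,)."""
    if frames.shape[0] < 2:
        return np.zeros(1)
    return np.array([np.abs(np.diff(frames, axis=0)).mean()])

def predict_quality(video, weights, bias=0.0, use_temporal=True):
    """Quality regressor: linear map on temporally averaged features.
    `weights` needs 4 entries with the temporal branch, 3 without it."""
    frames = preprocess(video)
    feats = spatial_features(frames).mean(axis=0)  # average pooling over time
    if use_temporal:
        feats = np.concatenate([feats, temporal_features(frames)])
    return float(feats @ weights + bias)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((32, 64, 64))  # synthetic clip, not real data
    print(predict_quality(clip, weights=np.ones(4)))
```

Setting `use_temporal=False` reduces the sketch to a frame-based predictor, which mirrors the observation that some datasets admit BIQA-style solutions.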