Beyond MOS: Subjective Image Quality Score Preprocessing Method Based on Perceptual Similarity

Image quality assessment often relies on raw opinion scores provided by subjects in subjective experiments, which can be noisy and unreliable. To address this issue, postprocessing procedures such as ITU-R BT.500, ITU-T P.910, and ITU-T P.913 have been standardized to clean up the original opinion scores. These methods use annotator-based statistical priors, but they do not take into account the rich information in the images themselves, which limits their performance when each image has few annotations. Image quality datasets usually contain similar scenes or distortions, and subjects inevitably compare images with one another to arrive at a reasonable score. Therefore, in this paper we propose a subjective image quality score preprocessing method, Perceptual Similarity Subjective Preprocessing (PSP), which exploits the perceptual similarity between images to alleviate subjective bias in sparsely annotated scenarios. Specifically, we model subjective scoring as a conditional probability model based on perceptual similarity with previously scored images, called subconscious reference scoring. The reference images are stored in a neighbor dictionary, which is obtained by a nearest neighbor search over the images' deep perceptual features using the dot product of normalized feature vectors. The preprocessed score is then updated by an exponential moving average (EMA) of the subconscious reference scoring, called similarity regularized EMA. Our experiments on multiple datasets (LIVE, TID2013, CID2013) show that this method can effectively remove the bias of the subjective scores. Additionally, experiments show that the preprocessed dataset improves the performance of downstream IQA tasks.
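The two mechanical steps named in the abstract, the neighbor dictionary and the EMA update, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the conditional probability model, so the reference score here is assumed to be a similarity-weighted mean of the neighbors' scores, and the function names, the neighbor count `k`, the smoothing factor `alpha`, and the iteration count `n_iters` are all hypothetical.

```python
import numpy as np


def build_neighbor_dict(features, k):
    """Map each image index to its k nearest neighbors.

    Similarity is the dot product of L2-normalized feature vectors
    (cosine similarity). Assumes every feature vector is non-zero.
    """
    feats = np.asarray(features, dtype=float)
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # an image is never its own reference
    neighbors = {}
    for i in range(sim.shape[0]):
        idx = np.argsort(sim[i])[::-1][:k]  # indices of the k largest similarities
        neighbors[i] = (idx, sim[i, idx])
    return neighbors


def similarity_regularized_ema(scores, neighbors, alpha=0.9, n_iters=1):
    """Blend each score with a reference score derived from its neighbors.

    Assumption: the reference is a similarity-weighted mean of neighbor
    scores; the paper's actual scoring model may differ.
    """
    smoothed = np.asarray(scores, dtype=float).copy()
    for _ in range(n_iters):
        updated = smoothed.copy()
        for i, (idx, sim) in neighbors.items():
            weights = np.clip(sim, 0.0, None)  # ignore negatively correlated neighbors
            if weights.sum() == 0.0:
                continue  # no usable reference, keep the current score
            reference = np.dot(weights, smoothed[idx]) / weights.sum()
            # EMA step: keep alpha of the current score, move the rest toward the reference
            updated[i] = alpha * smoothed[i] + (1.0 - alpha) * reference
        smoothed = updated
    return smoothed
```

In this sketch a larger `alpha` keeps the preprocessed score closer to the raw opinion score, while a smaller `alpha` pulls it more strongly toward the scores of perceptually similar images.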
