Deep networks show promising performance in image quality assessment (IQA), yet few studies have investigated how these deep models work. In this work, we first develop a positional masked transformer for IQA, and with it we observe that half of an image may contribute only trivially to image quality, whereas the other half is crucial. This observation generalizes to several CNN-based IQA models, in which half of the image regions can likewise dominate image quality. Motivated by this observation, we then derive three semantic measures (saliency, frequency, objectness), which show high accordance with the importance of image regions in IQA.
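To make the idea of masking image regions by position concrete, the following is a minimal sketch, not the model described above. It assumes the image is split into a regular grid of square patches and that a masked patch is simply zeroed; the helper names `mask_patches` and `random_half` and the zero-fill choice are hypothetical and introduced only for illustration.

```python
import numpy as np


def mask_patches(image, patch, keep):
    """Return a copy of `image` with every patch not listed in `keep` set to zero.

    Patches are indexed row by row over a grid of `patch` x `patch` cells.
    Assumption: zero-filling stands in for whatever masking the real model uses.
    Pixels outside the full grid (when a side is not divisible by `patch`)
    are left untouched.
    """
    h, w = image.shape[:2]
    rows, cols = h // patch, w // patch
    out = image.copy()
    for idx in range(rows * cols):
        if idx not in keep:
            r, c = divmod(idx, cols)
            out[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0
    return out


def random_half(n_patches, seed=0):
    """Pick half of the patch positions uniformly at random (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return set(rng.permutation(n_patches)[: n_patches // 2].tolist())


# Example: an 8x8 image of ones, split into sixteen 2x2 patches, half kept.
image = np.ones((8, 8))
keep = random_half(16)
masked = mask_patches(image, 2, keep)
```

Under these assumptions, comparing a quality model's score on `masked` with its score on `image` for different choices of `keep` would indicate which half of the positions matters most to the prediction.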