Deep networks have shown promising results in Image Quality Assessment (IQA), yet little research has examined how deep IQA models actually work. This study introduces a novel positional masked transformer for IQA and uses it to examine how different regions of an image contribute to its overall quality. The results indicate that half of an image may play only a trivial role in determining its quality, while the other half is critical. This observation also holds for several CNN-based IQA models, in which half of the image regions significantly affect the overall image quality. To deepen this understanding, three semantic measures (saliency, frequency, and objectness) were derived and found to correlate highly with the importance of image regions in IQA.