Automatic perception of image quality is a challenging problem that affects billions of Internet and social media users daily. To advance research in this field, we propose a no-reference image quality assessment (NR-IQA) method, termed Cross-IQA, based on the vision transformer (ViT) model. The proposed Cross-IQA method can learn image quality features from unlabeled image data. We construct a pretext task of synthesized-image reconstruction to extract image quality information with ViT blocks in an unsupervised manner. The pretrained Cross-IQA encoder is then used to fine-tune a linear regression model for score prediction. Experimental results show that, on the same datasets, Cross-IQA achieves state-of-the-art performance in assessing low-frequency degradation information (e.g., color change, blurring) compared with classical full-reference IQA and NR-IQA methods.
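The two-stage pipeline described above (self-supervised pretraining of an encoder, then fitting a linear regression head on its features for score prediction) can be sketched as follows. This is a minimal illustration, not the actual Cross-IQA implementation: the encoder here is a random stand-in for the pretrained ViT encoder, and the data, feature dimension, and score range are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder_features(images):
    """Stand-in for the frozen, pretrained Cross-IQA encoder.

    In the real method this would be a ViT encoder pretrained on the
    synthesized-image reconstruction pretext task; here we simply
    project each flattened image to a 64-dim feature vector.
    """
    proj = rng.standard_normal((images.shape[1], 64))
    return images @ proj

# Toy data: 100 flattened "images" with scalar quality scores
# (e.g., mean opinion scores in a 0-100 range; values are synthetic).
images = rng.standard_normal((100, 256))
scores = rng.uniform(0.0, 100.0, size=100)

# Stage 2: fit the linear regression head (with bias) on the
# frozen encoder's features by ordinary least squares.
feats = encoder_features(images)
X = np.hstack([feats, np.ones((feats.shape[0], 1))])
w, *_ = np.linalg.lstsq(X, scores, rcond=None)

pred = X @ w  # predicted quality scores, one per image
print(pred.shape)
```

In practice the regression head would be trained by gradient descent jointly with (or on top of) the frozen encoder, and evaluated with rank correlations such as SROCC/PLCC, but the least-squares fit above captures the same linear score-prediction step.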