Image Quality Assessment (IQA) is a challenging task, and accurate prediction requires training on large datasets. Because IQA data are scarce, deep learning-based IQA methods typically rely on networks pre-trained on large datasets as feature extractors to improve generalization, such as ResNet trained on ImageNet. In this paper, we use the encoder of Segment Anything, a recently proposed segmentation model trained on a massive dataset, to extract high-level semantic features. Most IQA methods extract only spatial-domain features, although frequency-domain features have been shown to better represent noise and blur. We therefore leverage both spatial-domain and frequency-domain features by applying standard and Fourier convolutions, respectively, to the extracted features. Extensive experiments demonstrate the effectiveness of each proposed component, and the results show that our approach outperforms the state of the art (SOTA) on four representative datasets, both qualitatively and quantitatively. Our experiments confirm the strong feature extraction capability of Segment Anything and highlight the value of combining spatial-domain and frequency-domain features in IQA tasks. Code: https://github.com/Hedlen/SAM-IQA
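To make the two-branch idea concrete, the following is a minimal NumPy sketch, not the released implementation. It assumes the encoder output is a feature map of shape (C, H, W); a random array stands in for the Segment Anything features. The function names (`spatial_branch`, `fourier_branch`, `fuse`), the 3x3 kernel size, the 1x1 channel mixing in the frequency domain, and the fusion by channel concatenation are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np


def spatial_branch(feat, kernel):
    """Standard convolution (cross-correlation) with zero padding.

    feat:   (C_in, H, W) feature map.
    kernel: (C_out, C_in, kH, kW) weights with odd kH and kW (hypothetical sizes).
    """
    c_out, _, kh, kw = kernel.shape
    _, h, w = feat.shape
    padded = np.pad(feat, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((c_out, h, w))
    for i in range(kh):
        for j in range(kw):
            # Shifted view of the input, mixed across channels by one kernel tap.
            patch = padded[:, i:i + h, j:j + w]
            out += np.einsum("oc,chw->ohw", kernel[:, :, i, j], patch)
    return out


def fourier_branch(feat, weight):
    """Fourier-domain convolution sketch (an assumed design, linear only).

    The map is moved to the frequency domain with a real 2-D FFT, the real and
    imaginary parts are stacked as channels and mixed by a 1x1 convolution,
    and the result is transformed back to the spatial domain.

    feat:   (C_in, H, W) feature map.
    weight: (2 * C_out, 2 * C_in) channel-mixing matrix.
    """
    _, h, w = feat.shape
    spec = np.fft.rfft2(feat, norm="ortho")                    # (C_in, H, W//2 + 1), complex
    stacked = np.concatenate([spec.real, spec.imag], axis=0)   # (2*C_in, H, W//2 + 1)
    mixed = np.einsum("oc,chw->ohw", weight, stacked)          # 1x1 conv over frequency bins
    c_out = weight.shape[0] // 2
    spec_out = mixed[:c_out] + 1j * mixed[c_out:]
    return np.fft.irfft2(spec_out, s=(h, w), norm="ortho")     # back to (C_out, H, W)


def fuse(feat, kernel, weight):
    """Concatenate spatial-domain and frequency-domain features along channels."""
    return np.concatenate(
        [spatial_branch(feat, kernel), fourier_branch(feat, weight)], axis=0
    )


# Stand-in for encoder features; the real shapes depend on the encoder used.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 16, 16))
kernel = rng.standard_normal((8, 8, 3, 3)) * 0.1
weight = rng.standard_normal((16, 16)) * 0.1
fused = fuse(features, kernel, weight)
print(fused.shape)
```

Because every frequency bin depends on the whole input, the 1x1 mixing in the Fourier branch gives each output position a global receptive field, while the standard convolution only sees a local neighborhood. In a trained model, the fused map would feed a quality-regression head, which is omitted here.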