No-Reference Image Quality Assessment (NR-IQA) aims to design methods that measure image quality in alignment with human perception when a high-quality reference image is unavailable. Most state-of-the-art NR-IQA approaches rely on annotated Mean Opinion Scores (MOS), which limits their scalability and their applicability to real-world scenarios. To overcome this limitation, we propose QualiCLIP (Quality-aware CLIP), a CLIP-based, self-supervised, opinion-unaware method that does not require labeled MOS. In particular, we introduce a quality-aware image-text alignment strategy that makes CLIP generate representations correlated with the inherent quality of images. Starting from pristine images, we synthetically degrade them with increasing levels of intensity. We then train CLIP to rank these degraded images by their similarity to quality-related antonym text prompts, while enforcing consistent representations for images of comparable quality. Our method achieves state-of-the-art performance on several datasets with authentic distortions. Moreover, despite not requiring MOS, QualiCLIP outperforms supervised methods when their training dataset differs from the testing one, which makes it better suited to real-world scenarios. Furthermore, our approach shows greater robustness and improved explainability compared with competing methods. The code and the model are publicly available at https://github.com/miccunifi/QualiCLIP.
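To make the training idea concrete, the following is a minimal, framework-free sketch of the three ingredients named above. It scores an image embedding against an antonym prompt pair with a softmax over scaled cosine similarities. It then applies a hinge ranking penalty so that more heavily degraded images score lower. Finally, it applies a consistency penalty so that two views of comparable quality receive the same score. Everything specific here is an assumption for illustration and is not necessarily the exact QualiCLIP formulation: the helper names (`quality_score`, `ranking_loss`, `consistency_loss`), the logit scale of 100, the margin of 0.05, and the squared-error form of the consistency term. The toy 2-D vectors stand in for real CLIP embeddings of an image and of a positive/negative prompt pair (for example, hypothetical prompts such as "Good photo." and "Bad photo.").

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def quality_score(img_emb, pos_txt_emb, neg_txt_emb, logit_scale=100.0):
    """Two-way softmax over similarities to an antonym prompt pair.

    Returns the probability assigned to the positive (high-quality) prompt.
    The logit scale of 100 is an assumed, CLIP-style temperature.
    """
    s_pos = logit_scale * cosine(img_emb, pos_txt_emb)
    s_neg = logit_scale * cosine(img_emb, neg_txt_emb)
    m = max(s_pos, s_neg)  # subtract the max for numerical stability
    e_pos = math.exp(s_pos - m)
    e_neg = math.exp(s_neg - m)
    return e_pos / (e_pos + e_neg)


def ranking_loss(scores_by_level: List[float], margin: float = 0.05) -> float:
    """Hinge ranking penalty over degradation levels.

    scores_by_level[k] is the score at degradation level k (0 = mildest).
    Every more-degraded image should score at least `margin` below every
    less-degraded one; violations are penalized linearly.
    """
    loss = 0.0
    n = len(scores_by_level)
    for i in range(n):
        for j in range(i + 1, n):
            loss += max(0.0, scores_by_level[j] - scores_by_level[i] + margin)
    return loss


def consistency_loss(scores_a: List[float], scores_b: List[float]) -> float:
    """Mean squared difference between scores of two views of equal quality."""
    return sum((a - b) ** 2 for a, b in zip(scores_a, scores_b)) / len(scores_a)


if __name__ == "__main__":
    pos, neg = [1.0, 0.0], [0.0, 1.0]  # toy stand-ins for prompt embeddings
    print(quality_score([1.0, 1.0], pos, neg))  # equidistant image -> 0.5
    print(ranking_loss([0.9, 0.6, 0.3]))        # correctly ordered -> 0.0
    print(ranking_loss([0.3, 0.6]))             # inverted pair -> penalized
```

In this sketch the ranking term needs only the relative order of synthetic degradation levels, not MOS labels, which is what lets such an objective train without human opinion scores. In a real training loop these scores would be computed on CLIP embeddings with an autodiff framework so that the losses can be backpropagated.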