No-reference (NR) image quality assessment (IQA) is an important tool in enhancing the user experience in diverse visual applications. A major drawback of state-of-the-art NR-IQA techniques is their reliance on a large number of human annotations to train models for a target IQA application. To mitigate this requirement, there is a need for unsupervised learning of generalizable quality representations that capture diverse distortions. We enable the learning of low-level quality features agnostic to distortion types by introducing a novel quality-aware contrastive loss. Further, we leverage the generalizability of vision-language models by fine-tuning one such model to extract high-level image quality information through relevant text prompts. The two sets of features are combined to effectively predict quality by training a simple regressor with very few samples on a target dataset. Additionally, we design zero-shot quality predictions from both pathways in a completely blind setting. Our experiments on diverse datasets encompassing various distortions show the generalizability of the features and their superior performance in the data-efficient and zero-shot settings. Code will be made available at https://github.com/suhas-srinath/GRepQ.
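As an illustration of how a zero-shot quality prediction from text prompts might be computed, the following is a minimal sketch and not the authors' implementation. It assumes that an image embedding and the embeddings of two antonym prompts (for example, "a good photo" and "a bad photo") have already been extracted by some vision-language model. The function names, the prompt pair, the temperature value, and the toy two-dimensional embeddings are all hypothetical; the abstract does not specify the exact scoring rule.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_quality(image_emb, good_emb, bad_emb, temperature=0.01):
    """Hypothetical zero-shot score in [0, 1].

    Assumption: quality is the softmax weight of the "good" prompt
    against the "bad" prompt, computed from cosine similarities.
    """
    s_good = cosine(image_emb, good_emb) / temperature
    s_bad = cosine(image_emb, bad_emb) / temperature
    # Subtract the maximum logit before exponentiating for numerical stability.
    m = max(s_good, s_bad)
    e_good = math.exp(s_good - m)
    e_bad = math.exp(s_bad - m)
    return e_good / (e_good + e_bad)


# Toy embeddings standing in for real model outputs (assumed, not from the paper).
good_prompt = [1.0, 0.0]
bad_prompt = [0.0, 1.0]

score_clean = zero_shot_quality([1.0, 0.0], good_prompt, bad_prompt)
score_ambiguous = zero_shot_quality([1.0, 1.0], good_prompt, bad_prompt)
score_degraded = zero_shot_quality([0.0, 1.0], good_prompt, bad_prompt)
```

In this sketch, an image whose embedding aligns with the "good" prompt scores near 1, one aligned with the "bad" prompt scores near 0, and one equidistant from both scores 0.5. No labeled samples from a target dataset are needed, which is what makes such a prediction zero-shot.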