Image quality assessment (IQA) focuses on the perceptual visual quality of images and plays a crucial role in downstream tasks such as image reconstruction, compression, and generation. The rapid advancement of multi-modal large language models (MLLMs) has significantly broadened the scope of IQA, moving beyond mere numerical scoring toward comprehensive image quality understanding that incorporates content analysis, degradation perception, and comparison reasoning. Previous MLLM-based methods typically either generate numerical scores that lack interpretability or rely heavily on supervised fine-tuning (SFT) with large-scale annotated datasets to produce descriptive assessments, limiting their flexibility and applicability. In this paper, we propose Q-Insight, a reinforcement-learning-based model built upon group relative policy optimization (GRPO), which demonstrates strong visual reasoning capability for image quality understanding while requiring only a limited number of rating scores and degradation labels. By jointly optimizing the score regression and degradation perception tasks with carefully designed reward functions, our approach effectively exploits their mutual benefits for enhanced performance. Extensive experiments demonstrate that Q-Insight substantially outperforms existing state-of-the-art methods on both score regression and degradation perception, while exhibiting impressive zero-shot generalization to comparison reasoning tasks. Code will be available at https://github.com/lwq20020127/Q-Insight.
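To make the joint-reward idea concrete, the following is a minimal sketch of how a GRPO-style training signal could combine a score-regression reward with a degradation-perception reward. It is illustrative only, not the paper's exact reward design: the threshold-based score reward, the exact-match degradation reward, the weights, and all function names are assumptions introduced here for exposition.

```python
# Hypothetical sketch of a combined reward for GRPO-style optimization.
# Assumes a binary, threshold-based reward for score regression and an
# exact-match reward for degradation classification; the real reward
# functions in Q-Insight may differ.

def score_reward(pred_score: float, gt_score: float, eps: float = 0.5) -> float:
    """1.0 if the predicted quality score lies within eps of the ground-truth MOS, else 0.0."""
    return 1.0 if abs(pred_score - gt_score) <= eps else 0.0

def degradation_reward(pred_type: str, gt_type: str) -> float:
    """1.0 if the predicted degradation type matches the label, else 0.0."""
    return 1.0 if pred_type.strip().lower() == gt_type.strip().lower() else 0.0

def combined_reward(pred_score: float, gt_score: float,
                    pred_type: str, gt_type: str,
                    w_score: float = 1.0, w_deg: float = 1.0) -> float:
    """Weighted sum jointly rewarding both tasks for a single rollout."""
    return (w_score * score_reward(pred_score, gt_score)
            + w_deg * degradation_reward(pred_type, gt_type))

# Example rollout: predicted score 3.2 against MOS 3.0, predicted degradation "blur".
print(combined_reward(3.2, 3.0, "blur", "blur"))  # -> 2.0
```

In GRPO, such per-rollout rewards would typically be normalized within each sampled group to form relative advantages, which is what allows training from only a limited number of rating scores and degradation labels.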