Recent multimodal large language models (MLLMs) have demonstrated strong capabilities in image quality assessment (IQA). However, adapting such large-scale models is computationally expensive and still requires substantial Mean Opinion Score (MOS) annotation. We argue that for MLLM-based IQA, the core bottleneck lies not in the quality perception capacity of MLLMs but in MOS scale calibration. We therefore propose LEAF, a Label-Efficient Image Quality Assessment Framework that distills perceptual quality priors from an MLLM teacher into a lightweight student regressor, enabling MOS calibration with minimal human supervision. Specifically, the teacher provides dense supervision in the form of point-wise quality judgments and pair-wise preferences, each accompanied by an estimate of decision reliability. Guided by these signals, the student learns the teacher's quality perception patterns through joint distillation and is then calibrated on a small MOS-annotated subset to align with human judgments. Experiments on both user-generated and AI-generated IQA benchmarks show that our method substantially reduces the need for human annotation while maintaining strong correlation with MOS, making lightweight IQA practical under limited annotation budgets.
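To make the training signals concrete, the following is a minimal sketch of what the joint distillation and calibration objectives described above could look like in PyTorch. All names (`distillation_loss`, `pair_idx`, `reliability`, and so on) are illustrative assumptions rather than the paper's actual implementation; the pair-wise term is modeled here as a standard Bradley-Terry-style preference loss reweighted by the teacher's reliability estimates.

```python
# Illustrative sketch only: names and loss forms are assumptions,
# not LEAF's published implementation.
import torch
import torch.nn.functional as F

def distillation_loss(student_scores, teacher_scores,
                      pair_idx, teacher_prefs, reliability):
    """Joint point-wise + pair-wise distillation from the MLLM teacher.

    student_scores: (N,) student quality predictions for a batch
    teacher_scores: (N,) teacher point-wise quality judgments
    pair_idx:       (P, 2) indices of image pairs drawn from the batch
    teacher_prefs:  (P,) probability the first image in each pair is preferred
    reliability:    (P,) teacher's estimated decision reliability, in [0, 1]
    """
    # Point-wise distillation: regress toward the teacher's judgments.
    point_loss = F.mse_loss(student_scores, teacher_scores)

    # Pair-wise distillation: the student's score margin should reproduce
    # the teacher's preference, with unreliable decisions down-weighted.
    margin = student_scores[pair_idx[:, 0]] - student_scores[pair_idx[:, 1]]
    pair_loss = F.binary_cross_entropy_with_logits(
        margin, teacher_prefs, weight=reliability)

    return point_loss + pair_loss

def calibration_loss(student_scores, mos_labels):
    # Calibration on the small MOS-annotated subset aligns the
    # student's scale with human annotations.
    return F.mse_loss(student_scores, mos_labels)
```

In this reading, the distillation term supplies dense, annotation-free supervision over the full image set, while the calibration term touches only the few images with human MOS labels, which is where the label efficiency comes from.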