Improving vision-language models (VLMs) in the post-training stage typically relies on supervised fine-tuning or reinforcement learning, both of which require costly, human-annotated data. While self-supervised techniques have proven effective for enhancing reasoning capabilities, their application to perceptual domains such as image quality assessment (IQA) remains largely unexplored. In this work, we introduce EvoQuality, a novel framework that enables a VLM to autonomously refine its quality perception capabilities without any ground-truth labels. EvoQuality adapts the principle of self-consistency to the ranking-based nature of IQA. It generates pseudo-labels by performing pairwise majority voting on the VLM's own outputs to establish a consensus on relative quality. These pseudo-rankings are then formulated into a fidelity reward that guides the model's iterative evolution through group relative policy optimization (GRPO). By iteratively leveraging its own predictions, EvoQuality progressively refines the VLM's perceptual capability. Extensive experiments show that EvoQuality improves the base VLM's zero-shot PLCC by 31.8% across diverse IQA benchmarks. Remarkably, despite being entirely self-supervised, EvoQuality is competitive with, or even surpasses, state-of-the-art supervised VLM-based IQA models, outperforming them on 5 out of 7 IQA benchmarks. Furthermore, the framework is flexible: it can be stacked with pre-trained IQA models to bolster generalization on unseen datasets. Code and checkpoints will be available at https://github.com/bytedance/EvoQuality.
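To make the pseudo-labeling step concrete, the following is a minimal sketch of pairwise majority voting followed by a fidelity-style reward. It is an illustration under stated assumptions, not the released implementation: the abstract does not give the reward formula, so the sketch borrows the fidelity measure commonly used in ranking-based IQA, and the function names `majority_pseudo_label` and `fidelity_reward` are hypothetical.

```python
from math import sqrt


def majority_pseudo_label(votes):
    """Turn repeated pairwise comparisons into one pseudo-label.

    `votes` is a list of "A" / "B" strings, one per sampled model output,
    each saying which image of the pair was judged to have higher quality.
    Returns 1.0 if A wins the majority, 0.0 if B wins, 0.5 on a tie.
    (Assumed encoding; the abstract does not specify one.)
    """
    a_wins = sum(1 for v in votes if v == "A")
    b_wins = sum(1 for v in votes if v == "B")
    if a_wins > b_wins:
        return 1.0
    if b_wins > a_wins:
        return 0.0
    return 0.5


def fidelity_reward(p_pred, p_pseudo):
    """Fidelity between a predicted preference and the pseudo-label.

    Both arguments are probabilities that image A is better than image B.
    The value is 1.0 when they agree exactly and 0.0 when they are
    opposite hard labels. (Assumed form of the reward.)
    """
    return sqrt(p_pred * p_pseudo) + sqrt((1.0 - p_pred) * (1.0 - p_pseudo))


# Five sampled comparisons for one image pair: A is preferred 3 times out of 5.
votes = ["A", "B", "A", "A", "B"]
label = majority_pseudo_label(votes)   # consensus pseudo-label for the pair
reward = fidelity_reward(0.8, label)   # reward for a rollout predicting P(A > B) = 0.8
```

In a GRPO-style loop, a reward of this kind would be computed per rollout and normalized within each group of rollouts; that normalization is omitted here.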