Vision-Language Models (VLMs) are increasingly adopted for Image Quality Assessment (IQA). However, current methods typically use a static, one-shot scoring paradigm, whereas humans assess image quality through dynamic visual inspection, e.g., selectively adjusting the view to verify details and subtle artifacts. Relying on a single-pass observation introduces two primary limitations: first, perceiving the image only at a global scale restricts the assessment of fine local details; second, the original intensity distribution of the image may mask low-visibility content, leaving artifacts insufficiently inspected. To address these issues, we propose Tool-IQA, which shifts the assessment mechanism from passive scoring to a tool-augmented workflow. In particular, we equip VLMs with two simple yet effective view tools: a Magnifier to inspect local details, and a Gamma Corrector to improve visibility and reveal hidden artifacts. The assessment follows a structured pipeline consisting of an initial observation with rubric notes, a tool-augmented in-depth inspection, and a final quantification that produces a calibrated quality score. Furthermore, to keep tool calls efficient and purposeful, we introduce a batch-aware training strategy that rewards tool interactions that contribute positively to the assessment, rather than simply encouraging tool usage. Experiments on a variety of IQA benchmarks demonstrate that, with effective tool calling and calibrated assessment, Tool-IQA significantly outperforms existing state-of-the-art models, e.g., it achieves a PLCC of 0.854 on the challenging CLIVE dataset.
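To make the two view tools concrete, the following is a minimal sketch of what a Magnifier and a Gamma Corrector could look like. It is an illustration under stated assumptions, not the Tool-IQA implementation: the function names `magnify` and `gamma_correct`, their parameters, the grayscale list-of-lists image representation, and the nearest-neighbour upsampling are all hypothetical choices made here for clarity.

```python
from typing import List

# Hypothetical image representation: grayscale rows of floats in [0, 1].
Image = List[List[float]]


def magnify(image: Image, top: int, left: int,
            height: int, width: int, scale: int = 2) -> Image:
    """Crop a rectangular region and enlarge it by an integer factor.

    Nearest-neighbour upsampling is an assumption of this sketch; it simply
    repeats each pixel `scale` times horizontally and vertically.
    """
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    rows = len(image)
    cols = len(image[0]) if rows else 0
    if top < 0 or left < 0 or top + height > rows or left + width > cols:
        raise ValueError("crop box falls outside the image")

    crop = [row[left:left + width] for row in image[top:top + height]]
    enlarged: Image = []
    for row in crop:
        wide_row = [pixel for pixel in row for _ in range(scale)]
        # Copy the widened row so the repeated rows do not share one list.
        enlarged.extend(list(wide_row) for _ in range(scale))
    return enlarged


def gamma_correct(image: Image, gamma: float) -> Image:
    """Apply the power-law mapping out = in ** gamma to every pixel.

    With inputs in [0, 1], gamma < 1 brightens dark regions and gamma > 1
    darkens bright ones. Inputs are clamped to [0, 1] first.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return [[min(1.0, max(0.0, pixel)) ** gamma for pixel in row]
            for row in image]


# Usage: zoom into the right-hand column, then brighten the whole image.
image = [[0.0, 0.04],
         [0.25, 1.0]]
zoomed = magnify(image, top=0, left=1, height=2, width=1, scale=2)
brightened = gamma_correct(image, gamma=0.5)
```

In a tool-augmented workflow of the kind described above, a model would presumably choose the crop box or the gamma value itself and then re-inspect the returned view; how Tool-IQA exposes these parameters to the VLM is not specified in this abstract.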