Tool-IQA: Augmenting Image Quality Assessment with Simple Tools

Vision-Language Models (VLMs) are increasingly adopted for Image Quality Assessment (IQA). However, current methods typically employ a static, one-shot scoring paradigm, whereas humans assess image quality through dynamic visual inspection, e.g., selectively adjusting their view to verify details and subtle artifacts. Relying solely on a single-pass observation introduces two primary limitations: first, perceiving the image only at a global scale restricts the assessment of finer local details; second, the original intensity distribution of the image may limit the visibility of some content, so that artifacts escape inspection. To address these issues, we propose Tool-IQA, which shifts the assessment mechanism from passive scoring to a tool-augmented workflow. In particular, we equip VLMs with two simple yet effective view tools: a Magnifier to inspect local details, and a Gamma Corrector to improve visibility and reveal hidden artifacts. The assessment follows a structured pipeline that consists of an initial observation with rubric notes, a tool-augmented in-depth inspection, and a final quantification step that produces a calibrated quality score. Furthermore, to ensure efficient and purposeful tool calls, we introduce a batch-aware training strategy that rewards tool interactions that contribute positively to the assessment, rather than simply encouraging tool usage. Experiments on a variety of IQA benchmarks demonstrate that, with effective tool calling and calibrated assessment, Tool-IQA significantly outperforms existing state-of-the-art models, e.g., it achieves a PLCC of 0.854 on the challenging CLIVE dataset.
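The abstract names the two view tools but does not say how they are implemented. As a rough illustration only, the sketch below shows what a magnifier and a gamma corrector could look like on a NumPy image array. The function names `magnify` and `gamma_correct`, the `(top, left, height, width)` box format, nearest-neighbor upscaling, and the 8-bit value range are all assumptions made for this example, not details taken from Tool-IQA.

```python
import numpy as np


def magnify(image: np.ndarray, box: tuple, scale: int = 2) -> np.ndarray:
    """Crop a region and enlarge it by an integer factor.

    `box` is (top, left, height, width) in pixels (hypothetical format).
    Nearest-neighbor upscaling is used here for simplicity.
    """
    top, left, height, width = box
    crop = image[top:top + height, left:left + width]
    # Repeat every pixel `scale` times along rows, then along columns.
    return np.repeat(np.repeat(crop, scale, axis=0), scale, axis=1)


def gamma_correct(image: np.ndarray, gamma: float) -> np.ndarray:
    """Apply out = in ** gamma to a uint8 image normalized to [0, 1].

    With this convention, gamma < 1 brightens dark regions and
    gamma > 1 darkens bright regions.
    """
    normalized = image.astype(np.float64) / 255.0
    corrected = np.power(normalized, gamma)
    return np.round(corrected * 255.0).astype(np.uint8)


if __name__ == "__main__":
    # Synthetic dark image standing in for a real photo.
    rng = np.random.default_rng(0)
    dark = rng.integers(0, 40, size=(64, 64), dtype=np.uint8)

    zoomed = magnify(dark, box=(16, 16, 16, 16), scale=4)
    brightened = gamma_correct(dark, gamma=0.5)

    print(zoomed.shape)                    # enlarged crop
    print(dark.mean(), brightened.mean())  # mean intensity before and after
```

In a tool-calling setup, a model would presumably pick the box or the gamma value itself and receive the transformed view as a new observation; that interaction loop is outside the scope of this sketch.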
