Visual quality assessment (VQA) is increasingly shifting from scalar score prediction toward interpretable quality understanding, a paradigm that demands \textit{fine-grained spatiotemporal perception} and \textit{auxiliary contextual information}. Current approaches rely on supervised fine-tuning or reinforcement learning on curated instruction datasets, which require labor-intensive annotation and are prone to dataset-specific biases. To address these challenges, we propose \textbf{QualiRAG}, a \textit{training-free} \textbf{R}etrieval-\textbf{A}ugmented \textbf{G}eneration \textbf{(RAG)} framework that systematically leverages the latent perceptual knowledge of large multimodal models (LMMs) for visual quality perception. Unlike conventional RAG, which retrieves from static corpora, QualiRAG generates auxiliary knowledge dynamically: it decomposes each question into structured requests, constructs four complementary knowledge sources (\textit{visual metadata}, \textit{subject localization}, \textit{global quality summaries}, and \textit{local quality descriptions}), and then applies relevance-aware retrieval to ground its reasoning in evidence. Extensive experiments show that QualiRAG achieves substantial improvements over both open-source general-purpose LMMs and VQA-finetuned LMMs on visual quality understanding tasks, and delivers competitive performance on visual quality comparison tasks, demonstrating robust quality assessment capabilities without any task-specific training. The code will be publicly available at https://github.com/clh124/QualiRAG.
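To make the retrieval step concrete, the following is a minimal illustrative sketch of relevance-aware retrieval over the four knowledge sources. It is not the QualiRAG implementation: the abstract does not specify data structures or a relevance measure, so the `Evidence` class, the function names, the example texts, and the Jaccard token-overlap score are all assumptions chosen for illustration. In the actual framework the evidence texts would be generated by an LMM rather than written by hand.

```python
from __future__ import annotations

from dataclasses import dataclass

# All names below are hypothetical; the source does not define them.
SOURCES = (
    "visual_metadata",
    "subject_localization",
    "global_quality_summary",
    "local_quality_description",
)


@dataclass
class Evidence:
    source: str  # one of SOURCES
    text: str    # auxiliary knowledge generated for that source


def tokenize(text: str) -> set[str]:
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return {tok.strip(".,?!:;").lower() for tok in text.split()} - {""}


def relevance(question: str, evidence: Evidence) -> float:
    """Jaccard token overlap: a simple stand-in for the (unspecified) relevance score."""
    q, e = tokenize(question), tokenize(evidence.text)
    union = q | e
    return len(q & e) / len(union) if union else 0.0


def retrieve(question: str, pool: list[Evidence], top_k: int = 2) -> list[Evidence]:
    """Return the top_k evidence items ranked by relevance to the question."""
    ranked = sorted(pool, key=lambda ev: relevance(question, ev), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, retrieved: list[Evidence]) -> str:
    """Assemble an evidence-grounded prompt to pass to an LMM."""
    lines = [f"[{ev.source}] {ev.text}" for ev in retrieved]
    return "Evidence:\n" + "\n".join(lines) + f"\nQuestion: {question}"


# Hand-written placeholder evidence, one item per knowledge source.
pool = [
    Evidence("visual_metadata", "Resolution 1920x1080, 30 fps, duration 8 s."),
    Evidence("subject_localization", "The main subject is a dog in the center of the frame."),
    Evidence("global_quality_summary", "Overall the video is sharp with mild compression artifacts."),
    Evidence("local_quality_description", "The dog region shows motion blur in the middle frames."),
]
question = "Is the dog affected by motion blur?"
selected = retrieve(question, pool, top_k=2)
prompt = build_prompt(question, selected)
```

Here only the evidence most relevant to the question is kept in the prompt, which is the general idea behind filtering generated knowledge before reasoning; a practical system would likely replace the token-overlap score with an embedding-based or LMM-based relevance judgment.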