AIGC images are prevalent across various fields, yet they frequently suffer from quality issues such as artifacts and unnatural textures. Specialized models aim to predict defect-region heatmaps but face two primary challenges: (1) a lack of explainability, failing to provide reasons and analyses for subtle defects; and (2) an inability to leverage common sense and logical reasoning, leading to poor generalization. Multimodal large language models (MLLMs) promise better comprehension and reasoning but face their own challenges: (1) difficulty in fine-grained defect localization due to limitations in capturing tiny details; and (2) constraints in producing the pixel-wise outputs necessary for precise heatmap generation. To address these challenges, we propose HEIE: a novel MLLM-Based Hierarchical Explainable image Implausibility Evaluator. We introduce the CoT-Driven Explainable Trinity Evaluator, which integrates heatmap, score, and explanation outputs, using CoT to decompose complex tasks into subtasks of increasing difficulty and enhance interpretability. Our Adaptive Hierarchical Implausibility Mapper synergizes low-level image features with high-level mapper tokens from LLMs, enabling precise local-to-global hierarchical heatmap prediction through an uncertainty-based adaptive token approach. Moreover, we propose a new dataset, Expl-AIGI-Eval, designed to facilitate interpretable implausibility evaluation of AIGC images. Extensive experiments demonstrate that our method achieves state-of-the-art performance.