Understanding the decision processes of deep vision models is essential for their safe and trustworthy deployment in real-world settings. Existing explainability approaches, such as saliency maps or concept-based analyses, often suffer from limited faithfulness, local scope, or ambiguous semantics. We introduce GIFT, a post-hoc framework that derives Global, Interpretable, Faithful, and Textual explanations for vision classifiers. GIFT begins by generating a large set of faithful, local visual counterfactuals, then employs vision-language models to translate these counterfactuals into natural-language descriptions of the visual changes. A large language model aggregates these local explanations into concise, human-readable hypotheses about the model's global decision rules. Crucially, GIFT includes a verification stage that quantitatively assesses the causal effect of each proposed explanation by performing image-based interventions, ensuring that the final textual explanations remain faithful to the model's true reasoning process. Across diverse datasets, including the synthetic CLEVR benchmark, real-world CelebA face images, and complex BDD driving scenes, GIFT reveals not only meaningful classification rules but also unexpected biases and latent concepts driving model behavior. Altogether, GIFT bridges the gap between local counterfactual reasoning and global interpretability, offering a principled approach to causally grounded textual explanations for vision models.
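To make the four-stage pipeline concrete, here is a minimal Python sketch of the control flow described above. Every name and signature in it (`make_counterfactual`, `describe_change`, `aggregate`, `intervene`) is a hypothetical stand-in for, respectively, the counterfactual generator, the vision-language model, the aggregating LLM, and the image-editing intervention; this is an illustrative skeleton under those assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

import numpy as np

# NOTE: all stage functions below are hypothetical placeholders for the
# components described in the abstract, not the authors' actual API.


@dataclass
class Counterfactual:
    """A local explanation: an image plus a minimal edit that flips the model."""
    original: np.ndarray
    edited: np.ndarray


def gift_explain(
    classifier: Callable[[np.ndarray], int],
    images: Sequence[np.ndarray],
    make_counterfactual: Callable[[np.ndarray], np.ndarray],   # stage 1
    describe_change: Callable[[np.ndarray, np.ndarray], str],  # stage 2 (VLM)
    aggregate: Callable[[List[str]], List[str]],               # stage 3 (LLM)
    intervene: Callable[[np.ndarray, str], np.ndarray],        # stage 4 edit
) -> List[Tuple[str, float]]:
    # Stage 1: generate faithful, local visual counterfactuals.
    cfs = [Counterfactual(x, make_counterfactual(x)) for x in images]

    # Stage 2: a vision-language model verbalizes each visual change.
    local_texts = [describe_change(c.original, c.edited) for c in cfs]

    # Stage 3: an LLM compresses the local descriptions into a short list
    # of global hypotheses about the classifier's decision rules.
    hypotheses = aggregate(local_texts)

    # Stage 4 (verification): apply each hypothesis as an image-based
    # intervention and measure how often it flips the classifier's output;
    # a high flip rate is quantitative evidence of a causal effect.
    scored = []
    for h in hypotheses:
        flips = sum(
            int(classifier(intervene(x, h)) != classifier(x)) for x in images
        )
        scored.append((h, flips / max(len(images), 1)))
    return scored
```

Passing the stages in as callables keeps the skeleton agnostic to the specific counterfactual generator, VLM, and LLM used, which matches the abstract's framing of GIFT as a post-hoc procedure whose final explanations are kept honest by the quantitative verification step.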