Understanding the decision processes of deep vision models is essential for their safe and trustworthy deployment in real-world settings. Existing explainability approaches, such as saliency maps or concept-based analyses, often suffer from limited faithfulness, local scope, or ambiguous semantics. We introduce GIFT, a post-hoc framework that derives Global, Interpretable, Faithful, and Textual explanations for vision classifiers. GIFT begins by generating a large set of faithful, local visual counterfactuals, then employs vision-language models to translate these counterfactuals into natural-language descriptions of the visual changes. A large language model aggregates these local explanations into concise, human-readable hypotheses about the model's global decision rules. Crucially, GIFT includes a verification stage that quantitatively assesses the causal effect of each proposed explanation by performing image-based interventions, ensuring that the final textual explanations remain faithful to the model's true reasoning process. Across diverse datasets, including the synthetic CLEVR benchmark, real-world CelebA face images, and complex BDD driving scenes, GIFT reveals not only meaningful classification rules but also unexpected biases and latent concepts driving model behavior. Altogether, GIFT bridges the gap between local counterfactual reasoning and global interpretability, offering a principled approach to causally grounded textual explanations for vision models.
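To make the four-stage pipeline concrete, here is a minimal Python sketch of the control flow described above. Every name and signature in it (`make_counterfactual`, `describe_change`, `aggregate`, `intervene`) is a hypothetical stand-in for, respectively, the counterfactual generator, the vision-language model, the aggregating LLM, and the image-editing intervention; this is an illustrative skeleton under those assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

import numpy as np

# NOTE: all stage functions below are hypothetical placeholders for the
# components described in the abstract, not the authors' actual API.


@dataclass
class Counterfactual:
    """A local explanation: an image plus a minimal edit that flips the model."""
    original: np.ndarray
    edited: np.ndarray


def gift_explain(
    classifier: Callable[[np.ndarray], int],
    images: Sequence[np.ndarray],
    make_counterfactual: Callable[[np.ndarray], np.ndarray],   # stage 1
    describe_change: Callable[[np.ndarray, np.ndarray], str],  # stage 2 (VLM)
    aggregate: Callable[[List[str]], List[str]],               # stage 3 (LLM)
    intervene: Callable[[np.ndarray, str], np.ndarray],        # stage 4 edit
) -> List[Tuple[str, float]]:
    # Stage 1: generate faithful, local visual counterfactuals.
    cfs = [Counterfactual(x, make_counterfactual(x)) for x in images]

    # Stage 2: a vision-language model verbalizes each visual change.
    local_texts = [describe_change(c.original, c.edited) for c in cfs]

    # Stage 3: an LLM compresses the local descriptions into a short list
    # of global hypotheses about the classifier's decision rules.
    hypotheses = aggregate(local_texts)

    # Stage 4 (verification): apply each hypothesis as an image-based
    # intervention and measure how often it flips the classifier's output;
    # a high flip rate is quantitative evidence of a causal effect.
    scored = []
    for h in hypotheses:
        flips = sum(
            int(classifier(intervene(x, h)) != classifier(x)) for x in images
        )
        scored.append((h, flips / max(len(images), 1)))
    return scored
```

Passing the stages in as callables keeps the skeleton agnostic to the specific counterfactual generator, VLM, and LLM used, which matches the abstract's framing of GIFT as a post-hoc procedure whose final explanations are kept honest by the quantitative verification step.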