Understanding and explaining the behavior of machine learning models is essential for building transparent and trustworthy AI systems. We introduce DEXTER, a data-free framework that employs diffusion models and large language models to generate global, textual explanations of visual classifiers. DEXTER operates by optimizing text prompts to synthesize class-conditional images that strongly activate a target classifier. These synthetic samples are then used to elicit detailed natural language reports that describe class-specific decision patterns and biases. Unlike prior work, DEXTER enables natural language explanation of a classifier's decision process without access to training data or ground-truth labels. We demonstrate DEXTER's flexibility across three tasks, namely activation maximization, slice discovery and debiasing, and bias explanation, each illustrating its ability to uncover the internal mechanisms of visual classifiers. Quantitative and qualitative evaluations, including a user study, show that DEXTER produces accurate, interpretable outputs. Experiments on ImageNet, Waterbirds, CelebA, and FairFace confirm that DEXTER outperforms existing approaches in global model explanation and class-level bias reporting. Code is available at https://github.com/perceivelab/dexter.
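To make the prompt-optimization idea concrete, the following is a minimal sketch of the activation-maximization stage only, not DEXTER's actual implementation: a continuous prompt embedding is optimized by gradient ascent so that images produced by a frozen text-conditioned generator maximally activate a chosen class of a frozen classifier. All names here (ToyGenerator, ToyClassifier, optimize_prompt) are hypothetical stand-ins with toy shapes; in practice the generator would be a diffusion model and the classifier the model under inspection.

```python
import torch
import torch.nn as nn


class ToyGenerator(nn.Module):
    """Stand-in for a frozen text-conditioned image generator (e.g., a diffusion model)."""

    def __init__(self, embed_dim: int = 32, image_size: int = 64):
        super().__init__()
        self.proj = nn.Linear(embed_dim, 3 * image_size * image_size)
        self.image_size = image_size

    def forward(self, prompt_emb: torch.Tensor) -> torch.Tensor:  # (B, D) -> (B, 3, H, W)
        x = torch.sigmoid(self.proj(prompt_emb))
        return x.view(-1, 3, self.image_size, self.image_size)


class ToyClassifier(nn.Module):
    """Stand-in for the frozen target visual classifier being explained."""

    def __init__(self, num_classes: int = 10, image_size: int = 64):
        super().__init__()
        self.head = nn.Linear(3 * image_size * image_size, num_classes)

    def forward(self, images: torch.Tensor) -> torch.Tensor:  # (B, 3, H, W) -> (B, C)
        return self.head(images.flatten(1))


def optimize_prompt(generator, classifier, target_class, embed_dim=32, steps=200, lr=0.05):
    """Gradient ascent on a continuous prompt embedding so that generated images
    maximally activate the target class logit (activation maximization)."""
    prompt_emb = torch.randn(4, embed_dim, requires_grad=True)  # 4 candidate prompts
    optimizer = torch.optim.Adam([prompt_emb], lr=lr)
    for module in (generator, classifier):
        module.requires_grad_(False)  # both models stay frozen; only the prompt is updated
    for _ in range(steps):
        optimizer.zero_grad()
        images = generator(prompt_emb)
        logits = classifier(images)
        loss = -logits[:, target_class].mean()  # maximize the target-class logit
        loss.backward()
        optimizer.step()
    return prompt_emb.detach(), generator(prompt_emb).detach()


if __name__ == "__main__":
    gen, clf = ToyGenerator(), ToyClassifier()
    prompts, images = optimize_prompt(gen, clf, target_class=3)
    print(images.shape)  # synthetic class-conditional samples, e.g. torch.Size([4, 3, 64, 64])
```

In the full pipeline described above, the resulting synthetic images would then be passed to a large language model to elicit the textual report; that second stage is omitted here.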