Despite increasing progress in the development of methods for generating visual counterfactual explanations, especially with the recent rise of Denoising Diffusion Probabilistic Models, previous works have treated them as an entirely local technique. In this work, we take the first step toward globalizing them. Specifically, we discover that the latent space of Diffusion Autoencoders encodes the inference process of a given classifier in the form of global directions. We propose a novel proxy-based approach that discovers two types of these directions using only a single image, in an entirely black-box manner. Precisely, g-directions allow for flipping the decision of a given classifier on an entire dataset of images, while h-directions further increase the diversity of explanations. We refer to them collectively as Global Counterfactual Directions (GCDs). Moreover, we show that GCDs can be naturally combined with Latent Integrated Gradients, resulting in a new black-box attribution method while simultaneously enhancing the understanding of counterfactual explanations. We validate our approach on existing benchmarks and show that it generalizes to real-world use cases.