Latent Diffusion Counterfactual Explanations

Counterfactual explanations have emerged as a promising method for elucidating the behavior of opaque black-box models. Recently, several works have leveraged pixel-space diffusion models for counterfactual generation. Counterfactual generation is prone to noisy, adversarial gradients, which cause unrealistic artifacts or mere adversarial perturbations. To handle them, these works required either auxiliary adversarially robust models or computationally intensive guidance schemes. However, such requirements limit their applicability, e.g., in scenarios with restricted access to the model's training data. To address these limitations, we introduce Latent Diffusion Counterfactual Explanations (LDCE). LDCE harnesses recent class- or text-conditional foundation latent diffusion models to expedite counterfactual generation and to focus on the important, semantic parts of the data. Furthermore, we propose a novel consensus guidance mechanism that filters out noisy, adversarial gradients misaligned with the diffusion model's implicit classifier. We demonstrate the versatility of LDCE across a wide spectrum of models trained on diverse datasets with different learning paradigms. Finally, we showcase how LDCE can provide insights into model errors, enhancing our understanding of black-box model behavior.
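The consensus guidance idea in the abstract keeps only the classifier gradients that agree with the diffusion model's implicit classifier. It can be illustrated with a minimal NumPy sketch, under two stated assumptions. First, the implicit-classifier gradient is derived from the conditional and unconditional noise predictions via the classifier-free-guidance identity. Second, agreement is tested with a hypothetical patch-wise cosine-similarity threshold. The names `implicit_classifier_grad`, `consensus_filter`, `patch` and `max_angle_deg` are illustrative only. The abstract does not specify LDCE's exact filtering rule, so this is a sketch rather than the authors' implementation.

```python
import numpy as np


def implicit_classifier_grad(eps_cond, eps_uncond, sigma):
    # Classifier-free-guidance identity: the score is -eps / sigma, so
    # grad log p(y | z) = grad log p(z | y) - grad log p(z)
    #                   = -(eps_cond - eps_uncond) / sigma.
    return -(eps_cond - eps_uncond) / sigma


def consensus_filter(grad_cls, grad_implicit, patch=2, max_angle_deg=45.0):
    # HYPOTHETICAL agreement rule (assumption, not taken from the LDCE paper):
    # split a (C, H, W) latent gradient into non-overlapping spatial patches
    # and zero every patch of the external classifier's gradient whose angle
    # to the implicit-classifier gradient exceeds max_angle_deg.
    if grad_cls.shape != grad_implicit.shape:
        raise ValueError("gradients must have the same shape")
    _, h, w = grad_cls.shape
    cos_thresh = np.cos(np.deg2rad(max_angle_deg))
    out = grad_cls.copy()
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            g = grad_cls[:, i:i + patch, j:j + patch].ravel()
            r = grad_implicit[:, i:i + patch, j:j + patch].ravel()
            denom = np.linalg.norm(g) * np.linalg.norm(r)
            # Zero-norm patches carry no directional agreement, so drop them.
            cos = float(g @ r) / denom if denom > 0 else -1.0
            if cos < cos_thresh:
                out[:, i:i + patch, j:j + patch] = 0.0
    return out


# Toy 1-channel 4x4 latent where the implicit classifier points "up" everywhere.
eps_uncond = np.zeros((1, 4, 4))
eps_cond = -np.ones((1, 4, 4))
g_impl = implicit_classifier_grad(eps_cond, eps_uncond, sigma=1.0)  # all +1

# The external classifier agrees everywhere except the bottom-right patch.
g_cls = np.ones((1, 4, 4))
g_cls[:, 2:, 2:] = -1.0

filtered = consensus_filter(g_cls, g_impl)
# The bottom-right 2x2 patch is zeroed; the other three patches are kept.
print(filtered[0])
```

Filtering whole patches instead of individual entries is a design choice of this sketch: it compares gradient directions over small spatial regions, so a single noisy entry does not decide whether a region's guidance is kept.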
