Generating counterfactual explanations is one of the most effective approaches for uncovering the inner workings of black-box neural network models and building user trust. While diffusion models have driven remarkable progress in generative modeling in domains such as vision, their utility for generating counterfactual explanations in structured modalities remains unexplored. In this paper, we introduce the Structured Counterfactual Diffuser (SCD), the first plug-and-play framework that leverages diffusion to generate counterfactual explanations for structured data. SCD learns the underlying data distribution with a diffusion model, which is then guided at test time to generate counterfactuals for any black-box model, input, and desired prediction. Our experiments show that the resulting counterfactuals are highly plausible relative to existing state-of-the-art methods and achieve significantly better proximity and diversity.
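To make the idea of test-time guidance concrete, the following is a minimal toy sketch and not the SCD implementation. It rests on several assumptions that do not come from the abstract: the learned diffusion model is replaced by the analytic score of a standard Gaussian, sampling is done with annealed Langevin dynamics, the black-box model is a hypothetical logistic classifier that is only queried, and guidance uses a finite-difference gradient estimate plus a simple proximity pull toward the input. All function names and hyperparameters are illustrative.

```python
import numpy as np

def black_box(x):
    """Hypothetical black-box classifier: returns P(class 1 | x). Only queried, never differentiated."""
    return 1.0 / (1.0 + np.exp(-(3.0 * x[..., 0] - 1.5)))

def prior_score(x):
    # Assumption: stand-in for a trained diffusion model. For data ~ N(0, I) the score is -x.
    return -x

def fd_grad(f, x, eps=1e-3):
    # Central finite differences per feature, so only model queries are needed.
    grad = np.zeros_like(x)
    for j in range(x.shape[-1]):
        step = np.zeros(x.shape[-1])
        step[j] = eps
        grad[..., j] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return grad

def generate_counterfactuals(x0, target, n=8, steps=200, step_size=0.02,
                             guide=3.0, prox=1.0, seed=0):
    """Guided annealed Langevin sampling: plausibility + target class + proximity to x0."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, x0.shape[0]))  # start from pure noise

    def log_p_target(z):
        p1 = black_box(z)
        return np.log(np.clip(p1 if target == 1 else 1.0 - p1, 1e-8, 1.0))

    for i in range(steps):
        noise_scale = 1.0 - i / steps  # anneal the injected noise toward zero
        drift = (prior_score(x)                      # stay on the data distribution
                 + guide * fd_grad(log_p_target, x)  # move toward the desired prediction
                 + prox * (x0 - x))                  # stay close to the original input
        x = x + step_size * drift + np.sqrt(2.0 * step_size) * noise_scale * rng.standard_normal(x.shape)
    return x

if __name__ == "__main__":
    x0 = np.array([-1.0, 0.5])  # input currently predicted as class 0
    cfs = generate_counterfactuals(x0, target=1)
    print(black_box(x0), black_box(cfs))
```

The three drift terms correspond loosely to the plausibility, validity, and proximity criteria mentioned above. Because this toy anneals the noise to zero, the samples end up clustered; how SCD itself balances these terms and obtains diverse counterfactuals is not specified in the abstract.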