The adoption of increasingly complex deep models has fueled an urgent need for insight into how these models make predictions. Counterfactual explanations are a powerful tool for providing actionable explanations to practitioners. Prior counterfactual explanation methods have been designed by traversing the latent space of generative models. Yet, these latent spaces are usually greatly simplified, with most of the data distribution's complexity contained in the decoder rather than the latent embedding. Thus, naively traversing the latent space without accounting for the nonlinear decoder can yield unnatural counterfactual trajectories. We introduce counterfactual explanations obtained using a Riemannian metric pulled back via the decoder and the classifier under scrutiny. This metric encodes information about the complex geometric structure of the data and the learned representation, enabling us to obtain robust counterfactual trajectories with high fidelity, as demonstrated by our experiments on real-world tabular datasets.
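The pulled-back metric described above can be illustrated with a minimal sketch. For a decoder \(d: \mathcal{Z} \to \mathcal{X}\) and a classifier \(c\), the Euclidean pullback metric takes the standard form \(G(z) = J_d^\top J_d\), optionally augmented with a classifier term; the weighting `alpha` and the finite-difference Jacobian below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def numerical_jacobian(f, z, eps=1e-6):
    """Finite-difference Jacobian of f at z, shape (output_dim, input_dim)."""
    z = np.asarray(z, dtype=float)
    f0 = np.asarray(f(z), dtype=float)
    J = np.zeros((f0.size, z.size))
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (np.asarray(f(z + dz), dtype=float) - f0) / eps
    return J

def pullback_metric(decoder, classifier, z, alpha=1.0):
    """Pullback of the Euclidean metric through the decoder, plus a
    classifier-sensitivity term weighted by alpha (hypothetical weighting):
    G(z) = J_d(z)^T J_d(z) + alpha * J_c(z)^T J_c(z),
    where J_c is the Jacobian of classifier(decoder(z))."""
    Jd = numerical_jacobian(decoder, z)
    Jc = numerical_jacobian(lambda v: classifier(decoder(v)), z)
    return Jd.T @ Jd + alpha * (Jc.T @ Jc)

def curve_length(decoder, classifier, zs, alpha=1.0):
    """Riemannian length of a discretized latent curve zs under G;
    short curves under this length are candidate counterfactual paths."""
    length = 0.0
    for z0, z1 in zip(zs[:-1], zs[1:]):
        dz = z1 - z0
        G = pullback_metric(decoder, classifier, 0.5 * (z0 + z1), alpha)
        length += float(np.sqrt(dz @ G @ dz))
    return length
```

For an identity decoder and a constant classifier, the metric reduces to the Euclidean one, so the curve length matches ordinary Euclidean length; for a learned nonlinear decoder, the metric stretches directions that the decoder or classifier is sensitive to, which is what keeps latent traversals on-distribution.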