Optimization layers in deep neural networks have enjoyed growing popularity in structured learning, improving the state of the art on a variety of applications. Yet these pipelines lack interpretability, since they are made of two opaque layers: a highly non-linear prediction model, such as a deep neural network, and an optimization layer, which is typically a complex black-box solver. Our goal is to improve the transparency of such methods by providing counterfactual explanations. We build upon variational autoencoders (VAEs) to obtain counterfactuals in a principled way: working in the latent space leads to a natural notion of plausibility of explanations. Finally, we introduce a variant of the classic loss for VAE training that improves its performance in our specific structured context. These contributions form the foundations of CF-OPT, a first-order optimization algorithm that can find counterfactual explanations for a broad class of structured learning architectures. Our numerical results show that both close and plausible explanations can be obtained for problems from the recent literature.
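To make the idea concrete, the following is a minimal sketch of first-order counterfactual search in a VAE latent space, not the paper's actual CF-OPT implementation. The `encoder`, `decoder`, and `predictor` networks are hypothetical stand-ins (the last one abstracting the prediction model composed with the optimization layer), and the loss weighting `lam` is an assumed hyperparameter.

```python
import torch

# Hypothetical components: in a real pipeline these would be the trained
# VAE encoder/decoder and the model-plus-optimization-layer pipeline;
# here they are stubbed with small linear maps purely for illustration.
torch.manual_seed(0)
latent_dim, input_dim = 8, 20
encoder = torch.nn.Linear(input_dim, latent_dim)
decoder = torch.nn.Linear(latent_dim, input_dim)
predictor = torch.nn.Linear(input_dim, 1)  # stand-in for model + solver

def counterfactual(x0, target, steps=200, lr=0.05, lam=1.0):
    """Gradient search in latent space for a plausible counterfactual."""
    z = encoder(x0).detach().requires_grad_(True)
    z0 = z.detach().clone()
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        x = decoder(z)
        # Push the downstream decision toward the target output while
        # staying close to the original latent code; remaining near z0
        # in latent space serves as a proxy for plausibility.
        loss = ((predictor(x) - target).pow(2).sum()
                + lam * (z - z0).pow(2).sum())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decoder(z).detach()

x0 = torch.randn(input_dim)
x_cf = counterfactual(x0, target=torch.tensor([1.0]))
```

Optimizing over the latent code rather than the input directly is what keeps the decoded counterfactual on the data manifold learned by the VAE, which is the source of the plausibility guarantee described above.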