Despite significant progress in post-hoc explanation methods for neural networks, many remain heuristic and lack provable guarantees. A key approach to obtaining explanations with provable guarantees is to identify a cardinally-minimal subset of input features that is by itself provably sufficient to determine the prediction. However, for standard neural networks this task is often computationally infeasible, as it demands a worst-case exponential number of verification queries in the number of input features, each of which is NP-hard. In this work, we show that for Neural Additive Models (NAMs), a recent and more interpretable neural network family, we can efficiently generate explanations with such guarantees. We present a new model-specific algorithm for NAMs that generates provably cardinally-minimal explanations using only a logarithmic number of verification queries in the number of input features, after a parallelized preprocessing step, whose runtime is logarithmic in the required precision, is applied to each small univariate NAM component. Our algorithm not only makes obtaining cardinally-minimal explanations feasible, but even outperforms existing algorithms designed to find the relaxed variant of subset-minimal explanations, which may be larger and less informative but are easier to compute, despite solving a much harder task. Our experiments demonstrate that our approach yields provably smaller explanations than prior work while substantially reducing computation time. Moreover, we show that our generated provable explanations offer benefits that are unattainable by the standard sampling-based techniques typically used to interpret NAMs.
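To illustrate why additivity makes this tractable, consider a minimal sketch (not the paper's actual algorithm): a NAM's logit is a sum of univariate contributions, so once a preprocessing step has bounded each shape function's output range `[lo_i, hi_i]`, a subset of fixed features can be certified sufficient in closed form, and a cardinally-minimal subset can be found by sorting features by worst-case slack and binary-searching the prefix size, using only logarithmically many sufficiency checks. All names below are hypothetical illustration.

```python
# Hypothetical sketch: cardinally-minimal sufficient explanations for a NAM
# classifier whose logit is sum_i f_i(x_i), assuming per-feature output ranges
# [lo[i], hi[i]] of each univariate shape function were bounded in preprocessing.

def is_sufficient(contrib, lo, hi, fixed):
    """Certify that fixing the features in `fixed` provably preserves the sign
    of the NAM logit, while every other feature varies over its full range."""
    target = sum(contrib) > 0  # the prediction to preserve
    worst = 0.0
    for i in range(len(contrib)):
        if i in fixed:
            worst += contrib[i]                    # feature pinned to its value
        else:
            worst += lo[i] if target else hi[i]    # adversarial free feature
    return (worst > 0) == target

def minimal_explanation(contrib, lo, hi):
    """Binary-search the smallest prefix of features, ordered by how much
    freeing each one could hurt the prediction, that is provably sufficient."""
    target = sum(contrib) > 0
    # slack[i]: worst-case damage from freeing feature i (always >= 0 here)
    slack = [contrib[i] - lo[i] if target else hi[i] - contrib[i]
             for i in range(len(contrib))]
    order = sorted(range(len(contrib)), key=lambda i: slack[i], reverse=True)
    # Sufficiency of the top-k prefix is monotone in k, so binary search works.
    low, high = 0, len(contrib)
    while low < high:
        mid = (low + high) // 2
        if is_sufficient(contrib, lo, hi, set(order[:mid])):
            high = mid
        else:
            low = mid + 1
    return sorted(order[:low])
```

Because the contributions are separable, the top-k slack features form the best size-k subset, so the prefix found is cardinally-minimal in this simplified setting; the expensive part in practice is the verification-based bounding of each `[lo_i, hi_i]`, which the abstract notes is parallelized across the small univariate components.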