Previous work has explored the computational complexity of deriving two fundamental types of explanations for ML model predictions: (1) *sufficient reasons*, subsets of input features that, when fixed, determine a prediction, and (2) *contrastive reasons*, subsets of input features that, when modified, alter a prediction. Prior studies have examined these explanations in different contexts, such as non-probabilistic versus probabilistic frameworks and local versus global settings. In this work, we introduce a unified framework for analyzing these explanations, showing that all of them can be characterized as minimizers of a single probabilistic value function. We then prove that the complexity of computing them is governed by three key properties of the value function: (1) *monotonicity*, (2) *submodularity*, and (3) *supermodularity*, three fundamental properties in *combinatorial optimization*. Our findings uncover counterintuitive results about the nature of these properties in the explanation settings we examine. For instance, although the *local* value functions exhibit neither monotonicity nor submodularity/supermodularity, we show that the *global* value functions do possess these properties. This distinction enables us to prove a series of novel polynomial-time results, with provable guarantees, for computing various explanations in the global explainability setting, across ML models spanning the interpretability spectrum, including neural networks, decision trees, and tree ensembles. In contrast, we show that even highly simplified versions of these explanations are NP-hard to compute in the corresponding local explainability setting.