In machine learning, the $\mathsf{SHAP}$-score is a version of the Shapley value that is used to explain the result of a learned model on a specific entity by assigning a score to every feature. While computing Shapley values is in general an intractable problem, we prove a strong positive result: the $\mathsf{SHAP}$-score can be computed in polynomial time over deterministic and decomposable Boolean circuits. Such circuits are studied in the field of Knowledge Compilation and generalize a wide range of classes of Boolean circuits and binary decision diagrams, including binary decision trees and Ordered Binary Decision Diagrams (OBDDs). We also establish the computational limits of the $\mathsf{SHAP}$-score by observing that computing it over a class of Boolean models is always at least as hard, up to polynomial-time reductions, as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider. It also implies that computing $\mathsf{SHAP}$-scores is intractable over the class of propositional formulas in DNF. Motivated by this negative result, we ask whether there is a fully polynomial-time randomized approximation scheme (FPRAS) for computing $\mathsf{SHAP}$-scores over this class. In contrast to the model counting problem for DNF formulas, which admits an FPRAS, we prove that no such FPRAS exists for the computation of $\mathsf{SHAP}$-scores. Surprisingly, this negative result holds even for the class of monotone formulas in DNF. The techniques behind this result can be extended to prove another strong negative result: under widely believed complexity assumptions, there is no polynomial-time algorithm that checks, given a monotone DNF formula $\varphi$ and features $x,y$, whether the $\mathsf{SHAP}$-score of $x$ in $\varphi$ is smaller than the $\mathsf{SHAP}$-score of $y$ in $\varphi$.
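To make the quantity concrete, the following is a minimal brute-force sketch of a $\mathsf{SHAP}$-score computed directly from the Shapley definition. It is not the polynomial-time algorithm for deterministic and decomposable circuits; it enumerates all feature subsets and therefore takes exponential time. It assumes, for illustration only, that entities are drawn uniformly at random from $\{0,1\}^n$. The names `expected_value`, `shap_score`, and `phi` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial


def expected_value(model, entity, fixed):
    """Expected model output over uniformly random entities that agree
    with `entity` on the feature indices in `fixed` (uniformity is an
    assumption of this sketch)."""
    n = len(entity)
    free = [i for i in range(n) if i not in fixed]
    total = 0
    for bits in product((0, 1), repeat=len(free)):
        e = list(entity)
        for i, b in zip(free, bits):
            e[i] = b
        total += model(tuple(e))
    return Fraction(total, 2 ** len(free))


def shap_score(model, entity, x):
    """Shapley value of feature index `x`, by enumerating every subset
    of the other features. Exponential in the number of features."""
    n = len(entity)
    others = [i for i in range(n) if i != x]
    score = Fraction(0)
    for k in range(n):
        # Shapley weight |S|! (n - |S| - 1)! / n! for subsets of size k.
        weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
        for subset in combinations(others, k):
            s = set(subset)
            gain = (expected_value(model, entity, s | {x})
                    - expected_value(model, entity, s))
            score += weight * gain
    return score


def phi(e):
    # Hypothetical monotone DNF formula: (x0 AND x1) OR x2.
    return int((e[0] and e[1]) or e[2])


entity = (1, 1, 0)
scores = [shap_score(phi, entity, i) for i in range(3)]
print(scores)  # [Fraction(7, 24), Fraction(7, 24), Fraction(-5, 24)]
```

For this entity the scores sum to $3/8$, which equals $\varphi(e)$ minus the expected value of $\varphi$ ($1 - 5/8$), as expected of Shapley values. Exact `Fraction` arithmetic is used so that comparisons between the scores of two features are not affected by floating-point rounding.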