Variable selection and the measurement of input-variable importance in machine learning models have become the focus of much research. It is no longer enough to have a good model; one must also explain its decisions. This is why so many intelligibility algorithms are available today. Among them, Shapley value estimation algorithms are intelligibility methods based on cooperative game theory. For the naive Bayes classifier, to our knowledge, there is no "analytical" formulation of Shapley values. This article proposes an exact analytic expression of Shapley values in the special case of the naive Bayes classifier. We analytically compare this Shapley proposal with another frequently used indicator, the Weight of Evidence (WoE), and provide an empirical comparison of our proposal with (i) the WoE and (ii) KernelShap results on real-world datasets, discussing where the results agree and where they differ. The results show that our Shapley proposal for the naive Bayes classifier provides informative results with low algorithmic complexity, so that it can be used on very large datasets with extremely low computation time.
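To make the WoE indicator mentioned above concrete, here is a minimal sketch. It assumes a binary naive Bayes classifier with categorical features and takes the WoE of an observed feature value to be the log-ratio of its class-conditional likelihoods, which is a common definition and may differ in detail from the one used in the article. It is not the article's Shapley expression. The function names, the data layout, and the toy probabilities are all hypothetical.

```python
import math

# Hypothetical toy model: class priors and, for each feature, a table
# mapping a value to (P(value | class 0), P(value | class 1)).
PRIORS = {0: 0.6, 1: 0.4}
LIKELIHOODS = [
    {"a": (0.7, 0.2), "b": (0.3, 0.8)},
    {"low": (0.5, 0.9), "high": (0.5, 0.1)},
]


def weights_of_evidence(priors, likelihoods, x):
    """Return per-feature WoE values and the resulting class-1 log-odds.

    Assumed definition: WoE_i = log(P(x_i | class 1) / P(x_i | class 0)).
    Under the naive Bayes independence assumption, the posterior log-odds
    equal the prior log-odds plus the sum of the per-feature WoE values.
    """
    woe = [
        math.log(likelihoods[i][value][1] / likelihoods[i][value][0])
        for i, value in enumerate(x)
    ]
    log_odds = math.log(priors[1] / priors[0]) + sum(woe)
    return woe, log_odds


def posterior_class1(priors, likelihoods, x):
    """Compute P(class 1 | x) directly from Bayes' rule, as a cross-check."""
    joint0, joint1 = priors[0], priors[1]
    for i, value in enumerate(x):
        joint0 *= likelihoods[i][value][0]
        joint1 *= likelihoods[i][value][1]
    return joint1 / (joint0 + joint1)


if __name__ == "__main__":
    instance = ("b", "low")
    woe, log_odds = weights_of_evidence(PRIORS, LIKELIHOODS, instance)
    print("per-feature WoE:", woe)
    print("posterior via log-odds:", 1.0 / (1.0 + math.exp(-log_odds)))
    print("posterior via Bayes' rule:", posterior_class1(PRIORS, LIKELIHOODS, instance))
```

Each WoE value costs a single table lookup and a logarithm per feature, so the cost of explaining one instance grows linearly with the number of features. This gives a sense of why indicators of this additive form are cheap to compute for naive Bayes.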