An Efficient Shapley Value Computation for the Naive Bayes Classifier

Variable selection, or measuring the importance of the input variables of a machine learning model, has become the focus of much research. It is no longer enough to have a good model; one must also explain its decisions. This is why so many intelligibility algorithms are available today. Among them, Shapley value estimation algorithms are intelligibility methods based on cooperative game theory. To our knowledge, there is no "analytical" formulation of Shapley values for the naive Bayes classifier. This article proposes an exact analytic expression of Shapley values in the special case of the naive Bayes classifier. We analytically compare this Shapley proposal to another frequently used indicator, the Weight of Evidence (WoE), and provide an empirical comparison of our proposal with (i) the WoE and (ii) KernelShap results on real-world datasets, discussing where the results agree and where they differ. The results show that our Shapley proposal for the naive Bayes classifier provides informative results with low algorithmic complexity, so that it can be used on very large datasets with extremely low computation time.

Related content

Naive Bayes classifier

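In machine learning, naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. Naive Bayes has been studied extensively since the 1950s. It was introduced into the text retrieval community under a different name in the early 1960s and remains a popular (baseline) method for text categorization, the problem of assigning documents to one category or another (such as spam or legitimate, sports or politics) using word frequencies as features. With appropriate preprocessing, it is competitive in this domain with more advanced methods, including support vector machines. It is also used in automatic medical diagnosis.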
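As a minimal sketch of the independence assumption, consider a binary naive Bayes model over categorical features. Under that assumption the log-odds of the positive class split into a prior term plus one log-likelihood-ratio term per feature, and that per-feature term is what is usually called the Weight of Evidence. The function names, the `PRIOR` and `LIKELIHOOD` tables, and the toy probabilities below are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the paper's Shapley expression.

```python
import math

# Hypothetical toy model: two classes (0 and 1), two categorical features.
PRIOR = {0: 0.5, 1: 0.5}
# LIKELIHOOD[j][value] = (P(value | class 0), P(value | class 1))
LIKELIHOOD = [
    {"a": (0.2, 0.8), "b": (0.8, 0.2)},  # informative feature
    {"u": (0.5, 0.5), "v": (0.5, 0.5)},  # uninformative feature
]


def naive_bayes_log_odds(x, prior, likelihood):
    """Return (log-odds of class 1, list of per-feature log-likelihood ratios)."""
    terms = []
    for j, value in enumerate(x):
        p0, p1 = likelihood[j][value]
        # Per-feature evidence for class 1 versus class 0.
        terms.append(math.log(p1 / p0))
    log_odds = math.log(prior[1] / prior[0]) + sum(terms)
    return log_odds, terms


def posterior(log_odds):
    """Convert log-odds of class 1 into P(class 1 | x)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


log_odds, terms = naive_bayes_log_odds(("a", "u"), PRIOR, LIKELIHOOD)
print(posterior(log_odds), terms)
```

Because the score is a sum of per-feature terms, each feature's contribution can be read directly from `terms` in a single pass over the features, which is why explanations of this kind are cheap to compute for naive Bayes.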
