Interpretable feature subset selection: A Shapley value based approach

From arXiv. A shorter version of this work appeared in a special session titled "Explainable AI" at the IEEE BigData'20 conference. This version introduces more experiments and a new notion of interpretable feature subset selection (FSS). Earlier plots for sample bias robustness have been corrected and updated.

For feature selection and related problems, we introduce the notion of a classification game: a cooperative game with features as players and a hinge-loss-based characteristic function. We relate a feature's contribution to Shapley value based error apportioning (SVEA) of the total training error. Our major contribution is ($\star$) to show that, for any dataset, the threshold 0 on SVEA values identifies either the feature subset whose joint interactions for label prediction are significant, or the features that span the subspace in which the data predominantly lie. In addition, our scheme ($\star$) identifies the features on which the Bayes classifier does not depend but any surrogate-loss-based finite-sample classifier does, which contributes to the excess $0$-$1$ risk of such a classifier; ($\star$) estimates the unknown true hinge risk of a feature; and ($\star$) relates the stability property of an allocation to negative-valued SVEA by designing an analogue of the core of the classification game. Because the Shapley value is computationally expensive, we build on a known Monte Carlo based approximation algorithm that computes the characteristic function (linear programs) only when needed. We address the potential sample bias problem in feature selection by providing interval estimates for SVEA values obtained from multiple sub-samples. We illustrate all of the above aspects on various synthetic and real datasets and show that our scheme achieves better results than the existing recursive feature elimination technique and ReliefF in most cases. Our theoretically grounded classification game, with its well-defined characteristic function, offers interpretability (which we formalize in terms of the final task) and explainability of our framework, including the identification of important features.
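
The following is a minimal sketch of Monte Carlo Shapley estimation with a lazily evaluated, cached characteristic function. It uses plain permutation sampling, which may differ from the specific algorithm the authors build on. Several assumptions are involved. The characteristic function is a hypothetical lookup table (TOY_ERRORS) standing in for the paper's hinge-loss linear programs. The convention v(empty set) = 0 is assumed, so efficiency makes the values sum to v(N), the total training error. The selection rule "keep features with SVEA below 0" is also an assumed reading of the threshold-0 criterion. The names CachedGame, exact_shapley, monte_carlo_shapley and select_features are illustrative only.

```python
import itertools
import random
from typing import Callable, Dict, FrozenSet, List

Coalition = FrozenSet[int]

# Hypothetical toy game on 3 features: v(S) plays the role of the total
# training error using only the features in S (v(empty set) = 0 assumed).
# In the paper, v(S) would come from solving a hinge-loss linear program.
TOY_ERRORS: Dict[Coalition, float] = {
    frozenset(): 0.0,
    frozenset({0}): 4.0,
    frozenset({1}): 4.0,
    frozenset({2}): 9.0,
    frozenset({0, 1}): 1.0,
    frozenset({0, 2}): 5.0,
    frozenset({1, 2}): 5.0,
    frozenset({0, 1, 2}): 2.0,
}


class CachedGame:
    """Evaluates a characteristic function lazily, at most once per coalition."""

    def __init__(self, v: Callable[[Coalition], float]) -> None:
        self._v = v
        self._cache: Dict[Coalition, float] = {}

    def __call__(self, coalition: Coalition) -> float:
        if coalition not in self._cache:  # compute v(S) only when first needed
            self._cache[coalition] = self._v(coalition)
        return self._cache[coalition]

    @property
    def evaluations(self) -> int:
        return len(self._cache)


def exact_shapley(v: Callable[[Coalition], float], n: int) -> List[float]:
    """Average marginal contribution over all n! orderings (small n only)."""
    phi = [0.0] * n
    orderings = list(itertools.permutations(range(n)))
    for ordering in orderings:
        prefix: Coalition = frozenset()
        for player in ordering:
            extended = prefix | {player}
            phi[player] += v(extended) - v(prefix)
            prefix = extended
    return [total / len(orderings) for total in phi]


def monte_carlo_shapley(
    v: Callable[[Coalition], float], n: int, num_permutations: int, seed: int = 0
) -> List[float]:
    """Permutation sampling: average marginal contributions over random orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n
    players = list(range(n))
    for _ in range(num_permutations):
        rng.shuffle(players)
        prefix: Coalition = frozenset()
        prev_value = v(prefix)
        for player in players:
            prefix = prefix | {player}
            value = v(prefix)
            phi[player] += value - prev_value
            prev_value = value
    return [total / num_permutations for total in phi]


def select_features(svea: List[float], threshold: float = 0.0) -> List[int]:
    """Keep features whose SVEA value lies below the threshold (assumed rule)."""
    return [i for i, value in enumerate(svea) if value < threshold]


if __name__ == "__main__":
    game = CachedGame(TOY_ERRORS.__getitem__)
    svea = monte_carlo_shapley(game, n=3, num_permutations=20000, seed=1)
    print([round(x, 2) for x in svea], select_features(svea), game.evaluations)
```

For this toy table the exact values are -5/6, -5/6 and 11/3, so the script should print estimates near those, select features 0 and 1, and report 8 characteristic-function evaluations. With 3 features, every one of the 2^3 coalitions is eventually needed. The cache pays off when many sampled orderings share prefixes, and the linear programs are expensive. Each sampled ordering's marginal contributions telescope to v(N), so the estimates always sum to the total training error.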
