LLpowershap: Logistic Loss-based Automated Shapley Values Feature Selection Method

Shapley values have been used extensively in machine learning, not only to explain black-box models but also for model debugging, sensitivity and fairness analyses, and the selection of important features for robust modelling and follow-up analyses. Shapley values satisfy axioms that promote a fair distribution of each feature's contribution to a prediction or to a reduction in error, accounting for the non-linear relationships and interactions that arise when complex machine learning models are employed. Recently, a number of feature selection methods utilising Shapley values have been introduced. Here, we present a novel feature selection method, LLpowershap, which makes use of loss-based Shapley values to identify informative features with minimal noise among the selected sets of features. Our simulation results show that LLpowershap not only identifies a higher number of informative features but also outputs fewer noise features than other state-of-the-art feature selection methods. Benchmarking results on four real-world datasets demonstrate that the predictive performance of LLpowershap is higher than or on par with that of other Shapley-based wrapper methods and of filter methods.
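To make the idea of loss-based Shapley values concrete, the following is a minimal sketch and not the LLpowershap procedure itself, whose details the abstract does not give. It assumes a fixed, hypothetical logistic model with hand-picked weights, synthetic data, and a value function v(S) defined as the negative mean logistic loss when features outside S are replaced by their column means. All function and variable names are illustrative.

```python
import itertools
import math
import random


def log_loss(label, prob, eps=1e-12):
    """Logistic (cross-entropy) loss for one binary label."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def predict(weights, bias, row):
    """Probability from a fixed logistic model (hypothetical, not fitted here)."""
    z = bias + sum(w * v for w, v in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(weights, bias, X, y, present, baseline):
    """Mean logistic loss when features outside `present` are set to a baseline."""
    total = 0.0
    for row, label in zip(X, y):
        masked = [v if j in present else baseline[j] for j, v in enumerate(row)]
        total += log_loss(label, predict(weights, bias, masked))
    return total / len(X)


def loss_shapley(weights, bias, X, y):
    """Exact Shapley values of each feature for the value v(S) = -mean loss."""
    m = len(weights)
    baseline = [sum(row[j] for row in X) / len(X) for j in range(m)]
    # Evaluate v(S) once for every subset S of the m features.
    value = {}
    for r in range(m + 1):
        for subset in itertools.combinations(range(m), r):
            s = frozenset(subset)
            value[s] = -mean_loss(weights, bias, X, y, s, baseline)
    # Weighted average of each feature's marginal contribution over all subsets.
    phi = [0.0] * m
    for j in range(m):
        others = [k for k in range(m) if k != j]
        for r in range(m):
            coef = math.factorial(r) * math.factorial(m - r - 1) / math.factorial(m)
            for subset in itertools.combinations(others, r):
                s = frozenset(subset)
                phi[j] += coef * (value[s | {j}] - value[s])
    return phi, value


# Synthetic data (assumption): features 0 and 1 drive the label, feature 2 is noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
weights, bias = [2.0, -1.5, 0.0], 0.0
y = [1 if 2.0 * row[0] - 1.5 * row[1] > 0 else 0 for row in X]

phi, value = loss_shapley(weights, bias, X, y)
print([round(p, 4) for p in phi])
```

Because the model ignores feature 2, that feature is a null player and its loss-based Shapley value is zero, while the values of all features sum to the total loss reduction v(all) − v(∅). The exact computation evaluates every subset and so scales as 2^m; practical tools approximate it.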

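Feature selection, also called feature subset selection (FSS) or attribute selection, is the process of choosing N features from M existing features so as to optimise a specific metric of the system. By picking the most effective features from the original set, it reduces the dimensionality of the dataset. It is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are key to training the model.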