Sequential Attention for Feature Selection

Feature selection is the problem of choosing a subset of features for a machine learning model that maximizes model quality subject to a budget constraint. For neural networks, prior methods, including those based on $\ell_1$ regularization, attention, and other techniques, typically select the entire feature subset in one evaluation round. They therefore ignore the residual value of features during selection, i.e., the marginal contribution of a feature given that other features have already been selected. We propose a feature selection algorithm called Sequential Attention that achieves state-of-the-art empirical results for neural networks. This algorithm is based on an efficient one-pass implementation of greedy forward selection and uses attention weights at each step as a proxy for feature importance. We give theoretical insights into our algorithm for linear regression by showing that an adaptation to this setting is equivalent to the classical Orthogonal Matching Pursuit (OMP) algorithm, and thus inherits all of its provable guarantees. Our theoretical and empirical analyses offer new explanations for the effectiveness of attention and its connections to overparameterization, which may be of independent interest.
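To make the linear regression connection concrete, the following is a minimal sketch of classical Orthogonal Matching Pursuit used as greedy forward selection. It is not the Sequential Attention algorithm itself, whose details the abstract does not give. The function name `omp_select`, the column normalization, and the use of NumPy are assumptions made for illustration.

```python
import numpy as np


def omp_select(X, y, k):
    """Select k columns of X greedily with Orthogonal Matching Pursuit.

    Illustrative sketch only: returns column indices in selection order.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Normalize columns so residual correlations are comparable across features.
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero columns
    Xn = X / norms

    available = np.ones(X.shape[1], dtype=bool)
    selected = []
    residual = y.copy()

    for _ in range(k):
        # Marginal value of each feature given those already selected:
        # absolute correlation with the current residual.
        scores = np.abs(Xn.T @ residual)
        scores[~available] = -np.inf  # never pick the same feature twice
        j = int(np.argmax(scores))
        selected.append(j)
        available[j] = False

        # Refit least squares on all selected features, then update the residual.
        coef, *_ = np.linalg.lstsq(Xn[:, selected], y, rcond=None)
        residual = y - Xn[:, selected] @ coef

    return selected
```

Each round scores the remaining features against the residual left by the features already chosen, which is the "marginal contribution" the abstract says one-round methods ignore. According to the abstract, Sequential Attention uses attention weights as the per-step importance proxy in place of the explicit residual correlation used here.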


