Feature selection, which picks an informative subset of variables from the data, not only improves model interpretability and performance but also reduces resource demands. Feature selection with neural networks has recently attracted growing attention. However, existing methods usually incur high computational costs when applied to high-dimensional datasets. In this paper, inspired by evolutionary processes, we propose a novel resource-efficient supervised feature selection method based on sparse neural networks, named \enquote{NeuroFS}. By gradually pruning uninformative features from the input layer of a sparse neural network trained from scratch, NeuroFS derives an informative subset of features efficiently. Through experiments on $11$ low- and high-dimensional real-world benchmarks of different types, we demonstrate that NeuroFS achieves the highest ranking-based score among the considered state-of-the-art supervised feature selection models. The code is available on GitHub.
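To make the idea of gradually pruning input features more concrete, the following is a minimal sketch and not the NeuroFS implementation. It assumes, purely for illustration, that a feature's importance is the sum of the absolute weights of its input-layer connections, and that the number of features removed per round is spread evenly over a fixed number of rounds. The function names (`feature_strength`, `prune_weakest_features`, `gradual_selection`) are hypothetical, and the training that would take place between pruning rounds is omitted.

```python
import numpy as np


def feature_strength(weights):
    """Assumed importance score: sum of absolute outgoing weights per input feature."""
    return np.abs(weights).sum(axis=1)


def prune_weakest_features(weights, active, n_remove):
    """Deactivate the n_remove active features with the smallest strength.

    weights: (n_features, n_hidden) input-layer weights; zeros mark absent connections.
    active:  boolean array of shape (n_features,), True for features still in use.
    Returns updated copies of (weights, active).
    """
    weights = weights.copy()
    active = active.copy()
    active_idx = np.flatnonzero(active)
    n_remove = min(n_remove, active_idx.size)
    if n_remove <= 0:
        return weights, active
    strength = feature_strength(weights)[active_idx]
    # A stable sort keeps ties in index order, so the result is deterministic.
    weakest = active_idx[np.argsort(strength, kind="stable")[:n_remove]]
    weights[weakest, :] = 0.0   # drop all connections of the pruned features
    active[weakest] = False
    return weights, active


def gradual_selection(weights, k, n_rounds):
    """Shrink the active feature set to k features over n_rounds pruning rounds."""
    active = np.ones(weights.shape[0], dtype=bool)
    for r in range(n_rounds):
        # A real method would train the sparse network here before pruning.
        excess = int(active.sum()) - k
        n_remove = int(np.ceil(excess / (n_rounds - r)))
        weights, active = prune_weakest_features(weights, active, n_remove)
    return np.flatnonzero(active)


# Toy input layer: 5 features, 2 hidden units.
W = np.array([[0.1, 0.0],
              [2.0, 1.0],
              [0.0, 0.0],
              [0.5, -0.5],
              [3.0, 0.0]])
selected = gradual_selection(W, k=2, n_rounds=3)
print(selected)
```

In this toy example one feature is removed per round, weakest first, and the two features with the largest summed absolute weights remain. Any other importance score or pruning schedule could be substituted without changing the overall structure.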