This paper introduces a novel graph-based filter method for automatic feature selection (GB-AFS) for multi-class classification tasks. The method determines the smallest set of features that sustains prediction performance while preserving complementary discriminating ability across the different class pairs. It requires no user-defined parameters, such as the number of features to select. The method combines the Jeffries-Matusita (JM) distance with t-distributed Stochastic Neighbor Embedding (t-SNE) to generate a low-dimensional space that reflects how effectively each feature separates each pair of classes. The number of features to select is determined with our newly developed Mean Simplified Silhouette (MSS) index, which is designed to evaluate clustering results for the feature selection task. Experimental results on public data sets show that the proposed GB-AFS outperforms other filter-based techniques and automatic feature selection approaches. Moreover, the proposed algorithm maintained the accuracy achieved with all features while using only $7\%$ to $30\%$ of them, which reduced classification time by $15\%$ to $70\%$.
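To make the first step concrete, the sketch below computes a per-feature JM distance for every pair of classes, which is the kind of representation the method then embeds and clusters. It is only an illustration under stated assumptions, not the authors' implementation: each feature is assumed to be univariate Gaussian within a class, the JM distance is taken in the common form $JM = 2(1 - e^{-B})$ with $B$ the Bhattacharyya distance, and the names `jm_distance`, `jm_matrix`, and `EPS` are hypothetical. The t-SNE embedding and the MSS index are not shown.

```python
import math
from itertools import combinations
from statistics import fmean, pvariance

EPS = 1e-12  # assumed guard against zero variance, not part of the paper


def jm_distance(a, b):
    """JM distance between two 1-D samples, assuming each is Gaussian.

    Uses the convention JM = 2 * (1 - exp(-B)), so values lie in [0, 2].
    """
    m1, m2 = fmean(a), fmean(b)
    v1, v2 = pvariance(a) + EPS, pvariance(b) + EPS
    v_avg = (v1 + v2) / 2.0
    # Bhattacharyya distance between two univariate Gaussians.
    b_dist = (m1 - m2) ** 2 / (8.0 * v_avg) \
        + 0.5 * math.log(v_avg / math.sqrt(v1 * v2))
    return 2.0 * (1.0 - math.exp(-b_dist))


def jm_matrix(X, y):
    """Return (class_pairs, rows), where rows[j][k] is the JM distance of
    feature j for the k-th class pair."""
    classes = sorted(set(y))
    pairs = list(combinations(classes, 2))
    rows = []
    for j in range(len(X[0])):
        # Collect the values of feature j separately for each class.
        by_class = {
            c: [row[j] for row, label in zip(X, y) if label == c]
            for c in classes
        }
        rows.append([jm_distance(by_class[p], by_class[q]) for p, q in pairs])
    return pairs, rows


# Toy data: feature 0 separates the three classes, feature 1 does not.
X = [[0.0, 1.0], [0.1, 5.0], [5.0, 1.1], [5.1, 5.1], [10.0, 1.0], [10.1, 5.0]]
y = [0, 0, 1, 1, 2, 2]
pairs, rows = jm_matrix(X, y)
for j, row in enumerate(rows):
    print(j, [round(v, 3) for v in row])
```

Each row of the resulting matrix describes one feature by its separability over all class pairs. Features with similar rows discriminate the same class pairs and are therefore largely redundant, while features with different rows are complementary.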