HeFS: Helper-Enhanced Feature Selection via Pareto-Optimized Genetic Search

Feature selection is an NP-hard combinatorial optimization problem. Conventional approaches often rely on heuristic or greedy strategies, which are prone to premature convergence and may miss subtle yet informative features. This limitation is especially critical in high-dimensional datasets, where complex, interdependent feature relationships prevail. We introduce HeFS (Helper-Enhanced Feature Selection), a framework that refines feature subsets produced by existing algorithms. HeFS systematically searches the residual feature space to identify a Helper Set: features that complement the original subset and improve classification performance. The approach uses a biased initialization scheme and a ratio-guided mutation mechanism within a genetic algorithm, coupled with Pareto-based multi-objective optimization that jointly maximizes predictive accuracy and feature complementarity. Experiments on 18 benchmark datasets show that HeFS consistently identifies overlooked yet informative features and outperforms state-of-the-art methods, including in challenging domains such as gastric cancer classification, drug toxicity prediction, and computer science applications. The code and datasets are available at https://healthinformaticslab.org/supp/.
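The abstract names three ingredients (biased initialization, ratio-guided mutation, Pareto-based selection) without specifying them. The following is a minimal Python sketch of how such a search over the residual features could be wired together; it is not the authors' implementation. The function names, the per-feature `gain` and `overlap` scores, the target ratio of 0.3, and the mutation-only loop (no crossover) are all assumptions made for illustration. In practice the first objective would more plausibly come from a cross-validated classifier.

```python
import random

def dominates(a, b):
    """True if a is no worse than b on every objective and better on one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scored):
    """Keep the (mask, objectives) pairs that no other pair dominates."""
    return [(m, f) for m, f in scored
            if not any(dominates(g, f) for _, g in scored)]

def biased_init(weights, rng):
    """Assumed scheme: include residual feature i with probability weights[i]."""
    return tuple(int(rng.random() < w) for w in weights)

def ratio_mutation(mask, target_ratio, rng, rate=0.1):
    """Assumed scheme: mutate bits toward a target fraction of selected features."""
    ratio = sum(mask) / len(mask)
    out = []
    for bit in mask:
        if rng.random() < rate:
            # Above the target ratio: switch the bit off; otherwise switch it on.
            bit = 0 if ratio > target_ratio else 1
        out.append(bit)
    return tuple(out)

def evaluate(mask, gain, overlap):
    """Toy objectives (hypothetical): accuracy proxy and complementarity proxy."""
    chosen = [i for i, b in enumerate(mask) if b]
    accuracy_proxy = sum(gain[i] for i in chosen)
    complementarity = -sum(overlap[i] for i in chosen)
    return (accuracy_proxy, complementarity)

def search_helper_sets(gain, overlap, generations=30, pop_size=20, seed=0):
    """Mutation-only genetic search that returns a Pareto front of helper-set masks."""
    rng = random.Random(seed)
    # Bias inclusion toward features with higher assumed gain, clipped to [0.1, 0.9].
    weights = [min(0.9, max(0.1, g)) for g in gain]
    population = [biased_init(weights, rng) for _ in range(pop_size)]
    front = []
    for _ in range(generations):
        # Merge the current front with newly evaluated masks, deduplicated by mask.
        candidates = dict(front)
        for mask in population:
            candidates[mask] = evaluate(mask, gain, overlap)
        front = pareto_front(list(candidates.items()))
        parents = [m for m, _ in front]
        population = [ratio_mutation(rng.choice(parents), 0.3, rng)
                      for _ in range(pop_size)]
    return front

# Four residual features with made-up scores.
front = search_helper_sets(gain=[0.8, 0.1, 0.6, 0.2], overlap=[0.1, 0.7, 0.2, 0.9])
for mask, objectives in front:
    print(mask, objectives)
```

Each returned mask marks which residual features would join the Helper Set. Returning the whole non-dominated front, rather than a single best mask, leaves the trade-off between the two objectives to the caller.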

Feature selection

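Feature selection, also called feature subset selection (FSS) or attribute selection, is the process of choosing N features from an existing set of M features so that a specific performance metric of the system is optimized. By keeping only the most effective of the original features, it reduces the dimensionality of the dataset. It is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are key to training a model.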
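To make the "choose N of M" formulation concrete, here is a small brute-force sketch in Python. The `best_subset` helper and the toy per-feature scores are hypothetical. The point is only that the number of candidate subsets, C(M, N), grows too quickly for exhaustive search beyond small M.

```python
from itertools import combinations
from math import comb

def best_subset(features, n, score):
    """Return the n-feature subset with the highest score, by brute force."""
    return max(combinations(features, n), key=score)

# Toy, made-up per-feature usefulness; a real score would evaluate a model.
usefulness = {"f0": 0.2, "f1": 0.9, "f2": 0.5, "f3": 0.7}
chosen = best_subset(sorted(usefulness), 2, lambda s: sum(usefulness[f] for f in s))
print(chosen)        # ('f1', 'f3')
print(comb(20, 10))  # 184756 candidate subsets for M=20, N=10
```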
