Feature selection is an NP-hard combinatorial optimization problem. Conventional approaches often rely on heuristic or greedy strategies, which are prone to premature convergence and may miss subtle yet informative features. This limitation is especially critical in high-dimensional datasets, where complex, interdependent feature relationships prevail. We introduce HeFS (Helper-Enhanced Feature Selection), a framework that refines feature subsets produced by existing algorithms. HeFS systematically searches the residual feature space, i.e., the features left out of the original subset, to identify a Helper Set: features that complement the original subset and improve classification performance. The approach employs a biased initialization scheme and a ratio-guided mutation mechanism within a genetic algorithm, coupled with Pareto-based multi-objective optimization to jointly maximize predictive accuracy and feature complementarity. Experiments on 18 benchmark datasets demonstrate that HeFS consistently identifies overlooked yet informative features and outperforms state-of-the-art methods, including in challenging domains such as gastric cancer classification, drug toxicity prediction, and computer science applications. The code and datasets are available at https://healthinformaticslab.org/supp/.
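The abstract names three ingredients (biased initialization, ratio-guided mutation, and Pareto-based selection) without giving their details. The Python sketch below is only an illustration of how such pieces might fit together, not the authors' implementation. The function names, the parameters `select_prob`, `target_ratio`, and `rate`, and the specific mutation rule are all hypothetical; the two objective values (accuracy and complementarity) are assumed to be computed elsewhere and to be maximized.

```python
import random

def biased_init(n_residual, pop_size, select_prob=0.1, rng=None):
    """Create binary masks over the residual features.

    Assumption: the bias is a low per-feature selection probability,
    so initial candidate Helper Sets are small.
    """
    rng = rng or random.Random(0)
    return [[1 if rng.random() < select_prob else 0 for _ in range(n_residual)]
            for _ in range(pop_size)]

def ratio_guided_mutation(mask, target_ratio=0.1, rate=0.05, rng=None):
    """Flip bits so the selected fraction drifts toward target_ratio.

    Hypothetical rule: when too many features are selected, 1->0 flips
    are made more likely than 0->1 flips, and vice versa.
    """
    rng = rng or random.Random(0)
    ratio = sum(mask) / len(mask)
    too_many = ratio > target_ratio
    p_drop = rate * (2.0 if too_many else 0.5)  # probability of a 1->0 flip
    p_add = rate * (0.5 if too_many else 2.0)   # probability of a 0->1 flip
    child = []
    for bit in mask:
        p = p_drop if bit else p_add
        child.append(1 - bit if rng.random() < p else bit)
    return child

def dominates(a, b):
    """True if a is at least as good as b on every objective and
    strictly better on at least one (all objectives maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(scores):
    """Return the indices of the non-dominated objective vectors."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s)
                       for j, t in enumerate(scores) if j != i)]
```

In a full genetic algorithm, each mask would be scored as an (accuracy, complementarity) pair, `pareto_front` would pick the candidates that survive to the next generation, and `ratio_guided_mutation` would produce their offspring.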