Selecting the most relevant or informative features is a key issue in real-world machine learning problems. Because an exhaustive search is infeasible even for a moderate number of features (n features yield 2^n candidate subsets), an intelligent search strategy must be employed to find an optimal subset, and that strategy must consider how features interact with each other in promoting class separability. Balancing feature subset size against classification accuracy is a multi-objective optimization problem. Here we propose MOELIGA, a multi-objective genetic algorithm incorporating an evolutionary local improvement strategy that evolves subordinate populations to refine feature subsets. MOELIGA employs a crowding-based fitness sharing mechanism and a sigmoid transformation to enhance diversity and guide compactness, alongside a geometry-based objective promoting classifier independence. Experimental evaluation on 14 diverse datasets demonstrates MOELIGA's ability to identify smaller feature subsets with superior or comparable classification performance relative to 11 state-of-the-art methods. These findings suggest that MOELIGA effectively addresses the accuracy-dimensionality trade-off, offering a robust and adaptable approach to multi-objective feature selection in complex, high-dimensional scenarios.
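The accuracy-dimensionality trade-off described in the abstract can be illustrated with a generic Pareto-dominance check over two minimized objectives: classification error and number of selected features. This is a minimal sketch for illustration only, not MOELIGA's procedure; the names `dominates` and `pareto_front` and all numeric values are hypothetical.

```python
from typing import List, Tuple

# Each candidate feature subset is summarized as
# (classification_error, n_selected_features); both objectives are minimized.
# The numbers used below are made up for illustration.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates, in input order."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


if __name__ == "__main__":
    subsets = [(0.10, 12), (0.10, 8), (0.08, 20), (0.15, 5), (0.20, 9)]
    for error, size in pareto_front(subsets):
        print(f"error={error:.2f}, features={size}")
```

In this toy set, the subset with 12 features is dominated by the 8-feature subset with the same error, so it is dropped. The surviving subsets each represent a different compromise between error and size, which is the kind of front a multi-objective feature selector searches for.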