Feature selection under label noise remains understudied. We propose the Noise-Aware Multi-Objective Feature Selection Genetic Algorithm (NMFS-GA), a genetic algorithm-based approach for selecting optimal feature subsets in binary classification with noisy labels. NMFS-GA offers a unified framework for selecting feature subsets that are both accurate and interpretable. We evaluate NMFS-GA on synthetic datasets with label noise, a Breast Cancer dataset augmented with noisy features, and a real-world ADNI dataset for dementia conversion prediction. Our results indicate that NMFS-GA can effectively select feature subsets that improve the accuracy and interpretability of binary classifiers trained on noisy labels.
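The abstract does not specify NMFS-GA's objectives, genetic operators, or underlying classifier, so the following is only a generic sketch of multi-objective, genetic-algorithm-based feature selection on data with flipped labels, not the authors' method. Everything in it is an assumption made for illustration: the two objectives (training error and subset size), the toy sign-of-sum classifier, the one-point crossover and bit-flip mutation, and all function names.

```python
import random

def make_noisy_data(n=200, n_informative=2, n_noise=6, flip=0.2, seed=0):
    """Synthetic binary data (assumed setup): the clean label depends only on
    the first n_informative features; a fraction `flip` of labels is flipped."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(n_informative + n_noise)]
        label = int(sum(row[:n_informative]) > 0)
        if rng.random() < flip:  # inject label noise
            label = 1 - label
        X.append(row)
        y.append(label)
    return X, y

def objectives(mask, X, y):
    """Return (training error, number of selected features); both are minimized.
    The classifier is a toy stand-in: predict 1 if the selected features sum > 0."""
    idx = [j for j, bit in enumerate(mask) if bit]
    if not idx:  # penalize the empty subset so it is always dominated
        return (1.0, len(mask) + 1)
    wrong = sum(int(sum(row[j] for j in idx) > 0) != label
                for row, label in zip(X, y))
    return (wrong / len(y), len(idx))

def dominates(a, b):
    """Pareto dominance for minimization: a is no worse everywhere and differs."""
    return all(p <= q for p, q in zip(a, b)) and a != b

def pareto_front(scored):
    """Keep the (mask, objectives) pairs that no other pair dominates."""
    return [(m, f) for m, f in scored
            if not any(dominates(g, f) for _, g in scored)]

def evolve(X, y, pop_size=20, generations=30, p_mut=0.1, seed=1):
    """Evolve bitmask-encoded feature subsets; return the final Pareto front."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = rng.sample(pop, 2)
            cut = rng.randrange(1, n)                 # one-point crossover
            child = a[:cut] + b[cut:]
            child = tuple(bit ^ (rng.random() < p_mut) for bit in child)  # mutation
            children.append(child)
        scored = [(m, objectives(m, X, y)) for m in set(pop + children)]
        front = pareto_front(scored)
        # Survivors: non-dominated subsets first, then the rest by (error, size).
        rest = sorted((s for s in scored if s not in front), key=lambda s: s[1])
        pop = [m for m, _ in (front + rest)[:pop_size]]
    return pareto_front([(m, objectives(m, X, y)) for m in set(pop)])

if __name__ == "__main__":
    X, y = make_noisy_data()
    for mask, (err, k) in sorted(evolve(X, y), key=lambda s: s[1][1]):
        print(mask, f"error={err:.3f}", f"features={k}")
```

Returning a Pareto front rather than a single best subset leaves the trade-off between accuracy and subset size (used here as a rough proxy for interpretability) to the user. A noise-aware method would presumably replace the plain training error with an objective that is less sensitive to flipped labels.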