Multi-Objective Genetic Algorithm for Multi-View Feature Selection

Multi-view datasets offer diverse forms of data that can enhance prediction models by providing complementary information. However, combining views increases the dimensionality of the data, which poses significant challenges for prediction models and can lead to poor generalization. Selecting relevant features from multi-view datasets is therefore important: it not only addresses poor generalization but also enhances the interpretability of the models. Despite their success, traditional feature selection methods have limitations: they do not fully leverage intrinsic information across modalities, they lack generalizability, and they are tailored to specific classification tasks. We propose a novel genetic algorithm strategy to overcome these limitations for multi-view data. Our approach, called the multi-view multi-objective feature selection genetic algorithm (MMFS-GA), simultaneously selects the optimal subset of features within a view and between views under a unified framework. The MMFS-GA framework demonstrates superior performance and interpretability for feature selection on multi-view datasets in both binary and multiclass classification tasks. Our evaluations on three benchmark datasets, including synthetic and real data, show improvement over the best baseline methods. This work provides a promising solution for multi-view feature selection and opens up new possibilities for further research on multi-view datasets.
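To make the idea of multi-objective, genetic-algorithm-based feature selection more concrete, the following is a minimal illustrative sketch and not the authors' MMFS-GA implementation. It assumes a generic setup that the abstract does not specify: the views are concatenated into one matrix, each chromosome is a binary feature mask, the two objectives are classification error and number of selected features, and a nearest-centroid classifier stands in for whatever classifier is actually used. All names (make_views, objectives, run_ga, and so on) and all parameter values are hypothetical.

```python
import numpy as np

def make_views(rng, n=200):
    """Two synthetic views; only the first 2 features of each are informative."""
    y = rng.integers(0, 2, size=n)
    views = []
    for n_feat in (6, 8):
        X = rng.normal(size=(n, n_feat))
        X[:, :2] += 2.0 * y[:, None]  # informative features shift with the label
        views.append(X)
    return views, y

def error_rate(X_tr, y_tr, X_te, y_te):
    """Nearest-centroid classifier error (a stand-in for any classifier)."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dists = ((X_te[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(classes[dists.argmin(axis=1)] != y_te))

def objectives(mask, X, y, split):
    """Return (error, number of selected features); both are minimized."""
    if not mask.any():
        return (1.0, 0)  # empty selection gets the worst possible error
    Xs = X[:, mask]
    err = error_rate(Xs[:split], y[:split], Xs[split:], y[split:])
    return (err, int(mask.sum()))

def dominates(a, b):
    """Pareto dominance for minimization: a is no worse everywhere, better somewhere."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))

def pareto_front(scores):
    """Indices of non-dominated score tuples."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]

def run_ga(X, y, rng, pop_size=30, generations=25, p_mut=0.1):
    n_feat = X.shape[1]
    split = X.shape[0] // 2  # first half trains, second half evaluates
    pop = rng.random((pop_size, n_feat)) < 0.5
    for _ in range(generations):
        # Uniform crossover between random parent pairs, then bit-flip mutation.
        pa = pop[rng.integers(0, pop_size, pop_size)]
        pb = pop[rng.integers(0, pop_size, pop_size)]
        children = np.where(rng.random((pop_size, n_feat)) < 0.5, pa, pb)
        children ^= rng.random((pop_size, n_feat)) < p_mut
        merged = np.vstack([pop, children])
        scores = [objectives(m, X, y, split) for m in merged]
        # Elitist survival: non-dominated masks first, the rest by (error, size).
        front = sorted(pareto_front(scores), key=lambda i: scores[i])
        rest = sorted(set(range(len(merged))) - set(front), key=lambda i: scores[i])
        pop = merged[(front + rest)[:pop_size]]
    scores = [objectives(m, X, y, split) for m in pop]
    front = pareto_front(scores)
    return pop[front], [scores[i] for i in front]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    views, y = make_views(rng)
    offsets = np.cumsum([0] + [v.shape[1] for v in views])
    masks, scores = run_ga(np.hstack(views), y, rng)
    for mask, (err, k) in zip(masks, scores):
        per_view = [np.flatnonzero(mask[a:b]).tolist()
                    for a, b in zip(offsets[:-1], offsets[1:])]
        print(f"error={err:.3f} features={k} selected per view={per_view}")
```

The result is a set of trade-off solutions rather than a single subset: each returned mask is non-dominated with respect to error and subset size, and splitting the mask at the view boundaries shows which features were kept within each view and which views contribute at all. A faithful reproduction would replace the simplified survival step with the selection scheme and objectives defined in the paper itself.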


