Multi-view datasets combine heterogeneous forms of data whose complementary information can improve prediction models. However, combining views also increases the dimensionality of the data, which makes prediction models harder to train and can lead to poor generalization. Selecting relevant features from multi-view datasets is therefore important: it mitigates poor generalization and makes the resulting models easier to interpret. Traditional feature selection methods have been successful, but they make limited use of intrinsic information across modalities, generalize poorly, and are often tailored to specific classification tasks. To overcome these limitations, we propose a novel genetic algorithm strategy for multi-view data. Our approach, the multi-view multi-objective feature selection genetic algorithm (MMFS-GA), simultaneously selects the optimal subset of features within each view and between views under a unified framework. In evaluations on three benchmark datasets, including synthetic and real data, MMFS-GA improves on the best baseline methods in both binary and multiclass classification tasks while also offering interpretable feature selections. This work provides a promising solution for multi-view feature selection and opens up new possibilities for further research on multi-view datasets.
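To make the idea of multi-objective feature selection over several views more concrete, the following is a minimal illustrative sketch, not the MMFS-GA implementation. It assumes a flat binary mask that concatenates one block of bits per view, two toy objectives (informative features missed and features kept, both minimized), one-point crossover, single bit-flip mutation, and Pareto-based survival. The names `VIEW_SIZES`, `RELEVANT`, `split_mask`, `objectives`, and `evolve` are hypothetical, and the set of informative features is invented purely for illustration; in practice the first objective would be replaced by a classifier's validation error.

```python
import random

VIEW_SIZES = [4, 3]   # hypothetical: two views with 4 and 3 features
RELEVANT = {0, 2, 5}  # hypothetical indices of the informative features


def split_mask(mask, view_sizes):
    """Split a flat binary chromosome into one feature mask per view."""
    parts, start = [], 0
    for size in view_sizes:
        parts.append(mask[start:start + size])
        start += size
    return parts


def objectives(mask):
    """Toy objectives, both minimized: (informative features missed, features kept)."""
    selected = {i for i, bit in enumerate(mask) if bit}
    return (len(RELEVANT - selected), len(selected))


def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(population):
    """Return the masks whose objective vectors no other mask dominates."""
    scored = [(ind, objectives(ind)) for ind in population]
    return [ind for ind, s in scored
            if not any(dominates(t, s) for _, t in scored)]


def evolve(generations=30, pop_size=20, seed=0):
    """Run a small GA and return the distinct non-dominated masks it ends with."""
    rng = random.Random(seed)
    n = sum(VIEW_SIZES)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            p1, p2 = rng.sample(population, 2)
            cut = rng.randrange(1, n)        # one-point crossover
            child = p1[:cut] + p2[cut:]
            flip = rng.randrange(n)          # single bit-flip mutation
            child[flip] = 1 - child[flip]
            children.append(child)
        merged = population + children
        front = pareto_front(merged)
        # Non-dominated masks survive first; the rest fill remaining slots at random.
        rest = [ind for ind in merged if ind not in front]
        rng.shuffle(rest)
        population = (front + rest)[:pop_size]
    final = pareto_front(population)
    # Drop duplicate masks while preserving order.
    return [list(m) for m in dict.fromkeys(tuple(ind) for ind in final)]
```

Calling `evolve()` returns a list of masks, and `split_mask(mask, VIEW_SIZES)` shows which features each mask keeps in each view. Returning a Pareto front rather than a single mask reflects the multi-objective framing: each surviving mask represents a different trade-off between the two objectives.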