Unified View Imputation and Feature Selection Learning for Incomplete Multi-view Data

Although multi-view unsupervised feature selection (MUFS) is an effective technique for reducing dimensionality in machine learning, existing methods cannot directly handle incomplete multi-view data, in which some samples are missing from certain views. Such methods must first impute the missing data with predetermined values and then perform feature selection on the completed dataset. Treating imputation and feature selection as separate steps fails to exploit their potential synergy: local structural information gleaned from feature selection could guide the imputation, which in turn would improve feature selection performance. Additionally, previous methods focus only on the local structure of the samples and ignore the intrinsic locality of the feature space. To tackle these problems, a novel MUFS method called UNified view Imputation and Feature selectIon lEaRning (UNIFIER) is proposed. UNIFIER explores the local structure of multi-view data by adaptively learning similarity-induced graphs from both the sample and feature spaces. It then dynamically recovers the missing views during the feature selection procedure, guided by the sample and feature similarity graphs. Furthermore, the half-quadratic minimization technique is used to automatically weight different instances, alleviating the impact of outliers and unreliably restored data. Comprehensive experimental results demonstrate that UNIFIER outperforms other state-of-the-art methods.
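The instance-weighting idea can be illustrated on a toy problem. The sketch below is not the UNIFIER objective: it assumes a Welsch-type robust loss (the abstract does not name the loss), for which half-quadratic minimization yields auxiliary weights of the form exp(-r^2 / (2 sigma^2)), and it applies them to a one-dimensional location estimate. The function names, the `sigma` parameter, and the data are hypothetical.

```python
import math


def half_quadratic_weights(residuals, sigma=1.0):
    """Auxiliary weights for an assumed Welsch loss: w_i = exp(-r_i^2 / (2 sigma^2)).

    Large residuals (likely outliers or poorly restored entries) get weights near 0.
    """
    return [math.exp(-(r * r) / (2.0 * sigma * sigma)) for r in residuals]


def robust_mean(values, sigma=1.0, n_iter=20):
    """Alternate between (1) updating instance weights and (2) the weighted estimate."""
    estimate = sum(values) / len(values)  # start from the ordinary mean
    for _ in range(n_iter):
        residuals = [v - estimate for v in values]
        weights = half_quadratic_weights(residuals, sigma)
        total = sum(weights)
        if total == 0.0:  # every instance rejected; keep the last estimate
            break
        estimate = sum(w * v for w, v in zip(weights, values)) / total
    return estimate


# Toy data: five consistent instances and one gross outlier.
values = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0]
plain = sum(values) / len(values)
robust = robust_mean(values, sigma=1.0)
final_weights = half_quadratic_weights([v - robust for v in values], sigma=1.0)
```

In this toy setting the ordinary mean is pulled toward the outlier, while the reweighted estimate stays close to the consistent instances because the outlier's weight shrinks toward zero. In a method like the one described, the same alternating pattern would apply to per-instance reconstruction errors rather than scalar residuals.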


