Multi-view unsupervised feature selection (MUFS) is an effective technique for dimensionality reduction in machine learning, but existing methods cannot directly handle incomplete multi-view data, in which some samples are missing from certain views. Such methods must first impute the missing data with predetermined values and then perform feature selection on the completed dataset. Separating imputation from feature selection forgoes a potential synergy: local structural information gleaned during feature selection could guide the imputation, which would in turn improve feature selection performance. In addition, previous methods exploit only the local structure of the samples and ignore the intrinsic locality of the feature space. To tackle these problems, a novel MUFS method, called UNified view Imputation and Feature selectIon lEaRning (UNIFIER), is proposed. UNIFIER explores the local structure of multi-view data by adaptively learning similarity-induced graphs in both the sample space and the feature space. Guided by these sample and feature similarity graphs, UNIFIER then dynamically recovers the missing views during the feature selection procedure. Furthermore, half-quadratic minimization is used to weight individual instances automatically, alleviating the impact of outliers and unreliably restored data. Comprehensive experimental results demonstrate that UNIFIER outperforms other state-of-the-art methods.
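The abstract does not state the objective function, so the following is only a minimal sketch of two ingredients it names, not the UNIFIER algorithm itself. It assumes a fixed Gaussian k-nearest-neighbour sample graph in place of the adaptively learned graphs, a similarity-weighted average as the imputation rule, and the Welsch loss as the robust loss behind the half-quadratic weights. The function names (`gaussian_similarity`, `impute_missing`, `hq_weights`) and the parameters `k` and `sigma` are hypothetical.

```python
import numpy as np


def gaussian_similarity(X, k=3):
    """Row-normalized k-NN Gaussian similarity graph over the rows of X.

    Assumption: a fixed kernel graph stands in for an adaptively learned one.
    """
    # Pairwise squared Euclidean distances between samples.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # exclude self-loops
    # Median heuristic for the kernel bandwidth (guarded against zero).
    sigma2 = max(np.median(d2[np.isfinite(d2)]), 1e-12)
    S = np.exp(-d2 / sigma2)
    # Zero out everything except the k largest similarities in each row.
    drop = np.argsort(-S, axis=1)[:, k:]
    np.put_along_axis(S, drop, 0.0, axis=1)
    return S / S.sum(axis=1, keepdims=True)


def impute_missing(X_view, observed, S):
    """Fill missing rows of one view with a similarity-weighted mean of observed rows.

    `observed` is a boolean mask over samples; `S` is a sample similarity graph
    built from information available for all samples (e.g. another view).
    A missing sample with no observed neighbour in S is filled with zeros.
    """
    X = X_view.copy()
    W = S[~observed][:, observed]  # similarities from missing to observed samples
    W = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    X[~observed] = W @ X_view[observed]
    return X


def hq_weights(residuals, sigma=1.0):
    """Half-quadratic weights for the Welsch loss: w_i = exp(-r_i^2 / (2 sigma^2)).

    Samples with large residuals (outliers, poorly restored data) get small weights.
    """
    return np.exp(-residuals ** 2 / (2.0 * sigma ** 2))
```

In an alternating scheme of this kind, one would typically recompute the graph, the imputed rows, and the per-sample weights in turn until the objective stops changing; how UNIFIER couples these steps with the feature selection matrix is not specified in the abstract.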