In unsupervised feature selection, sparse principal component analysis (SPCA) methods have attracted increasing attention. Compared with spectral-based methods, SPCA methods do not rely on the construction of a similarity matrix and show better feature selection ability on real-world data. The original SPCA formulation is a nonconvex optimization problem. Existing convex SPCA methods reformulate SPCA as a convex model by treating the reconstruction matrix as the optimization variable. However, they lack constraints equivalent to the orthogonality restriction in SPCA, which leads to a larger solution space. In this paper, we prove that the optimal solution to a convex SPCA model lies in the positive semidefinite (PSD) cone. We propose a standard convex SPCA-based model with a PSD constraint for unsupervised feature selection, and present a two-step fast optimization algorithm based on PSD projection to solve it. We also prove that two other existing convex SPCA-based models attain their optimal solutions on the PSD cone, and therefore propose PSD versions of these two models to accelerate their convergence. In addition, we provide a strategy for setting the regularization parameter of the proposed method. Experiments on synthetic and real-world datasets demonstrate the effectiveness and efficiency of the proposed methods.
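The abstract does not spell out the two-step algorithm, so the following is only a minimal sketch of the PSD projection it refers to, under stated assumptions. It uses the standard Frobenius-norm projection onto the PSD cone (symmetrize, eigendecompose, clip negative eigenvalues to zero). The function name `project_psd` and the example matrix are hypothetical, and the paper's actual update rule may differ.

```python
import numpy as np

def project_psd(A):
    """Return the PSD matrix closest to A in Frobenius norm.

    Assumption: this is the textbook projection, not necessarily the
    exact step used in the paper's algorithm.
    """
    # The PSD cone contains only symmetric matrices, so symmetrize first.
    S = (A + A.T) / 2.0
    # eigh is the eigensolver for symmetric (Hermitian) matrices.
    eigvals, eigvecs = np.linalg.eigh(S)
    # Clip negative eigenvalues to zero, keep the eigenvectors.
    clipped = np.clip(eigvals, 0.0, None)
    # Reassemble V diag(clipped) V^T by scaling the columns of V.
    return (eigvecs * clipped) @ eigvecs.T

# Illustrative input with one negative eigenvalue.
A = np.array([[2.0, 0.0],
              [0.0, -1.0]])
P = project_psd(A)
```

In a projected-gradient style solver, a step of this kind would typically follow each unconstrained update of the reconstruction matrix, keeping the iterate inside the PSD cone.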