Sparse Partial Least Squares (sPLS) is a common dimensionality reduction technique for data fusion. It projects data samples from two views onto linear combinations of a small number of variables, chosen so that the projections have maximum covariance. However, sPLS extracts these combinations from the two data sets using all data samples, so it cannot detect latent subsets of samples. To extend the application of sPLS by identifying a specific subset of samples and removing outliers, we propose an $\ell_\infty/\ell_0$-norm constrained weighted sparse PLS ($\ell_\infty/\ell_0$-wsPLS) method for joint sample and feature selection, where the $\ell_\infty/\ell_0$-norm constraints are used to select a subset of samples. We prove that the $\ell_\infty/\ell_0$-norm constraints have the Kurdyka-Łojasiewicz property, which allows us to develop a globally convergent algorithm to solve the model. Moreover, multi-view data with the same set of samples are available in various real problems. To this end, we extend the $\ell_\infty/\ell_0$-wsPLS model and propose two multi-view wsPLS models for multi-view data fusion. We develop an efficient iterative algorithm for each multi-view wsPLS model and show its convergence property. Experiments on both numerical and biomedical data demonstrate the efficiency of the proposed methods.
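To make the idea of joint sample and feature selection concrete, the following is a minimal sketch of a rank-one weighted sparse PLS iteration. It is not the algorithm of the paper: the abstract does not give the exact formulation, so the objective $\sum_i w_i (x_i^\top u)(y_i^\top v)$, the use of hard thresholding for the $\ell_0$ constraints on $u$ and $v$, the restriction of the sample weights $w$ to 0/1 values, and all names (`wspls_sketch`, `hard_threshold_unit`, `select_samples`, `ku`, `kv`, `kw`) are assumptions made for illustration only.

```python
import numpy as np


def hard_threshold_unit(z, k):
    """Keep the k largest-magnitude entries of z (k >= 1), zero the rest,
    and rescale the result to unit l2 norm."""
    out = np.zeros_like(z, dtype=float)
    idx = np.argsort(np.abs(z))[-k:]
    out[idx] = z[idx]
    norm = np.linalg.norm(out)
    return out / norm if norm > 0 else out


def select_samples(scores, k):
    """Assumed sample-selection step: put weight 1 on at most k samples
    with the largest positive scores, so that max(w) <= 1 and w has at
    most k nonzero entries."""
    w = np.zeros_like(scores, dtype=float)
    idx = np.argsort(scores)[-k:]
    w[idx[scores[idx] > 0]] = 1.0
    return w


def wspls_sketch(X, Y, ku, kv, kw, n_iter=50, seed=0):
    """Alternating updates of the sparse directions u, v and the sample
    weights w for two views X (n x p) and Y (n x q). Illustrative only."""
    rng = np.random.default_rng(seed)
    w = np.ones(X.shape[0])  # start with all samples selected
    v = hard_threshold_unit(rng.standard_normal(Y.shape[1]), kv)
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # With v and w fixed, the objective is linear in u.
        u = hard_threshold_unit(X.T @ (w * (Y @ v)), ku)
        # With u and w fixed, the objective is linear in v.
        v = hard_threshold_unit(Y.T @ (w * (X @ u)), kv)
        # With u and v fixed, each sample contributes its own score.
        w = select_samples((X @ u) * (Y @ v), kw)
    return u, v, w
```

Under these assumptions, each update maximizes the weighted objective over one block of variables while the other two are held fixed. Samples whose projected scores disagree in sign or are small receive weight zero, which is one simple way to picture how outliers could be excluded from the fit.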