Motivation: Biomedical studies increasingly produce multi-view high-dimensional datasets (e.g., multi-omics) that demand integrative analysis. Existing canonical correlation analysis (CCA) and generalized CCA methods address at most two of the following three key aspects simultaneously: (i) nonlinear dependence, (ii) sparsity for variable selection, and (iii) generalization to more than two data views. There is a pressing need for CCA methods that integrate all three aspects to effectively analyze multi-view high-dimensional data.

Results: We propose three nonlinear, sparse, generalized CCA methods, HSIC-SGCCA, SA-KGCCA, and TS-KGCCA, for variable selection in multi-view high-dimensional data. These methods extend the existing SCCA-HSIC, SA-KCCA, and TS-KCCA from two-view to multi-view settings. While SA-KGCCA and TS-KGCCA yield multi-convex optimization problems solved via block coordinate descent, HSIC-SGCCA introduces a necessary unit-variance constraint previously ignored in SCCA-HSIC, resulting in a nonconvex, non-multiconvex problem. We efficiently address this challenge by integrating the block prox-linear method with the linearized alternating direction method of multipliers. Simulations and TCGA-BRCA data analysis demonstrate that HSIC-SGCCA outperforms competing methods in multi-view variable selection.
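To illustrate why an HSIC-based criterion can capture the nonlinear dependence mentioned above, the following is a minimal sketch of the standard biased empirical Hilbert-Schmidt independence criterion, trace(KHLH)/n^2, for two one-dimensional samples. It is not the authors' implementation: the function names (`gaussian_gram`, `center`, `hsic`, `pearson`), the Gaussian kernel, the fixed bandwidth of 1.0, and the toy data are all assumptions made for illustration, and the sparsity penalty, the unit-variance constraint, and the multi-view objective are omitted.

```python
import math


def gaussian_gram(values, bandwidth=1.0):
    """Gram matrix with entries exp(-(v_i - v_j)^2 / (2 * bandwidth^2))."""
    scale = 2.0 * bandwidth ** 2
    return [[math.exp(-((a - b) ** 2) / scale) for b in values] for a in values]


def center(gram):
    """Double-center a symmetric Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(x, y, bandwidth=1.0):
    """Biased empirical HSIC, trace(K H L H) / n^2 (assumed 1/n^2 normalization)."""
    n = len(x)
    k_centered = center(gaussian_gram(x, bandwidth))
    l_gram = gaussian_gram(y, bandwidth)
    # trace((H K H) L) equals the elementwise sum because both matrices are symmetric.
    total = sum(k_centered[i][j] * l_gram[i][j] for i in range(n) for j in range(n))
    return total / n ** 2


def pearson(x, y):
    """Ordinary Pearson correlation, for comparison with HSIC."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy data (hypothetical): y depends on x only through a quadratic relation.
x_dep = [i / 10 for i in range(-20, 21)]
y_dep = [v ** 2 for v in x_dep]

# Toy data (hypothetical): a full product grid, so the empirical joint
# distribution factorizes into its marginals.
levels = [i / 2 for i in range(-3, 4)]
x_ind = [a for a in levels for _ in levels]
y_ind = [b for _ in levels for b in levels]

hsic_dep = hsic(x_dep, y_dep)
hsic_ind = hsic(x_ind, y_ind)
corr_dep = pearson(x_dep, y_dep)
```

In this toy setting the Pearson correlation of the quadratic pair vanishes by symmetry while its HSIC value is clearly positive, and the product-grid sample gives an HSIC value of zero up to floating-point error. A linear CCA criterion would therefore miss the first relationship, which is the kind of dependence the HSIC-based objective is meant to detect.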