Integrative statistical analysis of multi-view high-dimensional data, acquired from different experiments on each subject participating in a joint study, can be challenging. Canonical Correlation Analysis (CCA) is a statistical procedure for identifying relationships between such data sets. In that context, Structured Sparse CCA (ScSCCA) is a rapidly emerging methodological area that aims for robust modeling of the interrelations between the different data modalities by assuming the corresponding CCA directional vectors to be sparse. Although this area of statistical methodology is growing quickly, related methods in the Bayesian paradigm remain underdeveloped. In this manuscript, we propose a novel ScSCCA approach that employs a Bayesian infinite factor model and aims to achieve robust estimation by encouraging sparsity at two different levels of the modeling framework. First, we use a multiplicative half-Cauchy process prior to encourage sparsity at the level of the latent variable loading matrices. Second, we promote further sparsity in the covariance matrix by using either a graphical horseshoe prior or a diagonal structure. We conduct multiple simulation studies to compare the performance of the proposed method with that of other frequently used CCA procedures, and we apply the developed procedures to analyze multi-omics data from a breast cancer study.
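For readers less familiar with CCA itself, the sketch below shows classical (non-sparse, non-Bayesian) CCA, the baseline that ScSCCA methods extend. It is not the proposed Bayesian infinite factor model: it simply whitens each view with the inverse square root of its covariance and takes the SVD of the whitened cross-covariance, so the singular values are the canonical correlations. The function names `inv_sqrt` and `classical_cca`, the small ridge term `reg`, and the simulated two-view data are illustrative assumptions, not part of the original work.

```python
import numpy as np


def inv_sqrt(S, eps=1e-8):
    """Symmetric inverse square root of a covariance matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    w = np.maximum(w, eps)  # guard against tiny or negative eigenvalues
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T


def classical_cca(X, Y, k=1, reg=1e-6):
    """Classical CCA (illustrative sketch, not the paper's method).

    X: (n, p) and Y: (n, q) data matrices whose rows are the same subjects.
    reg: small ridge added to each covariance for numerical stability (an assumption).
    Returns the first k canonical correlations and the direction vectors A, B.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    A = Kx @ U[:, :k]     # canonical directions for view X
    B = Ky @ Vt.T[:, :k]  # canonical directions for view Y
    return s[:k], A, B


# Hypothetical two-view data driven by one shared latent variable z.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 1))
X = z @ np.ones((1, 5)) + 0.5 * rng.normal(size=(n, 5))
Y = z @ np.ones((1, 4)) + 0.5 * rng.normal(size=(n, 4))
rho, A, B = classical_cca(X, Y, k=2)
print(rho)  # first correlation is large (shared signal); the second is near noise level
```

With only one shared latent variable, the first canonical correlation captures almost all of the cross-view association. In high-dimensional settings (p or q larger than n) the sample covariances become singular and every coordinate of A and B is generally nonzero, which is the motivation for imposing sparsity on the directional vectors as ScSCCA does.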