This article focuses on covariance estimation for multi-view data. Popular approaches rely on factor-analytic decompositions that include both shared and view-specific latent factors. Posterior computation typically relies on expensive and brittle Markov chain Monte Carlo (MCMC) sampling, or on variational approximations that underestimate uncertainty and lack theoretical guarantees. Our proposed methodology employs spectral decompositions to estimate and align latent factors that are active in at least one view. Conditionally on these factors, we choose jointly conjugate prior distributions for factor loadings and residual variances. The resulting posterior is a simple product of normal-inverse gamma distributions, one for each variable, which bypasses MCMC and simplifies posterior computation. We prove favorable increasing-dimension asymptotic properties, including posterior contraction and central limit theorems for point estimators. We show excellent performance in simulations, including accurate uncertainty quantification, and apply the methodology to integrate four high-dimensional views from a multi-omics dataset of cancer cell samples.
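To illustrate why a product of normal-inverse gamma posteriors removes the need for MCMC, here is a minimal sketch under simplifying assumptions; it is not the authors' implementation. Factors are estimated by a hypothetical stand-in (the top-k left singular vectors of the centered, column-concatenated views), and every variable j is then modeled as y_j = F λ_j + e_j with e_j ~ N(0, σ_j² I). The priors λ_j | σ_j² ~ N(0, σ_j² τ² I) and σ_j² ~ IG(a0, b0) are illustrative choices; the names `estimate_factors`, `nig_posterior`, `sample_covariance`, and the hyperparameters `tau2`, `a0`, `b0` are all hypothetical. The sketch also ignores the shared versus view-specific structure and the alignment step.

```python
import numpy as np

def estimate_factors(views, k):
    """Hypothetical stand-in for spectral factor estimation: top-k left
    singular vectors of the centered, concatenated views, scaled so F'F = n I."""
    Y = np.hstack([v - v.mean(axis=0) for v in views])
    n = Y.shape[0]
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return np.sqrt(n) * U[:, :k]

def nig_posterior(F, Y, tau2=1.0, a0=1.0, b0=1.0):
    """Conjugate NIG posterior for y_j = F lam_j + e_j, for every column j at once.
    Assumed prior: lam_j | s2_j ~ N(0, s2_j * tau2 * I), s2_j ~ IG(a0, b0)."""
    n, k = F.shape
    prec = F.T @ F + np.eye(k) / tau2   # posterior precision (times 1/s2_j), shared by all j
    V = np.linalg.inv(prec)
    M = V @ F.T @ Y                     # k x p matrix of posterior means, one column per variable
    a_n = a0 + n / 2
    # b_n = b0 + 0.5 * (y'y - mu' prec mu), vectorized over columns
    b_n = b0 + 0.5 * (np.sum(Y**2, axis=0) - np.sum(M * (prec @ M), axis=0))
    return M, V, a_n, b_n

def sample_covariance(M, V, a_n, b_n, n_draws, rng):
    """Independent Monte Carlo draws of Sigma = Lam Lam' + diag(s2); no MCMC needed."""
    k, p = M.shape
    L = np.linalg.cholesky(V)
    draws = np.empty((n_draws, p, p))
    for t in range(n_draws):
        s2 = b_n / rng.gamma(a_n, 1.0, size=p)     # s2_j ~ IG(a_n, b_n[j])
        Z = rng.standard_normal((k, p))
        Lam = M + (L @ Z) * np.sqrt(s2)            # column j ~ N(mu_j, s2_j * V)
        draws[t] = Lam.T @ Lam + np.diag(s2)       # p x p covariance draw
    return draws

# Usage on synthetic data with two views sharing k = 2 factors.
rng = np.random.default_rng(0)
n, k = 200, 2
F_true = rng.standard_normal((n, k))
views = [F_true @ rng.standard_normal((k, p)) + 0.5 * rng.standard_normal((n, p))
         for p in (5, 8)]
F_hat = estimate_factors(views, k)
Y = np.hstack([v - v.mean(axis=0) for v in views])
M, V, a_n, b_n = nig_posterior(F_hat, Y)
draws = sample_covariance(M, V, a_n, b_n, n_draws=100, rng=rng)
Sigma_mean = draws.mean(axis=0)   # posterior mean of the 13 x 13 covariance
```

Because the posterior factorizes across variables given the factors, each draw costs one inverse-gamma and one Gaussian sample per variable, and draws are independent rather than autocorrelated as in an MCMC chain.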