Bayesian factor analysis is routinely used for dimensionality reduction when modeling high-dimensional covariance matrices. Factor analytic decompositions express the covariance as the sum of a low-rank matrix and a diagonal matrix. In practice, Gibbs sampling algorithms are typically used for posterior computation, alternating between updates of the latent factors, the loadings, and the residual variances. In this article, we exploit a blessing of dimensionality to develop a provably accurate pseudo-posterior for the covariance matrix that bypasses the need for Gibbs sampling or other variants of Markov chain Monte Carlo. Our proposed Factor Analysis with BLEssing of dimensionality (FABLE) approach relies on a first-stage singular value decomposition (SVD) to estimate the latent factors, and then defines a jointly conjugate prior for the loadings and residual variances. The accuracy of the resulting pseudo-posterior for the covariance improves with increasing dimensionality. We show, both theoretically and through simulation experiments, that FABLE performs well in high-dimensional covariance matrix estimation and produces well-calibrated credible intervals. We also demonstrate the accuracy of inference and the computational efficiency of our approach by applying it to a gene expression data set.
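The two-stage construction described above can be illustrated with a minimal Python sketch. It is not the FABLE implementation: the function names, the scaling of the SVD-based factor estimate, the normal-inverse-gamma form of the conjugate prior, and the hyperparameters `tau2`, `a0`, and `b0` are all assumptions made for illustration, and any calibration adjustments the method may use are not reproduced. The sketch plugs in SVD-based factors, treats each column of the data as a conjugate linear regression on those factors, and draws the covariance as loadings times loadings transposed plus a diagonal matrix of residual variances, with no Markov chain involved.

```python
import numpy as np


def svd_factor_estimate(Y, k):
    """First stage (assumed form): estimate the n x k latent factors from an SVD of Y."""
    n = Y.shape[0]
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    # Scale the top-k left singular vectors so that M.T @ M = n * I_k.
    return np.sqrt(n) * U[:, :k]


def pseudo_posterior_covariance_draws(Y, k, n_draws=200, tau2=1.0, a0=1.0, b0=1.0, seed=0):
    """Independent covariance draws given plug-in factors (illustrative sketch only).

    Assumed model for column j of Y: y_j = M @ lam_j + noise, noise ~ N(0, sigma2_j * I),
    with prior lam_j | sigma2_j ~ N(0, tau2 * sigma2_j * I_k) and
    sigma2_j ~ Inverse-Gamma(a0, b0). The hyperparameters are hypothetical defaults.
    """
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    M = svd_factor_estimate(Y, k)

    # Because M.T @ M = n * I_k, the conjugate posterior precision is c * I_k.
    c = n + 1.0 / tau2
    mean = (M.T @ Y).T / c  # p x k posterior mean of the loadings
    resid = (Y ** 2).sum(axis=0) - c * (mean ** 2).sum(axis=1)
    shape = a0 + 0.5 * n
    rate = b0 + 0.5 * resid

    draws = np.empty((n_draws, p, p))
    for t in range(n_draws):
        # rate / Gamma(shape, 1) is an Inverse-Gamma(shape, rate) draw.
        sigma2 = rate / rng.gamma(shape, 1.0, size=p)
        # Loadings given variances: N(mean_j, (sigma2_j / c) * I_k) for each row j.
        Lam = mean + np.sqrt(sigma2 / c)[:, None] * rng.standard_normal((p, k))
        draws[t] = Lam @ Lam.T + np.diag(sigma2)
    return draws
```

Under these assumptions, calling `pseudo_posterior_covariance_draws(Y, k)` on an n x p data matrix `Y` returns an array of shape `(n_draws, p, p)`. Entrywise quantiles across the first axis give pseudo-credible intervals for the covariance entries. Every draw is independent of the others, which is what removes the need for burn-in or convergence diagnostics in this sketch.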