Factor models are routinely used to analyze high-dimensional data in both single-study and multi-study settings. Bayesian inference for such models relies on Markov chain Monte Carlo (MCMC) methods, which scale poorly as the number of studies, observations, or measured variables increases. To address this issue, we propose variational inference algorithms that approximate the posterior distribution of Bayesian latent factor models under the multiplicative gamma process shrinkage prior. The proposed algorithms provide fast approximate inference at a fraction of the time and memory of MCMC-based implementations while maintaining comparable accuracy in characterizing the data covariance matrix. We conduct extensive simulations to evaluate the proposed algorithms and show their utility in fitting the model to high-dimensional multi-study gene expression data in ovarian cancers. Overall, the proposed approaches enable more efficient and scalable inference for factor models, facilitating their use in high-dimensional settings.