Factor models are routinely used to analyze high-dimensional data in both single-study and multi-study settings. Bayesian inference for such models relies on Markov chain Monte Carlo (MCMC) methods, which scale poorly as the number of studies, observations, or measured variables increases. To address this issue, we propose variational inference algorithms to approximate the posterior distribution of Bayesian latent factor models using the multiplicative gamma process shrinkage prior. The proposed algorithms provide fast approximate inference at a fraction of the time and memory of MCMC-based implementations while maintaining comparable accuracy in characterizing the data covariance matrix. We conduct extensive simulations to evaluate our proposed algorithms and show their utility in estimating the model for high-dimensional multi-study gene expression data in ovarian cancers. Overall, our proposed approaches enable more efficient and scalable inference for factor models, facilitating their use in high-dimensional settings. An R package, VIMSFA, implementing our methods is available on GitHub (github.com/blhansen/VI-MSFA).