Network models are powerful tools for gaining new insights from complex biological data. Most lines of investigation in biology involve comparing datasets in the setting where the same predictors are measured across multiple studies or conditions (multi-study data). Consequently, the development of statistical tools for network modeling of multi-study data is a highly active area of research. Multi-study factor analysis (MSFA) is a method for estimation of latent variables (factors) in multi-study data. In this work, we generalize MSFA by adding the capacity to estimate Gaussian graphical models (GGMs). Our new tool, MSFA-X, is a framework for latent variable-based graphical modeling of shared and study-specific signals in multi-study data. We demonstrate through simulation that MSFA-X can recover shared and study-specific GGMs and outperforms a graphical lasso benchmark. We apply MSFA-X to analyze maternal response to an oral glucose tolerance test in targeted metabolomic profiles from the Hyperglycemia and Adverse Pregnancy Outcomes (HAPO) Study, identifying network-level differences in glucose metabolism between women with and without gestational diabetes mellitus.
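As a rough illustration of the ideas named above, and not the MSFA-X implementation itself, the sketch below builds the covariance implied by a factor model with shared loadings and study-specific loadings, then converts covariance matrices into the partial correlations that define a GGM. The names `Phi`, `Lambdas`, `Psi`, and `partial_correlations`, the dimensions, and the choice to read a "shared" network off the shared-plus-residual covariance are all assumptions made for illustration; the abstract does not specify how MSFA-X performs this step.

```python
import numpy as np

def partial_correlations(cov):
    """Partial correlation matrix implied by a positive-definite covariance.

    In a GGM, a zero off-diagonal entry means the two variables are
    conditionally independent given all the others.
    """
    precision = np.linalg.inv(cov)
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

rng = np.random.default_rng(0)

# Hypothetical sizes: p variables, k shared factors, j study-specific factors.
p, k, j = 6, 2, 1
Phi = rng.normal(size=(p, k))                           # shared loadings
Lambdas = [rng.normal(size=(p, j)) for _ in range(2)]   # one loading matrix per study
Psi = 0.5 * np.eye(p)                                   # residual variances (assumed equal across studies)

# Covariance implied for each study: shared part + study-specific part + residual.
Sigmas = [Phi @ Phi.T + L @ L.T + Psi for L in Lambdas]

# Assumption: the low-rank shared part alone is singular, so the residual
# term is added before inverting to obtain a "shared" network.
shared_pcor = partial_correlations(Phi @ Phi.T + Psi)
study_pcors = [partial_correlations(S) for S in Sigmas]
```

Comparing `shared_pcor` with each entry of `study_pcors` shows how study-specific factors change the conditional dependence structure even when the shared loadings are identical.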