The increasing availability of high-dimensional, longitudinal measures of gene expression can facilitate analysis of the biological mechanisms of disease and prediction of future trajectories, as required for precision medicine. Biological knowledge suggests that complex diseases may be best described at the level of underlying pathways, which may interact with one another. We propose a Bayesian approach that characterises the correlation among pathways through Dependent Gaussian Processes (DGP) and maps the observed high-dimensional gene expression trajectories onto unobserved low-dimensional pathway expression trajectories via Bayesian Sparse Factor Analysis. Compared to previous approaches that model each pathway expression trajectory independently, our model performs better at recovering the shape of pathway expression trajectories, revealing the relationships between genes and pathways, and predicting gene expression (closer point estimates and narrower predictive intervals), in both a simulation study and a real data analysis. To fit the model, we propose a Monte Carlo Expectation Maximization (MCEM) scheme that can be implemented conveniently by combining a standard Markov Chain Monte Carlo sampler with the R package GPFDA (Konzen and others, 2021), which returns the maximum likelihood estimates of the DGP parameters. The modular structure of MCEM makes it generalizable to other complex models that include a DGP component. An R package implementing the proposed approach has been developed.
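To make the model structure concrete, the following is a minimal simulation sketch of its generative side only: correlated latent pathway trajectories are mapped to gene-level trajectories through a sparse loading matrix. It is an illustration under stated assumptions, not the proposed implementation. In particular, the cross-pathway dependence is induced here by a simple separable covariance (a pathway covariance matrix combined with a squared-exponential time kernel), which is not necessarily the DGP construction used in the paper or in GPFDA, and the function names, parameters, and default values (`simulate`, `sparsity`, `noise_sd`, and so on) are hypothetical.

```python
import numpy as np


def rbf_kernel(t, length_scale=0.3):
    """Squared-exponential covariance between all pairs of time points."""
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def simulate(n_times=20, n_genes=50, n_pathways=3,
             sparsity=0.8, noise_sd=0.1, seed=0):
    """Simulate gene trajectories from correlated latent pathway trajectories.

    Assumption: pathway dependence comes from a separable covariance
    Cov(f[k, t], f[k', t']) = B[k, k'] * K[t, t'], chosen for simplicity.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_times)

    # Time covariance, with a small jitter for numerical stability.
    K = rbf_kernel(t) + 1e-6 * np.eye(n_times)

    # Random positive-definite covariance between pathways.
    A = rng.normal(size=(n_pathways, n_pathways))
    B = A @ A.T + 0.1 * np.eye(n_pathways)

    # Draw latent trajectories with the separable covariance above:
    # f = chol(B) @ Z @ chol(K).T, where Z has i.i.d. standard normal entries.
    Z = rng.normal(size=(n_pathways, n_times))
    f = np.linalg.cholesky(B) @ Z @ np.linalg.cholesky(K).T

    # Sparse loadings: each gene loads on a pathway with probability 1 - sparsity.
    mask = rng.random((n_genes, n_pathways)) > sparsity
    loadings = rng.normal(size=(n_genes, n_pathways)) * mask

    # Observed gene expression = loadings times pathway trajectories plus noise.
    Y = loadings @ f + noise_sd * rng.normal(size=(n_genes, n_times))
    return t, f, loadings, Y


if __name__ == "__main__":
    t, f, loadings, Y = simulate()
    print("pathway trajectories:", f.shape)
    print("gene trajectories:", Y.shape)
    print("fraction of zero loadings:", (loadings == 0).mean())
```

Fitting the model reverses this process: the loadings and latent trajectories are inferred from the observed gene trajectories, with the covariance parameters estimated inside the MCEM scheme.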