Large-scale longitudinal molecular profiling is now firmly established in biomedical research, prompted by the need to uncover coordinated biomarker trajectories that reflect the dynamics of underlying biological mechanisms and to characterise patient heterogeneity in disease progression. While a range of statistical tools exists for either longitudinal modelling or high-dimensional analysis, there is no unified framework tailored to address both questions jointly. Motivated by a longitudinal COVID-19 study conducted in Cambridge hospitals, we propose a Bayesian functional factor model to address this gap. The framework combines latent factor modelling with functional principal component analysis to represent temporal programmes shared across subsets of variables, while capturing individual variation through low-dimensional functional scores. We specify sparsity-inducing priors that yield an interpretable factor structure and allow the effective number of factors to be inferred by overspecifying the number of factors. An annealed variational algorithm provides efficient joint posterior inference at scale. The approach accurately recovers temporal structure in simulations with up to 20 000 variables. Application to the COVID-19 data reveals clinically meaningful heterogeneity in recovery dynamics through interpretable subject-level scores that capture coordinated inflammatory and immune-response pathway activity. The methodology is implemented in the R package bayesSYNC.
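To make the model structure described above concrete, the following is a minimal simulation sketch of a generic sparse functional factor model, written in Python with NumPy. It is an illustration under stated assumptions, not the bayesSYNC implementation or its API: the function name `simulate_functional_factor_data`, the sine basis standing in for eigenfunctions, the use of one basis shared by all factors, and the Bernoulli-masked Gaussian loadings are all hypothetical choices. Each variable loads on a small subset of factors, and each factor has a subject-specific trajectory built from a few functional scores.

```python
import numpy as np


def simulate_functional_factor_data(n_subjects=20, n_vars=50, n_factors=2,
                                    n_components=2, n_times=8,
                                    sparsity=0.2, noise_sd=0.1, seed=0):
    """Simulate y[i, j, t] = sum_q loadings[j, q] * latent[i, q, t] + noise.

    Illustrative assumption only; not the model fitted by bayesSYNC.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_times)

    # Sine functions stand in for eigenfunctions; shape (components, times).
    phi = np.stack([np.sqrt(2.0) * np.sin((l + 1) * np.pi * t)
                    for l in range(n_components)])

    # Subject-level functional scores, shape (subjects, factors, components),
    # with standard deviation decreasing across components.
    score_sd = 1.0 / np.arange(1, n_components + 1)
    scores = rng.normal(size=(n_subjects, n_factors, n_components)) * score_sd

    # Latent trajectories: latent[i, q, t] = sum_l scores[i, q, l] * phi[l, t].
    latent = np.einsum("iql,lt->iqt", scores, phi)

    # Sparse loadings, shape (variables, factors): most entries are exactly zero.
    active = rng.random((n_vars, n_factors)) < sparsity
    loadings = np.where(active, rng.normal(size=(n_vars, n_factors)), 0.0)

    # Observations: shared temporal programmes plus Gaussian noise.
    signal = np.einsum("jq,iqt->ijt", loadings, latent)
    y = signal + rng.normal(scale=noise_sd, size=signal.shape)
    return {"t": t, "y": y, "loadings": loadings,
            "scores": scores, "latent": latent}
```

In this sketch, the non-zero pattern of `loadings` plays the role of the interpretable factor structure, and `scores` plays the role of the low-dimensional subject-level summaries; a fitting procedure would aim to recover both from `y` alone.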