The analysis of multivariate functional curves has the potential to yield important scientific discoveries in domains such as healthcare, medicine, economics and social sciences. However, it is common for real-world settings to present longitudinal data that are both irregularly and sparsely observed, which introduces important challenges for the current functional data methodology. A Bayesian hierarchical framework for multivariate functional principal component analysis is proposed, which accommodates the intricacies of such irregular observation settings by flexibly pooling information across subjects and correlated curves. The model represents common latent dynamics via shared functional principal component scores, thereby effectively borrowing strength across curves while circumventing the computationally challenging task of estimating covariance matrices. These scores also provide a parsimonious representation of the major modes of joint variation of the curves and constitute interpretable scalar summaries that can be employed in follow-up analyses. Estimation is carried out using variational inference, which combines efficiency, modularity and approximate posterior density estimation, enabling the joint analysis of large datasets with parameter uncertainty quantification. Detailed simulations assess the effectiveness of the approach in sharing information from sparse and irregularly sampled multivariate curves. The methodology is also exploited to estimate the molecular disease courses of individual patients with SARS-CoV-2 infection and characterise patient heterogeneity in recovery outcomes; this study reveals key coordinated dynamics across the immune, inflammatory and metabolic systems, which are associated with survival and long-COVID symptoms up to one year post disease onset. The approach is implemented in the R package bayesFPCA.
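To make the idea of shared functional principal component scores concrete, the following is a minimal simulation sketch, not the bayesFPCA implementation. It assumes a standard Karhunen-Loève-type representation in which observation k of variable j for subject i is generated as y_ij(t) = mu_j(t) + sum_l zeta_il * phi_jl(t) + noise, with the scores zeta_il shared across variables. The function name `simulate_sparse_mfpca`, the sinusoidal eigenfunctions, the mean curves and all default parameter values are hypothetical choices made purely for illustration.

```python
import numpy as np


def simulate_sparse_mfpca(n_subjects=50, n_variables=3, n_components=2,
                          max_obs=6, noise_sd=0.1, seed=0):
    """Simulate sparse, irregularly sampled multivariate curves.

    Illustrative assumption: every variable of a subject is driven by the
    same score vector, so information can be pooled across variables.
    """
    rng = np.random.default_rng(seed)

    # Shared subject-level scores: one vector per subject, reused by all variables.
    scores = rng.normal(size=(n_subjects, n_components))

    def eigenfunction(j, l, t):
        # Hypothetical variable-specific basis: phase-shifted sinusoids on [0, 1].
        return np.sqrt(2.0) * np.sin((l + 1) * np.pi * t + 0.3 * j)

    def mean_curve(j, t):
        # Hypothetical variable-specific mean function.
        return 0.5 * (j + 1) * np.cos(2.0 * np.pi * t)

    records = []
    for i in range(n_subjects):
        for j in range(n_variables):
            # Each subject/variable pair gets its own small, irregular time grid.
            n_obs = int(rng.integers(2, max_obs + 1))
            t = np.sort(rng.uniform(0.0, 1.0, size=n_obs))

            # Linear combination of eigenfunctions weighted by the shared scores.
            signal = sum(scores[i, l] * eigenfunction(j, l, t)
                         for l in range(n_components))
            y = mean_curve(j, t) + signal + rng.normal(scale=noise_sd, size=n_obs)
            records.append((i, j, t, y))

    return scores, records


if __name__ == "__main__":
    scores, records = simulate_sparse_mfpca()
    subject, variable, times, values = records[0]
    print(f"subject {subject}, variable {variable}: {len(times)} observations")
```

Because each subject contributes only a handful of time points per variable, no single curve can be reconstructed well on its own; a model that ties all of a subject's curves to the same scores can pool these observations, which is the borrowing of strength described above.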