Missing data are pervasive in modern functional datasets, where trajectories are often sparsely or irregularly observed. Although Functional Principal Component Analysis (FPCA) is widely used to reconstruct incomplete curves, existing FPCA-based approaches typically employ single imputation, which leads to overly optimistic inference in downstream analyses. To address these challenges, we develop a novel Bayesian multiple imputation framework for functional data (BAMIFun). For single-level functional data, we impose a Bayesian low-rank model that uses penalized spline representations to enforce smoothness of the eigenfunctions, and we derive an efficient Gibbs sampler for posterior computation. In addition, we demonstrate and validate how to properly account for estimation uncertainty in downstream analyses. Furthermore, we extend the framework to multiway functional data using a low-rank Functional Tensor Singular Value Decomposition (FTSVD) model, enabling Bayesian multiple imputation in settings not supported by existing methods. Simulation studies show that, compared to existing methods, BAMIFun achieves accurate imputation while providing substantially improved coverage and more reliable downstream inference. Case studies using a physical activity dataset and an infant gut microbiome dataset further demonstrate the practical advantages of our proposed methods under severe missingness. Code for our algorithms is available at https://github.com/ZirenJiang/BAMIFun.
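To illustrate what accounting for imputation uncertainty in a downstream analysis can look like, the sketch below pools estimates from several imputed datasets using Rubin's rules, a standard combining procedure for multiple imputation. It is an assumption that this is a suitable illustration: the abstract does not say which pooling procedure BAMIFun uses, and the function name `pool_rubin` and the numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool M completed-data estimates with Rubin's rules.

    estimates: point estimates, one per imputed dataset.
    variances: the matching within-imputation variance estimates.
    Returns (pooled estimate, total variance).
    Illustrative only; not taken from the BAMIFun code base.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    pooled = mean(estimates)        # average of the M point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations (divides by M - 1)

    # Total variance inflates the between-imputation part by (1 + 1/M).
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical values from three imputed datasets.
estimates = [1.0, 2.0, 3.0]
variances = [0.5, 0.5, 0.5]
pooled, total_var = pool_rubin(estimates, variances)
print(pooled, total_var)
```

A single imputation would report only the within-imputation variance (0.5 in this toy input). The between-imputation term is what widens the interval, which is the sense in which single imputation yields overly optimistic inference.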