Sparse functional data arise when measurements are observed infrequently and at irregular time points for each subject, often in the presence of measurement error. These characteristics introduce additional challenges for functional principal component analysis. In this paper, we propose a new approach for extracting functional principal components from such data by combining basis expansion with maximum likelihood estimation. Orthogonality of the estimated eigenfunctions is preserved throughout the optimization using modified Gram-Schmidt orthonormalization. An information criterion is proposed to select both the optimal number of basis functions and the rank of the covariance structure. Principal component scores are subsequently estimated via conditional expectation, enabling accurate reconstruction of the underlying functional trajectories across the full domain despite sparse observations. Simulation studies demonstrate the effectiveness of the proposed method and show that it performs favorably compared with existing approaches. Its practical utility is illustrated through applications to CD4 cell count data from the Multicenter AIDS Cohort Study and somatic cell count data from Irish research dairy cattle. Supplementary materials, including technical details, additional simulation results, and the R package mGSFPCA, are available online.
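To illustrate the orthonormalization step mentioned above, the following is a minimal sketch of modified Gram-Schmidt applied to the columns of a coefficient matrix. It is not the mGSFPCA implementation. The function name `modified_gram_schmidt`, the tolerance `tol`, and the use of the plain Euclidean inner product are assumptions made for illustration. With a non-orthonormal basis, the inner product would presumably need to be weighted by the Gram matrix of the basis functions.

```python
import numpy as np


def modified_gram_schmidt(A, tol=1e-12):
    """Return a matrix whose columns are an orthonormal basis for the columns of A.

    Hypothetical helper: uses the Euclidean inner product, which is an
    assumption and not necessarily the inner product used in the paper.
    """
    Q = np.array(A, dtype=float)  # work on a copy
    n_rows, n_cols = Q.shape
    for j in range(n_cols):
        norm = np.linalg.norm(Q[:, j])
        if norm < tol:
            raise ValueError("columns are (numerically) linearly dependent")
        Q[:, j] /= norm
        # Modified variant: subtract the projection onto q_j from every
        # remaining column right away, using the already-updated columns.
        for i in range(j + 1, n_cols):
            Q[:, i] -= (Q[:, j] @ Q[:, i]) * Q[:, j]
    return Q


# Example: orthonormalize three random coefficient vectors of length 8.
rng = np.random.default_rng(0)
coefficients = rng.standard_normal((8, 3))
Q = modified_gram_schmidt(coefficients)
```

Re-applying such a step after each parameter update is one way to keep the estimated eigenfunction coefficients orthonormal during an iterative optimization. The modified variant is generally preferred over classical Gram-Schmidt because it loses less orthogonality in floating-point arithmetic.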