Sparse functional/longitudinal data have attracted widespread interest due to the prevalence of such data in the social and life sciences. Such data are routinely encountered in accelerated longitudinal studies, where subjects are enrolled at a random time and are tracked only for a short period relative to the domain of interest. The statistical analysis of such functional snippets is challenging because information for the far-off-diagonal regions of the covariance structure is missing. Our main methodological contribution is to address this challenge by bypassing covariance estimation and instead modeling the underlying process as the solution of a data-adaptive stochastic differential equation. Taking advantage of the interface between Gaussian functional data and stochastic differential equations makes it possible to efficiently reconstruct the target process by estimating its dynamic distribution. The proposed approach allows one to consistently recover forward sample paths from functional snippets at the subject level. We establish the existence and uniqueness of the solution to the proposed data-driven stochastic differential equation and derive rates of convergence for the corresponding estimators. The finite-sample performance is demonstrated with simulation studies and with functional snippets arising from a growth study and from spinal bone mineral density data.