Trajectory data, including time series and longitudinal measurements, are increasingly common in health-related domains such as biomedical research and epidemiology. Real-world trajectory data frequently exhibit heterogeneity across subjects such as patients, sites, and subpopulations, yet many traditional methods are not designed to accommodate it. To address this, we propose a unified framework, termed Functional Singular Value Decomposition (FSVD), for statistical learning with heterogeneous trajectories. We establish the theoretical foundations of FSVD and develop a corresponding estimation algorithm that accommodates noisy and irregular observations. We further adapt FSVD to a wide range of trajectory-learning tasks, including dimension reduction, factor modeling, regression, clustering, and data completion, while preserving its ability to account for heterogeneity, leverage inherent smoothness, and handle irregular sampling. Through extensive simulations, we show that FSVD-based methods consistently outperform existing approaches across these tasks. Finally, we apply FSVD to a COVID-19 case-count dataset and electronic health record datasets, demonstrating its effectiveness in global and subgroup pattern discovery and in factor analysis.