Dynamic prediction, which typically refers to predicting future outcomes from historical records, is often of interest in biomedical research. For datasets with large sample sizes, high measurement density, and complex correlation structures, traditional methods are often infeasible because of the computational burden associated with both data scale and model complexity. Moreover, many models do not directly facilitate out-of-sample predictions for generalized outcomes. To address these issues, we develop a novel approach to dynamic prediction based on a recently developed method for estimating complex patterns of variation in exponential family data: fast Generalized Functional Principal Components Analysis (fGFPCA). Our method handles large-scale, high-density repeated measures much more efficiently, and its implementation is feasible even on personal computational resources (e.g., a standard desktop or laptop computer). The proposed method makes highly flexible and accurate predictions of future trajectories for data that exhibit high degrees of nonlinearity, and allows out-of-sample predictions to be obtained without reestimating any parameters. We design and implement a simulation study to illustrate the advantages of this method. To demonstrate its practical utility, we also conduct a case study predicting diurnal active/inactive patterns using accelerometry data from the National Health and Nutrition Examination Survey (NHANES) 2011-2014. Both the simulation study and the data application demonstrate the better predictive performance and high computational efficiency of the proposed method relative to existing methods. The proposed method also yields more personalized predictions that improve as more information becomes available, which is an essential goal of dynamic prediction that other methods fail to achieve.