High-dimensional multivariate longitudinal data, which arise when many outcome variables are measured repeatedly over time, are becoming increasingly common in the social, behavioral, and health sciences. We propose a latent variable model for drawing statistical inferences on covariate effects and predicting future outcomes from high-dimensional multivariate longitudinal data. The model introduces unobserved factors to account for both between-variable and across-time dependence and to aid prediction. Statistical inference and prediction tools are developed under a general setting that allows the outcome variables to be of mixed types and possibly unobserved at certain time points, for example, due to right censoring. A central limit theorem is established for drawing statistical inferences on the regression coefficients. Additionally, an information criterion is introduced to choose the number of factors. The proposed model is applied to customer grocery shopping records to predict and understand shopping behavior.