Longitudinal studies are frequently used in medical research and involve collecting repeated measures on individuals over time. Observations from the same individual are invariably correlated and thus an analytic approach that accounts for this clustering by individual is required. While almost all research suffers from missing data, this can be particularly problematic in longitudinal studies as participation often becomes harder to maintain over time. Multiple imputation (MI) is widely used to handle missing data in such studies. When using MI, it is important that the imputation model is compatible with the proposed analysis model. In a longitudinal analysis, this implies that the clustering considered in the analysis model should be reflected in the imputation process. Several MI approaches have been proposed to impute incomplete longitudinal data, such as treating repeated measurements of the same variable as distinct variables or using generalized linear mixed imputation models. However, the uptake of these methods has been limited, as they require additional data manipulation and use of advanced imputation procedures. In this tutorial, we review the available MI approaches that can be used for handling incomplete longitudinal data, including where individuals are clustered within higher-level clusters. We illustrate implementation with replicable R and Stata code using a case study from the Childhood to Adolescence Transition Study.
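As a rough illustration of the first approach mentioned above (treating repeated measurements of the same variable as distinct variables), the sketch below reshapes long-format data into wide format so that each wave becomes its own column. It is written in plain Python only for illustration; the tutorial's own code is in R and Stata. The field names (`id`, `wave`, `score`), the helper `long_to_wide`, and the toy values are all hypothetical and do not come from the case study.

```python
from collections import defaultdict

# Hypothetical long-format records: one row per individual per wave.
# None marks a missing measurement. All names and values are illustrative.
long_rows = [
    {"id": 1, "wave": 1, "score": 10.0},
    {"id": 1, "wave": 2, "score": None},
    {"id": 1, "wave": 3, "score": 12.5},
    {"id": 2, "wave": 1, "score": 8.0},
    {"id": 2, "wave": 3, "score": 9.0},  # the wave 2 row is absent entirely
]


def long_to_wide(rows, id_key, time_key, value_key):
    """Return one record per individual with one column per wave."""
    waves = sorted({r[time_key] for r in rows})
    # Every individual starts with all wave columns set to None (missing).
    wide = defaultdict(lambda: {f"{value_key}_w{w}": None for w in waves})
    for r in rows:
        wide[r[id_key]][f"{value_key}_w{r[time_key]}"] = r[value_key]
    return [{id_key: i, **wide[i]} for i in sorted(wide)]


wide_rows = long_to_wide(long_rows, "id", "wave", "score")
for row in wide_rows:
    print(row)
```

Once the data are in this shape, a standard single-level imputation procedure can impute each wave's column using the other waves as predictors, which is how this approach carries the within-individual correlation into the imputation model.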