The `Why' behind including `Y' in your imputation model

Missing data is a common challenge when analyzing epidemiological data, and imputation is often used to address this issue. Here, we investigate the scenario where a covariate used in an analysis has missingness and will be imputed. There are recommendations to include the outcome from the analysis model in the imputation model for missing covariates, but it is not necessarily clear if this recommendation always holds and why this is sometimes true. We examine deterministic imputation (i.e., single imputation with fixed values) and stochastic imputation (i.e., single or multiple imputation with random values) methods and their implications for estimating the relationship between the imputed covariate and the outcome. We mathematically demonstrate that including the outcome variable in imputation models is not just a recommendation but a requirement to achieve unbiased results when using stochastic imputation methods. Moreover, we dispel common misconceptions about deterministic imputation models and demonstrate why the outcome should not be included in these models. This paper aims to bridge the gap between imputation in theory and in practice, providing mathematical derivations to explain common statistical recommendations. We offer a better understanding of the considerations involved in imputing missing covariates and emphasize when it is necessary to include the outcome variable in the imputation model.

翻译：缺失数据是分析流行病学数据时的常见挑战，插补常被用于解决此问题。本文研究了分析中使用的协变量存在缺失并进行插补的情况。现有建议主张将分析模型中的结局变量纳入缺失协变量的插补模型，但该建议是否始终成立及其原因尚不明确。我们考察了确定性插补（即基于固定值的单一插补）和随机性插补（即基于随机值的单一或多次插补）方法，及其对估计插补协变量与结局变量之间关系的影响。通过数学论证表明，在使用随机性插补方法时，将结局变量纳入插补模型不仅是建议，更是实现无偏结果的必要条件。此外，我们澄清了关于确定性插补模型的常见误解，并论证了为何不应在此类模型中包含结局变量。本文旨在弥合插补理论与实践的差距，通过数学推导解释常见的统计建议，帮助深入理解缺失协变量插补中的考量因素，并强调何时必须将结局变量纳入插补模型。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【ACL2020】多模态信息抽取，365页ppt

专知会员服务

151+阅读 · 2020年7月6日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日