The 'Why' behind including 'Y' in your imputation model

Missing data is a common challenge when analyzing epidemiological data, and imputation is often used to address it. Here, we investigate the scenario where a covariate used in an analysis has missing values and will be imputed. It is commonly recommended that the outcome from the analysis model be included in the imputation model for missing covariates, but it is not always clear whether this recommendation holds in every case, or why it holds when it does. We examine deterministic imputation (i.e., single imputation where the imputed values are treated as fixed) and stochastic imputation (i.e., single imputation with a random value, or multiple imputation) and their implications for estimating the relationship between the imputed covariate and the outcome. We mathematically demonstrate that including the outcome variable in the imputation model is not just a recommendation but a requirement for unbiased results when using stochastic imputation methods. Moreover, we dispel common misconceptions about deterministic imputation models and demonstrate why the outcome should not be included in these models. This paper aims to bridge the gap between imputation in theory and in practice, providing mathematical derivations to explain common statistical recommendations. We offer a better understanding of the considerations involved in imputing missing covariates and emphasize when it is necessary to include the outcome variable in the imputation model.
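The contrast between the four cases described above can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the paper: a linear analysis model, normally distributed errors, a covariate X missing completely at random, one fully observed covariate Z, and illustrative coefficient values. The function name `simulate` and all parameters are hypothetical. The stochastic case uses a single random draw around the fitted conditional mean with plug-in parameter estimates, which is simpler than proper multiple imputation.

```python
import numpy as np


def simulate(n=200_000, beta=1.0, p_miss=0.5, seed=0):
    """Return the estimated X coefficient under four imputation strategies.

    Assumed (illustrative) data-generating model:
        X = 0.5*Z + noise,  Y = beta*X + 0.5*Z + noise,  X missing at random
        with probability p_miss, independent of everything (MCAR).
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    x = 0.5 * z + rng.normal(size=n)
    y = beta * x + 0.5 * z + rng.normal(size=n)
    miss = rng.random(n) < p_miss
    obs = ~miss
    everyone = np.ones(n, dtype=bool)

    def ols(cols, target, rows):
        # Least-squares fit with an intercept, restricted to the given rows.
        A = np.column_stack([np.ones(rows.sum())] + [c[rows] for c in cols])
        coef, *_ = np.linalg.lstsq(A, target[rows], rcond=None)
        resid = target[rows] - A @ coef
        return coef, resid.std(ddof=A.shape[1])

    def analysis_slope(x_filled):
        # Analysis model: regress Y on the (partly imputed) X and on Z.
        coef, _ = ols([x_filled, z], y, everyone)
        return coef[1]

    results = {}
    for label, preds in [("without Y", [z]), ("with Y", [z, y])]:
        # Imputation model for X, fitted on rows where X is observed.
        coef, sigma = ols(preds, x, obs)
        mean_pred = np.column_stack([np.ones(n)] + preds) @ coef
        # Deterministic: fill in the predicted conditional mean.
        det = np.where(miss, mean_pred, x)
        # Stochastic: conditional mean plus a random residual draw.
        sto = np.where(miss, mean_pred + rng.normal(scale=sigma, size=n), x)
        results[f"deterministic, {label}"] = analysis_slope(det)
        results[f"stochastic, {label}"] = analysis_slope(sto)
    return results


if __name__ == "__main__":
    for name, slope in simulate().items():
        print(f"{name:28s} estimated beta = {slope:.3f}")
```

Under these assumed settings, with a true coefficient of 1, the deterministic imputation without Y and the stochastic imputation with Y should both recover a value close to 1. The stochastic imputation without Y should be attenuated toward zero, and the deterministic imputation with Y should be inflated away from zero, which matches the pattern the abstract describes.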

