Missing data is a common challenge in studying treatment effects. In the context of mediation analysis, this paper addresses missingness in the mediator and outcome, focusing on identification. We first consider self-separated missingness models, where identification is achieved by conditional independence assumptions. This model class is somewhat limited, as it requires removing a certain number of connections from the model. We then turn to self-connected missingness models, where identification relies on information from shadow variables. This model class turns out to contain substantial variation, allowing models with built-in shadow variables (mediator, outcome or covariates) and models with auxiliary shadow variables at different positions in the causal structure. To improve the practical value of the missingness mechanisms, we allow where possible for dependencies due to unobserved causes of the missingness, a feature often neglected. In this exploration, we review existing models, connect them to new models, and develop theory where needed. This results in templates for identification in the mediation setting, generally useful identification techniques, and, perhaps most importantly, a synthesis and substantial extension of shadow variable theory. Two examples relate the models to practical considerations.