Missing data is a common problem that complicates the study of treatment effects. In the context of mediation analysis, this paper addresses missingness in the two key variables, the mediator and the outcome, focusing on identification. We consider self-separated missingness models, where identification is achieved by conditional independence assumptions alone, and self-connected missingness models, where identification relies on so-called shadow variables. The first class is somewhat limited, as it is constrained by the need to remove a certain number of connections from the model. The second class turns out to include substantial variation in the position of the shadow variable in the causal structure (vis-à-vis the mediator and outcome) and in the corresponding implications for the model. In constructing the models, to improve plausibility, we pay close attention to allowing, where possible, dependencies due to unobserved causes of the missingness. In this exploration, we develop theory where needed. This results in templates for identification in this mediation setting, generally useful identification techniques, and, perhaps most significantly, a synthesis and substantial expansion of shadow variable theory.