Design-Based Causal Inference with Missing Outcomes: Missingness Mechanisms, Imputation-Assisted Randomization Tests, and Covariate Adjustment

Design-based causal inference is one of the most widely used frameworks for testing causal null hypotheses or drawing inferences about causal parameters from experimental or observational data. Its chief merit is that its statistical validity derives solely from the study design (e.g., the randomization design) and does not require assuming any outcome-generating distribution or model. Although immune to model misspecification, design-based causal inference can still suffer from other data challenges, among which missingness in outcomes is a significant one. Compared with model-based causal inference, however, outcome missingness in design-based causal inference is much less studied, largely because design-based causal inference does not assume any outcome distribution or model and therefore cannot directly adopt existing model-based approaches for missing data. To fill this gap, we systematically study the missing outcomes problem in design-based causal inference. First, we use the potential outcomes framework to clarify the minimal assumption (concerning the outcome missingness mechanism) needed for conducting finite-population-exact randomization tests of the null of no effect (i.e., Fisher's sharp null), and the minimal assumption needed for constructing finite-population-exact confidence sets with missing outcomes. Second, we propose a general framework called "imputation and re-imputation" for conducting finite-population-exact randomization tests in design-based causal studies with missing outcomes. Our framework can incorporate any existing outcome imputation algorithm while guaranteeing finite-population-exact type-I error rate control. Third, we extend our framework to conduct covariate adjustment in an exact randomization test with missing outcomes and to construct finite-population-exact confidence sets with missing outcomes.
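To make the "imputation and re-imputation" idea concrete, the following is a minimal Python sketch of one way such a test could be organized. It is an illustration under stated assumptions, not the authors' implementation: it assumes a completely randomized experiment with both arms non-empty and at least one observed outcome, assumes that the missingness indicators would be the same under every possible assignment, uses a toy arm-mean imputation as a stand-in for an arbitrary imputation algorithm, and approximates the randomization distribution by Monte Carlo draws. All function and variable names (`impute_arm_mean`, `diff_in_means`, `imputation_reimputation_test`, `observed`) are hypothetical.

```python
import numpy as np


def impute_arm_mean(z, y, observed):
    """Toy imputation (an assumption of this sketch): fill each missing
    outcome with the mean of the observed outcomes in the same arm of z."""
    y_imp = np.where(observed, y, 0.0).astype(float)
    overall = y[observed].mean()  # fallback if an arm has no observed outcome
    for arm in (0, 1):
        in_arm = z == arm
        obs_in_arm = in_arm & observed
        fill = y[obs_in_arm].mean() if obs_in_arm.any() else overall
        y_imp[in_arm & ~observed] = fill
    return y_imp


def diff_in_means(z, y_imp):
    """Test statistic: treated-minus-control mean of the imputed outcomes."""
    return y_imp[z == 1].mean() - y_imp[z == 0].mean()


def imputation_reimputation_test(z, y, observed, n_draws=2000, seed=0):
    """Monte Carlo randomization p-value for Fisher's sharp null.

    Under the sharp null, and assuming missingness does not depend on the
    assignment, the observed outcomes and the missingness pattern are fixed
    across assignments. Each draw therefore re-runs the same imputation on
    the same observed data with a re-drawn assignment vector.
    """
    rng = np.random.default_rng(seed)
    # Step 1 (imputation): impute with the realized assignment.
    t_obs = abs(diff_in_means(z, impute_arm_mean(z, y, observed)))
    exceed = 0
    for _ in range(n_draws):
        # Step 2 (re-imputation): permuting z keeps the number treated
        # fixed, matching a completely randomized design.
        z_new = rng.permutation(z)
        t_new = abs(diff_in_means(z_new, impute_arm_mean(z_new, y, observed)))
        exceed += int(t_new >= t_obs)
    # Adding 1 to numerator and denominator counts the realized assignment.
    return (1 + exceed) / (1 + n_draws)
```

Here `z` is a 0/1 NumPy array of assignments, `y` is a float array whose missing entries may hold `np.nan`, and `observed` is a boolean array marking which outcomes were recorded. The key design choice in this sketch is that the imputation step sits inside the randomization loop, so the whole procedure is treated as a single function of the assignment vector; swapping in a different imputation routine only requires replacing `impute_arm_mean` with a function of the same signature.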
