In many experiments and observational studies, the outcome of interest is difficult or expensive to observe, which reduces the effective sample size for estimating average treatment effects (ATEs) even when they are identifiable. We study how incorporating data on units for which only surrogate outcomes, which are not of primary interest, are observed can increase the precision of ATE estimation. We refrain from imposing stringent surrogacy conditions, which would permit surrogates to serve as perfect replacements for the target outcome. Instead, we supplement the available, albeit limited, observations of the target outcome (which by themselves identify the ATE) with abundant observations of surrogate outcomes, without any assumptions beyond random assignment and missingness and the corresponding overlap conditions. To quantify the potential gains, we derive the difference in efficiency bounds on ATE estimation with and without surrogates, both when the number of units with missing outcomes is overwhelming and when it is comparable to the number with observed outcomes. We develop robust ATE estimation and inference methods that realize these efficiency gains. We empirically demonstrate the gains by studying the effects of job training on long-term earnings.