Precise Unbiased Estimation in Randomized Experiments using Auxiliary Observational Data

From arXiv. Forthcoming in Journal of Causal Inference. Replication materials at https://osf.io/d9ujq/. Results differ very slightly from previous versions due to changes made in the process of making the analysis replicable. For details, compare https://github.com/adamSales/rebarLoop/tree/ReplicateArxiv2-2023 (previous version) to https://github.com/adamSales/rebarLoop/tree/docker (current version).

Randomized controlled trials (RCTs) are increasingly prevalent in education research, and are often regarded as a gold standard of causal inference. Two main virtues of randomized experiments are that they (1) do not suffer from confounding, thereby allowing for an unbiased estimate of an intervention's causal impact, and (2) allow for design-based inference, meaning that the physical act of randomization largely justifies the statistical assumptions made. However, RCT sample sizes are often small, leading to low precision; in many cases RCT estimates may be too imprecise to guide policy or inform science. Observational studies, by contrast, have strengths and weaknesses complementary to those of RCTs. Observational studies typically offer much larger sample sizes, but may suffer from confounding. In many contexts, experimental and observational data exist side by side, allowing the possibility of integrating "big observational data" with "small but high-quality experimental data" to get the best of both. Such approaches hold particular promise in the field of education, where RCT sample sizes are often small due to cost constraints, but automatic data collection, for instance in computerized educational technology applications or in state longitudinal data systems (SLDS) holding administrative data on hundreds of thousands of students, has made rich, high-dimensional observational data widely available. We outline an approach that allows one to employ machine learning algorithms to learn from the observational data, and use the resulting models to improve precision in randomized experiments. Importantly, there is no requirement that the machine learning models be "correct" in any sense, and the final experimental results are guaranteed to be exactly unbiased. Thus, there is no danger of confounding biases in the observational data leaking into the experiment.
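The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method (their actual code is in the replication materials linked above). It shows one simple variant: fit a predictor on the observational data alone, then take the difference in means of the residuals y - ŷ in the experiment. Because ŷ never sees the experiment's treatment assignments, it acts like a fixed covariate, and randomization makes its average difference between arms zero in expectation, so the estimate stays unbiased even when the predictor is miscalibrated. Assumptions: ordinary least squares stands in for an arbitrary machine learning model, all data are simulated, the +2.0 offset is a hypothetical systematic difference between the observational and experimental populations, and the helper names (`fit_linear_predictor`, `residualized_estimate`) are hypothetical.

```python
import numpy as np


def fit_linear_predictor(X_obs, y_obs):
    """Fit OLS on observational data only (a stand-in for any ML model)."""
    design = np.column_stack([np.ones(len(X_obs)), X_obs])
    coef, *_ = np.linalg.lstsq(design, y_obs, rcond=None)

    def predict(X):
        return np.column_stack([np.ones(len(X)), X]) @ coef

    return predict


def diff_in_means(y, z):
    """Treated mean minus control mean."""
    return y[z == 1].mean() - y[z == 0].mean()


def residualized_estimate(y, z, yhat):
    # yhat was produced without using z, so it behaves like a fixed covariate
    return diff_in_means(y - yhat, z)


rng = np.random.default_rng(0)
beta = np.array([1.5, -1.0, 0.8, 0.5, -0.4])  # hypothetical outcome model
p = len(beta)

# Large observational sample; the +2.0 shift makes the predictor miscalibrated
# for the experimental population (hypothetical)
n_obs = 20_000
X_obs = rng.normal(size=(n_obs, p))
y_obs = X_obs @ beta + 2.0 + rng.normal(size=n_obs)
predict = fit_linear_predictor(X_obs, y_obs)

# Small experiment with fixed potential outcomes and a true effect of 0.5
n_exp = 60
X_exp = rng.normal(size=(n_exp, p))
y0 = X_exp @ beta + rng.normal(size=n_exp)
y1 = y0 + 0.5
yhat = predict(X_exp)

# Re-randomize many times to compare the two estimators' sampling behavior
plain, resid = [], []
for _ in range(2000):
    z = rng.permutation(np.repeat([0, 1], n_exp // 2))  # complete randomization
    y = np.where(z == 1, y1, y0)
    plain.append(diff_in_means(y, z))
    resid.append(residualized_estimate(y, z, yhat))
plain, resid = np.array(plain), np.array(resid)

print("mean  plain:", plain.mean(), " residualized:", resid.mean())
print("var   plain:", plain.var(), " residualized:", resid.var())
```

Both estimators center on the true effect of 0.5; the residualized one has a much smaller variance because the observational model absorbs most of the covariate-driven variation, while the constant miscalibration cancels between the two arms.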
