In many scientific experiments, the cost of data annotation constrains the pace at which novel hypotheses can be tested. Modern machine learning pipelines offer a promising solution, provided their predictions yield correct conclusions. We focus on Prediction-Powered Causal Inference (PPCI), i.e., estimating the treatment effect in an unlabeled target experiment by relying on training data with the same outcome annotated but potentially different treatments or effect modifiers. We first show that conditional calibration guarantees valid PPCI at the population level. We then introduce a sufficient representation constraint that transfers validity across experiments, which we propose to enforce in practice via Deconfounded Empirical Risk Minimization, our new model-agnostic training objective. We validate our method on synthetic and real-world scientific data, solving problem instances that are impossible for Empirical Risk Minimization, even with standard invariance constraints. In particular, for the first time, we achieve valid causal inference on a scientific experiment with complex recordings and no human annotations, by fine-tuning a foundation model on a similar annotated experiment.