The widely used counterfactual definition of causal effects was derived for unbiasedness and accuracy, not generalizability. We propose a simple definition of the External Validity (EV) of interventions and counterfactuals. The definition leads to EV statistics for individual counterfactuals and to non-parametric effect estimators for sets of counterfactuals (i.e., for samples). We use this definition to discuss several issues that have troubled the original counterfactual formulation: out-of-sample validity, reliance on independence assumptions or estimation, concurrent estimation of multiple effects and full models, bias-variance tradeoffs, statistical power, omitted variables, and connections to current predictive and explanation techniques. Methodologically, the definition also allows us to replace the parametric, and generally ill-posed, estimation problems that followed from the counterfactual definition with combinatorial enumeration problems in non-experimental samples. We use this framework to generalize popular supervised, explanatory, and causal-effect estimators, improving their performance along three dimensions (External Validity, Unconfoundedness, and Accuracy) and enabling their use in non-i.i.d. samples. We demonstrate gains in out-of-sample prediction, intervention effect prediction, and causal effect estimation tasks. The COVID-19 pandemic highlighted the need for learning solutions that provide general predictions in small samples, often with missing variables. We also demonstrate applications to this pressing problem.