Researchers are increasingly turning to machine learning (ML) algorithms to investigate causal heterogeneity in randomized experiments. Despite their promise, ML algorithms may fail to accurately ascertain heterogeneous treatment effects in practical settings with many covariates and a small sample size. In addition, the quantification of estimation uncertainty remains a challenge. We develop a general approach to statistical inference for heterogeneous treatment effects discovered by a generic ML algorithm. We apply Neyman's repeated sampling framework to a common setting, in which researchers use an ML algorithm to estimate the conditional average treatment effect and then divide the sample into several groups based on the magnitude of the estimated effects. We show how to estimate the average treatment effect within each of these groups and how to construct a valid confidence interval for it. In addition, we develop nonparametric tests of treatment effect homogeneity across groups and of rank-consistency of the within-group average treatment effects. The validity of our methodology does not rely on the properties of ML algorithms because it is based solely on the randomization of treatment assignment and the random sampling of units. Finally, we generalize our methodology to the cross-fitting procedure by accounting for the additional uncertainty induced by the random splitting of data.
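The setting described above (score units with an ML estimate of the conditional average treatment effect, sort them into groups by that score, then estimate each group's average treatment effect with an interval) can be illustrated with a minimal sketch. It is not the estimator or variance formula developed in the paper: it uses a plain within-group difference in means with the standard conservative Neyman variance, and it treats the group boundaries as fixed rather than estimated. The function name `grouped_ate`, its parameters, the normal critical value 1.96, and the simulated data are all assumptions made for illustration, and the score is assumed to come from a model fit on separate data.

```python
import numpy as np


def grouped_ate(y, t, score, n_groups=5, z=1.96):
    """Within-group difference in means with a Neyman-style interval.

    y: outcomes; t: binary treatment indicator (1 = treated);
    score: estimated CATE for each unit, assumed to come from an ML model
    fit on separate data (hypothetical input, not the paper's procedure).
    Returns a list of (group, estimate, lower, upper), lowest scores first.
    """
    y, t, score = (np.asarray(a) for a in (y, t, score))
    n = len(y)
    # Rank units by estimated effect and cut the ranking into equal-sized groups.
    order = np.argsort(score, kind="stable")
    groups = np.empty(n, dtype=int)
    groups[order] = np.arange(n) * n_groups // n
    results = []
    for k in range(n_groups):
        in_group = groups == k
        y1 = y[in_group & (t == 1)]
        y0 = y[in_group & (t == 0)]
        est = y1.mean() - y0.mean()
        # Conservative Neyman variance: sum of per-arm sample variances over arm sizes.
        se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
        results.append((k, est, est - z * se, est + z * se))
    return results


# Simulated experiment (illustrative only): the true CATE equals x, and the
# "ML score" is taken to be x itself, standing in for a well-behaved model.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
t = rng.integers(0, 2, size=n)  # independent coin-flip assignment
y = x * t + rng.normal(size=n)

results = grouped_ate(y, t, score=x)
for k, est, lower, upper in results:
    print(f"group {k}: ATE estimate {est:.2f}, interval [{lower:.2f}, {upper:.2f}]")
```

Because the intervals here ignore the uncertainty in the group cutoffs and in any data splitting, they should be read only as a picture of the workflow, not as the valid intervals the abstract refers to.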