In the recent literature on estimating heterogeneous treatment effects, each proposed method makes its own set of restrictive assumptions about the intervention's effects and about which subpopulations to estimate explicitly. Moreover, most of the literature provides no mechanism for identifying which subpopulations are most affected, other than manual inspection, and provides little guarantee of the correctness of the identified subpopulations. We therefore propose Treatment Effect Subset Scan (TESS), a new method for discovering which subpopulation in a randomized experiment is most significantly affected by a treatment. We frame this challenge as a pattern detection problem in which we efficiently maximize a nonparametric scan statistic (a measure of the conditional quantile treatment effect) over subpopulations. In doing so, we identify the subpopulation that experiences the largest distributional change as a result of the intervention, while making minimal assumptions about the intervention's effects or the underlying data-generating process. In addition to the algorithm, we demonstrate that under the sharp null hypothesis of no treatment effect, the asymptotic Type I and Type II errors can be controlled, and we provide sufficient conditions for detection consistency, i.e., exact identification of the affected subpopulation. Finally, we validate the efficacy of the method by discovering heterogeneous treatment effects in simulations and in real-world data from a well-known program evaluation study.