Researchers often run resource-intensive randomized controlled trials (RCTs) to estimate the causal effects of interventions on outcomes of interest. Yet these outcomes tend to be noisy, and estimated overall effects can be small or imprecise. Nevertheless, we may still be able to produce reliable evidence of an intervention's efficacy by finding subgroups with significant effects. In this paper, we propose a machine-learning method that is specifically optimized for finding such subgroups in noisy data. Unlike available methods for personalized treatment assignment, our tool is designed from the ground up to take significance testing into account: it produces a subgroup chosen to maximize the probability of obtaining a statistically significant positive treatment effect. We provide a computationally efficient implementation using decision trees and demonstrate its gain over selecting subgroups based on positive (estimated) treatment effects. Compared to standard tree-based regression and classification tools, this approach tends to yield higher power in detecting subgroups affected by the treatment.
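To make the objective concrete, the following is a minimal sketch of the underlying idea and not the paper's algorithm. It assumes a single covariate, a one-split "stump" in place of a full decision tree, and the Welch t-statistic as a stand-in for the probability of a significant positive effect. The simulated data, the helper names (`t_stat`, `best_stump`, `simulate`), and all parameter values are hypothetical and chosen only for illustration. The subgroup is selected on one sample and tested on a separate held-out sample, so that the test is not contaminated by the selection step.

```python
import numpy as np


def t_stat(y, w):
    """Welch t-statistic for the mean difference, treated (w == 1) minus control (w == 0)."""
    y1, y0 = y[w == 1], y[w == 0]
    if len(y1) < 2 or len(y0) < 2:
        return -np.inf  # not enough observations in one arm to estimate a variance
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return (y1.mean() - y0.mean()) / se if se > 0 else -np.inf


def best_stump(x, y, w, min_size=50):
    """Search thresholds c on one covariate for the subgroup {x > c} with the largest t-statistic."""
    best_c, best_t = None, -np.inf
    for c in np.quantile(x, np.linspace(0.05, 0.95, 19)):
        mask = x > c
        if mask.sum() < min_size:
            continue  # skip subgroups too small to test reliably
        t = t_stat(y[mask], w[mask])
        if t > best_t:
            best_c, best_t = c, t
    return best_c, best_t


def simulate(n, rng):
    """Hypothetical RCT: the treatment only helps units with x > 0.5, and outcomes are noisy."""
    x = rng.uniform(-1, 1, n)
    w = rng.integers(0, 2, n)            # randomized treatment assignment (0 or 1)
    tau = np.where(x > 0.5, 1.0, 0.0)    # heterogeneous treatment effect
    y = tau * w + rng.normal(0, 2.0, n)  # noisy outcome
    return x, y, w


rng = np.random.default_rng(0)
x_tr, y_tr, w_tr = simulate(2000, rng)   # sample used to choose the subgroup
x_te, y_te, w_te = simulate(2000, rng)   # held-out sample used for testing

c, t_train = best_stump(x_tr, y_tr, w_tr)
t_overall = t_stat(y_te, w_te)
t_subgroup = t_stat(y_te[x_te > c], w_te[x_te > c])

print(f"chosen threshold: x > {c:.2f}")
print(f"held-out t-statistic, full sample: {t_overall:.2f}")
print(f"held-out t-statistic, subgroup:    {t_subgroup:.2f}")
```

The design choice worth noting is that the split criterion targets the test statistic rather than the estimated effect size: a subgroup with a large estimated effect but few observations can have a smaller t-statistic than a larger subgroup with a moderate effect.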