Across a wide array of disciplines, many researchers use machine learning (ML) algorithms to identify a subgroup of individuals, called exceptional responders, who are likely to benefit most from a treatment. A common approach consists of two steps. Researchers first estimate the conditional average treatment effect or its proxy using an ML algorithm. They then determine a cutoff for the resulting treatment prioritization score to select those predicted to benefit most from the treatment. Unfortunately, these estimated treatment prioritization scores are often biased and noisy. Furthermore, using the same data both to choose a cutoff value and to estimate the average treatment effect among the selected individuals creates a multiple testing problem. To address these challenges, we develop a uniform confidence band for experimentally evaluating the sorted group average treatment effect (GATES) among the individuals whose treatment prioritization score is at least as high as any given quantile value, regardless of how the quantile is chosen. This provides a statistical guarantee that the GATES for the selected subgroup exceeds a certain threshold. The validity of the proposed methodology depends solely on randomization of treatment and random sampling of units, without requiring modeling assumptions or resampling methods. This widens its applicability to a broad range of other causal quantities. A simulation study shows that the empirical coverage of the proposed uniform confidence bands is close to the nominal coverage even when the sample size is as small as 100. We analyze a clinical trial of late-stage prostate cancer and find a relatively large proportion of exceptional responders with a statistical performance guarantee.