Out of the participants in a randomized experiment with anticipated heterogeneous treatment effects, is it possible to identify which subjects have a positive treatment effect? While subgroup analysis has received attention, claims about individual participants are much more challenging. We frame the problem in terms of multiple hypothesis testing: each individual has a null hypothesis (stating that the potential outcomes are equal, for example) and we aim to identify those for whom the null is false (the treatment potential outcome stochastically dominates the control one, for example). We develop a novel algorithm that identifies such a subset, with nonasymptotic control of the false discovery rate (FDR). Our algorithm allows for interaction -- a human data scientist (or a computer program) may adaptively guide the algorithm in a data-dependent manner to gain power. We show how to extend the methods to observational settings and achieve a type of doubly-robust FDR control. We also propose several extensions: (a) relaxing the null to nonpositive effects, (b) moving from unpaired to paired samples, and (c) subgroup identification. We demonstrate via numerical experiments and theoretical analysis that the proposed method has valid FDR control in finite samples and reasonably high identification power.
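To make the error criterion concrete, the following minimal sketch computes the false discovery proportion (FDP) of an identified subset; the FDR is the expectation of this quantity over the randomness in the experiment. It is an illustration only and not the proposed algorithm. The function name `false_discovery_proportion`, the toy effect values, and the assumption that each subject's true effect is known are hypothetical, since in practice the true effects are unobserved.

```python
from typing import Sequence


def false_discovery_proportion(
    identified: Sequence[int], true_effect: Sequence[float]
) -> float:
    """Fraction of identified subjects whose true effect is not positive.

    Assumption for illustration: true_effect[i] is the (normally unknown)
    treatment effect of subject i, and a subject is a true null when that
    effect is zero or negative.
    """
    if not identified:
        # By convention the FDP is 0 when nothing is identified.
        return 0.0
    false_discoveries = sum(1 for i in identified if true_effect[i] <= 0)
    return false_discoveries / len(identified)


# Hypothetical ground truth for six subjects: three have a positive effect.
true_effect = [1.0, 0.0, 2.0, 0.0, 0.5, 0.0]

# Suppose some procedure identifies subjects 0, 2, 3 and 4.
identified = [0, 2, 3, 4]

fdp = false_discovery_proportion(identified, true_effect)
print(fdp)  # 0.25: one of the four identified subjects is a true null
```

Controlling the FDR at level 0.2, for instance, would mean that the average of this proportion over repeated experiments is at most 0.2, even though any single realization (such as the 0.25 above) may exceed it.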