Selective inference is the problem of giving valid answers to statistical questions chosen in a data-driven manner. A standard solution to selective inference is simultaneous inference, which delivers valid answers to the set of all questions that could possibly have been asked. However, simultaneous inference can be unnecessarily conservative if this set includes many questions that were unlikely to be asked in the first place. We introduce a less conservative solution to selective inference that we call locally simultaneous inference, which only answers those questions that could plausibly have been asked in light of the observed data, all the while preserving rigorous type I error guarantees. For example, if the objective is to construct a confidence interval for the "winning" treatment effect in a clinical trial with multiple treatments, and it is obvious in hindsight that only one treatment had a chance to win, then our approach will return an interval that is nearly the same as the uncorrected, standard interval. Locally simultaneous inference is implemented by refining any simultaneous inference method of interest. Under mild conditions satisfied by common confidence intervals, locally simultaneous inference strictly dominates its underlying simultaneous inference method, meaning it never yields less statistical power and can yield more. Compared to conditional selective inference, which demands stronger guarantees, locally simultaneous inference is more easily applicable in nonparametric settings and is more numerically stable.
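To make the "winning" treatment example concrete, the following is a minimal sketch rather than the authors' implementation. It assumes independent Gaussian effect estimates with unit standard errors, and a two-stage split of the error budget: a small level `nu` is spent on identifying which treatments could plausibly have won, and the remaining `alpha - nu` on a simultaneous interval over only those treatments. The function names, the choice of `nu`, and the `4 * q_nu` plausibility threshold are assumptions of this sketch. This simple version does not include any refinement behind the dominance claim, so it can be slightly wider than plain simultaneous inference when every treatment is plausible.

```python
from statistics import NormalDist


def max_abs_quantile(k, level):
    """Return q such that P(max_j |Z_j| <= q) = 1 - level
    for k independent standard normal variables Z_1, ..., Z_k."""
    return NormalDist().inv_cdf((1 + (1 - level) ** (1 / k)) / 2)


def simultaneous_winner_ci(x, alpha=0.1):
    """Baseline: interval for the winner, corrected over all m treatments."""
    q = max_abs_quantile(len(x), alpha)
    best = max(x)
    return best - q, best + q


def locally_simultaneous_winner_ci(x, alpha=0.1, nu=0.01):
    """Sketch of a locally simultaneous interval for the winning effect.

    Assumes x[i] ~ N(theta_i, 1) independently (an assumption of this sketch).
    """
    m = len(x)
    winner = max(range(m), key=lambda i: x[i])

    # Stage 1: spend nu on a simultaneous band of half-width q_nu.
    # A treatment could plausibly have won if its estimate is within
    # 4 * q_nu of the observed maximum (2 * q_nu of slack on each side).
    q_nu = max_abs_quantile(m, nu)
    plausible = [i for i in range(m) if x[i] >= x[winner] - 4 * q_nu]

    # Stage 2: spend alpha - nu on simultaneous inference over the
    # plausible treatments only.
    q_local = max_abs_quantile(len(plausible), alpha - nu)
    return winner, plausible, (x[winner] - q_local, x[winner] + q_local)


if __name__ == "__main__":
    estimates = [20.0, 0.0, 0.2, -0.1, 0.1]  # one clear winner (made-up data)
    print(locally_simultaneous_winner_ci(estimates))
    print(simultaneous_winner_ci(estimates))
```

When one estimate is far ahead of the rest, the plausible set collapses to a single treatment and the interval uses a quantile at level `alpha - nu` for one comparison, which is close to the uncorrected interval at level `alpha`.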