Selective inference is the problem of giving valid answers to statistical questions chosen in a data-driven manner. A standard solution to selective inference is simultaneous inference, which delivers valid answers to the set of all questions that could possibly have been asked. However, simultaneous inference can be unnecessarily conservative if this set includes many questions that were unlikely to be asked in the first place. We introduce a less conservative solution to selective inference that we call locally simultaneous inference, which only answers those questions that could plausibly have been asked in light of the observed data, all the while preserving rigorous type I error guarantees. For example, if the objective is to construct a confidence interval for the "winning" treatment effect in a clinical trial with multiple treatments, and it is obvious in hindsight that only one treatment had a chance to win, then our approach will return an interval that is nearly the same as the uncorrected, standard interval. Locally simultaneous inference is implemented by refining any simultaneous inference method of interest. Under mild conditions satisfied by common confidence intervals, locally simultaneous inference strictly dominates its underlying simultaneous inference method, meaning its statistical power is never lower and can be higher. Compared to conditional selective inference, which demands stronger guarantees, locally simultaneous inference is more easily applicable in nonparametric settings and is more numerically stable.
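To make the winning-treatment example concrete, here is a minimal sketch of one way the idea could be instantiated. It is an illustration under stated assumptions, not the construction used in the paper. It assumes the treatment effect estimates are independent Gaussians with known unit variance. It spends a small part `nu` of the error budget `alpha` on identifying the plausible winners, and it treats a treatment as plausible if its estimate is within four simultaneous-quantile widths of the observed maximum. The function names, the choice of `nu`, and the factor of four are all assumptions of this sketch.

```python
from statistics import NormalDist


def max_abs_quantile(k, level):
    """Return the `level` quantile of max(|Z_1|, ..., |Z_k|), Z_i iid N(0, 1)."""
    # P(max |Z_i| <= q) = (2 * Phi(q) - 1) ** k; solve this for q.
    return NormalDist().inv_cdf((1 + level ** (1 / k)) / 2)


def simultaneous_winner_ci(y, alpha=0.1):
    """Baseline: interval for the winner, corrected over all len(y) treatments."""
    q = max_abs_quantile(len(y), 1 - alpha)
    top = max(y)
    return top - q, top + q


def locally_simultaneous_winner_ci(y, alpha=0.1, nu=0.01):
    """Sketch: correct only over treatments that could plausibly have won.

    Assumes y[i] ~ N(theta[i], 1) independently (an assumption of this sketch).
    Returns (winner index, plausible indices, (lower, upper)).
    """
    m = len(y)
    winner = max(range(m), key=lambda i: y[i])
    # Width of a level-(1 - nu) simultaneous band around all m estimates.
    q_nu = max_abs_quantile(m, 1 - nu)
    # Treatments close enough to the maximum that they might have won instead.
    plausible = [i for i in range(m) if y[i] >= y[winner] - 4 * q_nu]
    # Spend the remaining alpha - nu on the plausible treatments only.
    q_local = max_abs_quantile(len(plausible), 1 - (alpha - nu))
    return winner, plausible, (y[winner] - q_local, y[winner] + q_local)


if __name__ == "__main__":
    clear = [20.0, 0.0, 0.5, -1.0]   # one treatment far ahead of the rest
    close = [1.0, 0.9, 0.8, 0.7]     # every treatment could have won
    for y in (clear, close):
        print(locally_simultaneous_winner_ci(y), simultaneous_winner_ci(y))
```

When one treatment is far ahead, only that treatment is plausible and the half-width is close to the uncorrected one. When all treatments are close, this simple version corrects over all of them at level `alpha - nu`, so it is slightly wider than the simultaneous interval. The sketch therefore does not reproduce the dominance property stated above.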