Selective inference is the problem of giving valid answers to statistical questions chosen in a data-driven manner. A standard solution to selective inference is simultaneous inference, which delivers valid answers to the set of all questions that could possibly have been asked. However, simultaneous inference can be unnecessarily conservative if this set includes many questions that were unlikely to be asked in the first place. We introduce a less conservative solution to selective inference that we call locally simultaneous inference, which only answers those questions that could plausibly have been asked in light of the observed data, all the while preserving rigorous type I error guarantees. For example, if the objective is to construct a confidence interval for the "winning" treatment effect in a clinical trial with multiple treatments, and it is obvious in hindsight that only one treatment had a chance to win, then our approach will return an interval that is nearly the same as the uncorrected, standard interval. Under mild conditions satisfied by common confidence intervals, locally simultaneous inference strictly dominates simultaneous inference, meaning it never yields less statistical power and can yield more. Compared to conditional selective inference, which demands stronger guarantees, locally simultaneous inference is more easily applicable in nonparametric settings and is more numerically stable.
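To make the "winning treatment" example concrete, here is a simplified sketch. It assumes independent Gaussian estimates y_j ~ N(mu_j, sigma^2) with known sigma, and it uses Bonferroni corrections throughout. The function `winner_intervals`, the error-budget split `nu`, and the 4 * sigma * c_nu plausibility threshold are illustrative assumptions, not the paper's exact construction, which may use tighter constants. Validity follows from a union bound. With probability at least 1 - nu, every |y_j - mu_j| is at most sigma * c_nu. On that event, the realized winner lies in the data-dependent plausible set. A Bonferroni correction over that set at level alpha - nu then covers mu_w with the remaining error budget.

```python
from statistics import NormalDist


def z(q):
    """Standard normal quantile, e.g. z(0.975) is about 1.96."""
    return NormalDist().inv_cdf(q)


def winner_intervals(y, sigma, alpha=0.1, nu=0.01):
    """Hypothetical sketch: intervals for the mean of the 'winning' arm.

    Assumes y[j] ~ N(mu[j], sigma**2) independently, sigma known, and
    0 < nu < alpha. Returns naive, fully simultaneous (Bonferroni over all
    m arms) and locally simultaneous intervals for mu[w], w = argmax y.
    """
    m = len(y)
    w = max(range(m), key=lambda j: y[j])  # index of the observed winner

    def centered(half_width):
        return (y[w] - half_width, y[w] + half_width)

    # Uncorrected interval: ignores that w was chosen after seeing the data.
    naive = centered(sigma * z(1 - alpha / 2))

    # Simultaneous interval: Bonferroni over every arm that could have won.
    simultaneous = centered(sigma * z(1 - alpha / (2 * m)))

    # Step 1: spend nu on a simultaneous box |y_j - mu_j| <= sigma * c_nu.
    c_nu = z(1 - nu / (2 * m))
    # Step 2: arms that could plausibly have won. On the box event, any arm
    # that wins for some data in the box satisfies
    # y_j >= max(y) - 4 * sigma * c_nu (triangle inequality).
    plausible = [j for j in range(m) if y[j] >= y[w] - 4 * sigma * c_nu]
    # Step 3: Bonferroni over the plausible arms only, with budget alpha - nu,
    # intersected with the nu-box (both centred at y[w], so take the min).
    z_local = z(1 - (alpha - nu) / (2 * len(plausible)))
    local = centered(sigma * min(z_local, c_nu))

    return {"naive": naive, "simultaneous": simultaneous,
            "local": local, "plausible": plausible}


# Usage: arm 0 wins by a wide margin, so it is the only plausible winner.
res = winner_intervals([20.0, 0.0, 0.5, -1.0], sigma=1.0)
print(res["plausible"])  # [0]
```

In this run the locally simultaneous half-width is about 1.70 (versus 1.645 uncorrected and about 2.24 fully simultaneous), echoing the abstract's claim that the interval is nearly the standard one when only one treatment could have won. This simplified version does not by itself reproduce the dominance guarantee: when every arm is plausible, its interval can come out slightly wider than the fully simultaneous one.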