We investigate a class of selective inference methods that condition on a selection event. Such methods follow a two-stage process. First, a data-driven (sub)collection of hypotheses is chosen from a large universe of hypotheses. Then, inference is carried out within this data-driven collection, conditional on the information used for the selection. Examples include basic data splitting, modern data carving methods, and post-selection inference for lasso coefficients based on the polyhedral lemma. In this paper, we take a holistic view of such methods, treating the selection, conditioning, and final error control steps together as a single method. From this perspective, we show that multiple testing methods defined directly on the full universe of hypotheses are always at least as powerful as selective inference methods based on selection and conditioning. This result holds even when the universe is potentially infinite and only implicitly defined, as in the case of data splitting. We provide a comprehensive theoretical framework, along with insights, and work through several case studies that illustrate where a shift to a non-selective or unconditional perspective can yield a power gain.
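To make the two-stage process concrete, the following is a minimal sketch of basic data splitting, not the construction studied in the paper. It assumes independent Gaussian observations with unit variance, a hypothetical top-`k` selection rule on the first half of the data, and one-sided z-tests with a Bonferroni correction over the selected hypotheses on the second half. The function name `data_splitting_test` and all parameters are illustrative assumptions.

```python
from statistics import NormalDist, fmean


def data_splitting_test(data, k, alpha=0.05):
    """Two-stage data splitting (illustrative sketch).

    data  : list of n rows, each a list of m observations (unit variance assumed)
    k     : number of hypotheses to select in stage 1 (hypothetical rule)
    alpha : familywise error level used in stage 2
    Tests H_j: mu_j <= 0 for each selected feature j.
    """
    n, m = len(data), len(data[0])
    half = n // 2
    first, second = data[:half], data[half:]

    # Stage 1 (selection): keep the k features with the largest
    # sample means, computed on the first half only.
    means_first = [fmean(row[j] for row in first) for j in range(m)]
    selected = sorted(range(m), key=lambda j: means_first[j], reverse=True)[:k]

    # Stage 2 (inference): one-sided z-tests on the second half only,
    # Bonferroni-corrected over the k selected hypotheses.
    n2 = len(second)
    rejected = []
    for j in selected:
        z = fmean(row[j] for row in second) * n2 ** 0.5
        p_value = 1.0 - NormalDist().cdf(z)
        if p_value <= alpha / k:
            rejected.append(j)
    return selected, sorted(rejected)
```

Because the second half is independent of the first, conditioning on the selection leaves the stage 2 test statistics unchanged, which is what makes the naive tests valid in this sketch. The universe of hypotheses here is all `m` features, although only `k` of them are ever tested.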