The validity of classical hypothesis testing requires that the significance level $\alpha$ be fixed before any statistical analysis takes place. This is a stringent requirement. For instance, it prohibits updating $\alpha$ during (or after) an experiment in response to changing concern about the cost of false positives, or to reflect unexpectedly strong evidence against the null. Perhaps most disturbingly, witnessing a p-value $p \ll \alpha$ rather than $p = \alpha - \varepsilon$ for tiny $\varepsilon > 0$ has no (statistical) relevance for any downstream decision-making. Following recent work of Grünwald (2024), we develop a theory of post-hoc hypothesis testing, enabling $\alpha$ to be chosen after seeing and analyzing the data. To study "good" post-hoc tests, we introduce $\Gamma$-admissibility, where $\Gamma$ is a set of adversaries that map the data to a significance level. We classify the set of $\Gamma$-admissible rules for various sets $\Gamma$, showing they must be based on e-values, and we recover the Neyman–Pearson lemma when $\Gamma$ consists of constant maps.
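As a minimal sketch of the mechanism behind e-value-based post-hoc tests (not code from the paper): an e-value is a nonnegative statistic $E$ with expectation at most 1 under the null, so by Markov's inequality $P(E \geq 1/\alpha) \leq \alpha$, and the pointwise bound $\mathbf{1}\{E \geq 1/\alpha\} \leq \alpha E$ keeps the test valid even when $\alpha$ is picked after seeing the data. The likelihood-ratio e-value and the data-dependent $\alpha$ rule below are invented for illustration only.

```python
import math
import random

def e_value(x, mu_alt=1.0):
    """Likelihood ratio of N(mu_alt, 1) to N(0, 1) at observation x.
    Under the null N(0, 1) this has expectation exactly 1, so it is an e-value."""
    return math.exp(mu_alt * x - mu_alt ** 2 / 2.0)

def post_hoc_alpha(e):
    """A hypothetical adversary in Gamma: a significance level chosen
    AFTER seeing the evidence (stricter when the evidence looks strong)."""
    return 0.01 if e > 20.0 else 0.05

random.seed(0)
n_trials = 200_000
false_positives = 0
for _ in range(n_trials):
    x = random.gauss(0.0, 1.0)        # data generated under the null
    e = e_value(x)
    alpha = post_hoc_alpha(e)         # alpha depends on the data
    if e >= 1.0 / alpha:              # e-value test at the post-hoc level
        false_positives += 1

# Since 1{E >= 1/alpha} <= alpha * E pointwise and E[E] <= 1 under the null,
# the realized error rate stays below the largest alpha the adversary uses.
print(false_positives / n_trials)
```

Contrast this with a p-value test, where the rejection rule $p \leq \alpha$ loses its Type I guarantee as soon as $\alpha$ is allowed to depend on $p$.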