The validity of classical hypothesis testing requires that the significance level $\alpha$ be fixed before any statistical analysis takes place. This is a stringent requirement. For instance, it prohibits updating $\alpha$ during (or after) an experiment in response to changing concern about the cost of false positives, or to reflect unexpectedly strong evidence against the null. Perhaps most disturbingly, witnessing a p-value $p \ll \alpha$ rather than $p = \alpha - \varepsilon$ for tiny $\varepsilon > 0$ has no (statistical) relevance for any downstream decision-making. Following recent work of Grünwald (2024), we develop a theory of post-hoc hypothesis testing, enabling $\alpha$ to be chosen after seeing and analyzing the data. To study "good" post-hoc tests, we introduce $\Gamma$-admissibility, where $\Gamma$ is a set of adversaries that map the data to a significance level. We classify the set of $\Gamma$-admissible rules for various sets $\Gamma$, showing they must be based on e-values, and we recover the Neyman–Pearson lemma when $\Gamma$ consists of constant maps.
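As a minimal sketch of the mechanism behind e-value-based post-hoc tests (not code from the paper): an e-value is a nonnegative statistic $E$ with expectation at most 1 under the null, so by Markov's inequality $P(E \geq 1/\alpha) \leq \alpha$, and the pointwise bound $\mathbf{1}\{E \geq 1/\alpha\} \leq \alpha E$ keeps the test valid even when $\alpha$ is picked after seeing the data. The likelihood-ratio e-value and the data-dependent $\alpha$ rule below are invented for illustration only.

```python
import math
import random

def e_value(x, mu_alt=1.0):
    """Likelihood ratio of N(mu_alt, 1) to N(0, 1) at observation x.
    Under the null N(0, 1) this has expectation exactly 1, so it is an e-value."""
    return math.exp(mu_alt * x - mu_alt ** 2 / 2.0)

def post_hoc_alpha(e):
    """A hypothetical adversary in Gamma: a significance level chosen
    AFTER seeing the evidence (stricter when the evidence looks strong)."""
    return 0.01 if e > 20.0 else 0.05

random.seed(0)
n_trials = 200_000
false_positives = 0
for _ in range(n_trials):
    x = random.gauss(0.0, 1.0)        # data generated under the null
    e = e_value(x)
    alpha = post_hoc_alpha(e)         # alpha depends on the data
    if e >= 1.0 / alpha:              # e-value test at the post-hoc level
        false_positives += 1

# Since 1{E >= 1/alpha} <= alpha * E pointwise and E[E] <= 1 under the null,
# the realized error rate stays below the largest alpha the adversary uses.
print(false_positives / n_trials)
```

Contrast this with a p-value test, where the rejection rule $p \leq \alpha$ loses its Type I guarantee as soon as $\alpha$ is allowed to depend on $p$.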