A pervasive methodological error is the post-hoc interpretation of $p$-values. A $p$-value $p$ is the smallest significance level at which we would have rejected the null had we chosen level $p$. It is not the smallest significance level at which we reject the null. We introduce post-hoc $p$-values, which do admit such a post-hoc interpretation. We show that $p$ is a post-hoc $p$-value if and only if $1/p$ is an $e$-value, a recently introduced statistical object. The product of independent post-hoc $p$-values is a post-hoc $p$-value, making them easy to combine. Moreover, any post-hoc $p$-value can be trivially improved if we permit external randomization, but only (essentially) non-randomized post-hoc $p$-values can be arbitrarily merged through multiplication. In addition, we discuss what constitutes a `good' post-hoc $p$-value. Finally, we argue that post-hoc $p$-values eliminate the need for a pre-specified significance level, such as $\alpha = .05$ or $\alpha = .005$ \citep{benjamin2018redefine}. We believe this may take away incentives for $p$-hacking and contribute to solving the file-drawer problem, as both of these issues arise from using a pre-specified significance level.
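As a rough illustration of the equivalence between post-hoc $p$-values and $e$-values, the following sketch builds one in a toy coin-flip setting. Everything specific here is an assumption made for illustration and is not taken from the paper: the null (fair coin), the alternative (heads probability 0.7), the choice of a likelihood ratio as the $e$-value, and all function names. It takes an $e$-value to be a nonnegative statistic with expectation at most 1 under the null, and checks by exact enumeration that $1/p$ has null expectation 1, both for a single study and for the product of two independent studies.

```python
from fractions import Fraction
from itertools import product

# Illustrative assumptions: fair coin under the null, 0.7 under the alternative.
P0 = Fraction(1, 2)
P1 = Fraction(7, 10)


def lr_e_value(xs):
    """Likelihood ratio of alternative to null for coin flips xs (1 = heads)."""
    e = Fraction(1)
    for x in xs:
        e *= (P1 / P0) if x else ((1 - P1) / (1 - P0))
    return e


def null_prob(xs):
    """Probability of observing xs when the null (fair coin) is true."""
    prob = Fraction(1)
    for x in xs:
        prob *= P0 if x else (1 - P0)
    return prob


def post_hoc_p(xs):
    """Candidate post-hoc p-value: the reciprocal of the e-value (may exceed 1)."""
    return 1 / lr_e_value(xs)


def expected_inverse_p(n):
    """Exact null expectation of 1/p over all outcomes of n flips."""
    return sum(null_prob(xs) / post_hoc_p(xs) for xs in product((0, 1), repeat=n))


def expected_inverse_product(n1, n2):
    """Exact null expectation of 1/(p1 * p2) for two independent studies."""
    total = Fraction(0)
    for xs1 in product((0, 1), repeat=n1):
        for xs2 in product((0, 1), repeat=n2):
            weight = null_prob(xs1) * null_prob(xs2)
            total += weight / (post_hoc_p(xs1) * post_hoc_p(xs2))
    return total


def null_rejection_prob(n, alpha):
    """Exact null probability that the post-hoc p-value is at most alpha."""
    return sum(null_prob(xs) for xs in product((0, 1), repeat=n)
               if post_hoc_p(xs) <= alpha)


if __name__ == "__main__":
    data = (1, 1, 1, 1, 1, 0, 1, 1)
    print("post-hoc p-value:", float(post_hoc_p(data)))
    print("E[1/p] under the null:", expected_inverse_p(4))
    print("E[1/(p1*p2)] under the null:", expected_inverse_product(2, 3))
```

Exact rational arithmetic is used so that the expectation checks are identities rather than simulation estimates. By Markov's inequality, the same construction also satisfies the usual $p$-value guarantee that the null probability of $p \le \alpha$ is at most $\alpha$, which `null_rejection_prob` can be used to spot-check for any fixed level.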