A standard practice in statistical hypothesis testing is to mention the p-value alongside the accept/reject decision. We show the advantages of mentioning an e-value instead. With p-values, it is not clear how to use an extreme observation (e.g., $p \ll \alpha$) for getting better frequentist decisions. With e-values it is straightforward, since they provide Type-I risk control in a generalized Neyman-Pearson setting with the decision task (a general loss function) determined post-hoc, after observation of the data, thereby providing a handle on `roving $\alpha$'s'. When Type-II risks are taken into consideration, the only admissible decision rules in the post-hoc setting turn out to be e-value-based. Similarly, if the loss incurred when specifying a faulty confidence interval is not fixed in advance, standard confidence intervals and distributions may fail whereas e-confidence sets and e-posteriors still provide valid risk guarantees. Sufficiently powerful e-values have by now been developed for a range of classical testing problems. We discuss the main challenges for wider development and deployment.
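To make the Type-I claim concrete, the following is a minimal simulation sketch, not taken from the paper. It assumes a simple Gaussian setting chosen purely for illustration: the null is $N(0,1)$, the alternative is $N(\mu,1)$, and the e-value is the likelihood ratio, which is nonnegative with expectation 1 under the null. All function names and parameter values are hypothetical. The sketch checks two things under the null: rejecting when the e-value is at least $1/\alpha$ keeps the false rejection rate below $\alpha$ (by Markov's inequality), and an illustrative post-hoc loss that charges $1/\alpha$ for a false rejection at the smallest rejecting level $\alpha \in (0,1]$ has expected value at most 1. The analogous quantity for a p-value would be the expectation of $1/p$, which is infinite when $p$ is uniform.

```python
import math
import random


def likelihood_ratio_e_value(xs, mu):
    """E-value for H0: N(0,1) against the (assumed) alternative N(mu,1).

    The product of density ratios simplifies to exp(mu * sum(x) - n * mu^2 / 2).
    """
    n = len(xs)
    return math.exp(mu * sum(xs) - n * mu * mu / 2.0)


def simulate_under_null(trials=50_000, n=10, mu=0.3, seed=0):
    """Draw `trials` data sets of size n from the null and return their e-values."""
    rng = random.Random(seed)
    return [
        likelihood_ratio_e_value([rng.gauss(0.0, 1.0) for _ in range(n)], mu)
        for _ in range(trials)
    ]


def rejection_rate(e_values, alpha):
    """Fraction of data sets on which the rule 'reject if E >= 1/alpha' rejects."""
    return sum(e >= 1.0 / alpha for e in e_values) / len(e_values)


def post_hoc_risk(e_values):
    """Average illustrative loss when alpha is picked after seeing the data.

    The smallest alpha in (0, 1] at which E rejects is 1/E (only if E >= 1),
    so a false rejection is charged 1/alpha = E; no rejection costs nothing.
    """
    return sum(e for e in e_values if e >= 1.0) / len(e_values)


e_values = simulate_under_null()
mean_e = sum(e_values) / len(e_values)

print(f"mean e-value under the null: {mean_e:.3f}")
for alpha in (0.5, 0.1, 0.05):
    print(f"alpha={alpha}: false rejection rate {rejection_rate(e_values, alpha):.4f}")
print(f"post-hoc risk (data-dependent alpha): {post_hoc_risk(e_values):.3f}")
```

The same e-value serves every level $\alpha$ at once, which is what lets the level be chosen after the data are seen in this toy setting.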