Measuring Evidence with a Continuous Test

Testing has developed into the fundamental statistical framework for falsifying hypotheses. Unfortunately, tests are binary in nature: a test either rejects a hypothesis or it does not. Such binary decisions do not reflect the reality of many scientific studies, which often aim to present the evidence against a hypothesis and do not necessarily intend to establish a definitive conclusion. To address this, we propose the continuous generalization of a test, which we use to measure the evidence against a hypothesis. Such a continuous test can be viewed as a continuous, non-randomized interpretation of the classical 'randomized test'. This offers the benefits of a randomized test without the downsides of external randomization. Another interpretation is as a literal measure, which measures the number of binary tests that reject the hypothesis. Our work completes the bridge between classical tests and the recently proposed $e$-values: $e$-values bounded to $[0, 1/\alpha]$ are continuously interpreted size-$\alpha$ randomized tests. Taking $\alpha$ to 0 yields the regular $e$-value: a 'level 0' continuous test. Moreover, we generalize the traditional notion of power by using generalized means. This produces a unified framework that contains both classical Neyman-Pearson optimal testing and log-optimal $e$-values, as well as a continuum of other options. The traditional $p$-value appears as the reciprocal of an $e$-value that satisfies a weaker error bound. In an illustration in a Gaussian location model, we find that optimal continuous tests are of a beautifully simple form.
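The correspondence between size-$\alpha$ tests and $e$-values bounded to $[0, 1/\alpha]$ can be checked numerically. The following is a minimal sketch, not the paper's optimal continuous test: it assumes a one-sided Gaussian location problem with null $N(0, 1)$ and alternative $N(\mu, 1)$, and the function names, $\alpha = 0.05$ and $\mu = 1$ are all illustrative choices. It rescales a classical binary $z$-test by $1/\alpha$ and compares its null expectation with that of the likelihood ratio, a standard unbounded $e$-value.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)  # assumed null distribution N(0, 1)


def z_test(x, alpha):
    """Classical one-sided binary test: 1.0 = reject, 0.0 = do not reject."""
    return 1.0 if x > STD_NORMAL.inv_cdf(1.0 - alpha) else 0.0


def as_continuous_test(phi_value, alpha):
    """Rescale a size-alpha test value in [0, 1] to the scale [0, 1/alpha]."""
    return phi_value / alpha


def likelihood_ratio_e_value(x, mu):
    """Likelihood ratio of N(mu, 1) to N(0, 1) at x (an unbounded e-value)."""
    return math.exp(mu * x - 0.5 * mu * mu)


def null_expectation(f, lo=-12.0, hi=12.0, steps=24000):
    """Trapezoidal approximation of E[f(X)] for X ~ N(0, 1)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * f(x) * STD_NORMAL.pdf(x)
    return total * h


alpha, mu = 0.05, 1.0  # illustrative values, not taken from the paper

# Rescaled binary test: takes values in {0, 1/alpha}.
bounded_mean = null_expectation(
    lambda x: as_continuous_test(z_test(x, alpha), alpha)
)
# Likelihood ratio: takes values in (0, infinity).
unbounded_mean = null_expectation(lambda x: likelihood_ratio_e_value(x, mu))

print(f"E0[rescaled z-test]    = {bounded_mean:.4f}")
print(f"E0[likelihood ratio]   = {unbounded_mean:.4f}")
```

Both null expectations come out close to 1 (up to integration error), which is the defining bound of an $e$-value. The rescaled $z$-test is capped at $1/\alpha = 20$, whereas the likelihood ratio has no cap, matching the abstract's description of the regular $e$-value as the $\alpha \to 0$ case.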

