Continuous Testing: Unifying Tests and E-values

Testing has developed into the fundamental statistical framework for falsifying hypotheses. Unfortunately, tests are binary in nature: a test either rejects a hypothesis or it does not. Such binary decisions do not reflect the reality of many scientific studies, which often aim to present the evidence against a hypothesis and do not necessarily intend to establish a definitive conclusion. We propose a continuous generalization of a test, which we use to continuously measure the evidence against a hypothesis. Such a continuous test can be viewed as a continuous and non-randomized interpretation of the classical 'randomized test'. This offers the benefits of a randomized test without the downsides of external randomization. Another interpretation is as a literal measure, which measures the number of binary tests that reject the hypothesis. Our work unifies classical testing and the recently proposed $e$-values: $e$-values bounded to $[0, 1/\alpha]$ are continuously interpreted size-$\alpha$ randomized tests. Choosing $\alpha = 0$ yields the regular $e$-value, which we use to define a level-0 continuous test. Moreover, we generalize the traditional notion of power by using generalized means. This produces a framework that contains both classical Neyman-Pearson optimal testing and log-optimal $e$-values, as well as a continuum of other options. The traditional $p$-value appears as the reciprocal of a generally invalid continuous test. In an illustration in a Gaussian location model, we find that optimal continuous tests are of a beautifully simple form.
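
The correspondence between bounded $e$-values and size $\alpha$ randomized tests can be sketched numerically. The example below is a minimal illustration under stated assumptions, not the authors' construction: it takes the likelihood ratio of $N(\mu, 1)$ against $N(0, 1)$ as an $e$-value, caps it at $1/\alpha$ (a simple way to obtain a valid bounded $e$-value, not claimed to be optimal), and reads $\alpha$ times the capped value as a rejection probability. The function names, the choice $\mu = 1$, and the trapezoid-rule integration are all assumptions made for this illustration.

```python
import math


def lr_e_value(x, mu=1.0):
    """Likelihood ratio of N(mu, 1) against N(0, 1) at observation x."""
    return math.exp(mu * x - 0.5 * mu * mu)


def capped_e_value(x, alpha, mu=1.0):
    """Cap the e-value at 1/alpha so that it lies in [0, 1/alpha]."""
    return min(lr_e_value(x, mu), 1.0 / alpha)


def rejection_probability(x, alpha, mu=1.0):
    """Randomized-test reading: reject with probability alpha * e-value."""
    return alpha * capped_e_value(x, alpha, mu)


def null_expectation(f, lo=-12.0, hi=12.0, n=24001):
    """Trapezoid-rule approximation of E[f(X)] for X ~ N(0, 1)."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        total += weight * f(x) * density
    return total * h


if __name__ == "__main__":
    alpha = 0.05
    # The uncapped likelihood ratio has null expectation 1 (a valid e-value).
    print(null_expectation(lr_e_value))
    # Capping can only lower the expectation, so validity is preserved.
    print(null_expectation(lambda x: capped_e_value(x, alpha)))
    # The induced randomized test has size at most alpha.
    print(null_expectation(lambda x: rejection_probability(x, alpha)))
```

Under the null, the capped $e$-value has expectation at most 1, so the induced rejection probability lies in $[0, 1]$ and has expectation at most $\alpha$. Reporting the capped value itself, rather than drawing an external random number to decide whether to reject, is the non-randomized reading described in the abstract.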

