In a Monte-Carlo test, the observed dataset is fixed, and several resampled or permuted versions of the dataset are generated in order to test the null hypothesis that the original dataset is exchangeable with the resampled/permuted ones. Sequential Monte-Carlo tests aim to save computational resources by generating these additional datasets one at a time and potentially stopping early. While earlier tests yield valid inference only under a particular prespecified stopping rule, our work develops a new anytime-valid Monte-Carlo test that can be continuously monitored, yielding a p-value or e-value at any stopping time, possibly not specified in advance. It generalizes the well-known method of Besag and Clifford, allowing it to stop at any time, and also encompasses new sequential Monte-Carlo tests that tend to stop sooner under both the null and the alternative without compromising power. The core technical advance is the development of new test martingales (nonnegative martingales with initial value one) for testing exchangeability against a very particular alternative. These test martingales are constructed using new and simple betting strategies that bet on whether a generated test statistic is greater or smaller than the observed one. The betting strategies are guided by the derivation of a simple log-optimal betting strategy, have closed-form expressions for the wealth process, come with provable guarantees on resampling risk, and display excellent power in practice.
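The betting idea can be illustrated with a minimal sketch. It is not the strategy developed in this work, only a simple fixed-alternative likelihood-ratio bet, and it assumes there are no ties among the test statistics. Under exchangeability, if `exceed` of the first `t - 1` generated statistics were at least as large as the observed one, the next one exceeds it with probability `(exceed + 1) / (t + 1)`. Betting against this with a fixed alternative probability gives a nonnegative martingale with initial value one, so by Ville's inequality the wealth reaches `1 / alpha` with probability at most `alpha` under the null. The names `sequential_mc_wealth`, `draw_statistic`, `p_alt`, and `max_draws` are hypothetical and chosen for illustration.

```python
import random


def sequential_mc_wealth(observed, draw_statistic, p_alt=0.01,
                         alpha=0.05, max_draws=10_000):
    """Illustrative anytime-valid Monte-Carlo test by betting.

    observed       : test statistic of the original dataset
    draw_statistic : callable returning the statistic of one freshly
                     resampled/permuted dataset (hypothetical interface)
    p_alt          : assumed chance, under the alternative, that a
                     generated statistic is >= the observed one
    Returns (wealth, number_of_draws, rejected).
    """
    wealth = 1.0   # a test martingale starts at one
    exceed = 0     # generated statistics >= observed so far
    for t in range(1, max_draws + 1):
        # Null probability of an exceedance at step t (no ties assumed).
        null_prob = (exceed + 1) / (t + 1)
        if draw_statistic() >= observed:
            wealth *= p_alt / null_prob
            exceed += 1
        else:
            wealth *= (1 - p_alt) / (1 - null_prob)
        # Stopping as soon as wealth >= 1/alpha keeps the level at alpha.
        if wealth >= 1 / alpha:
            return wealth, t, True
    return wealth, max_draws, False


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy example: an observed statistic far in the upper tail of N(0, 1).
    print(sequential_mc_wealth(3.0, lambda: rng.gauss(0.0, 1.0)))
```

Because the wealth is a likelihood ratio between a fixed Bernoulli alternative and the exchangeable null, it can be monitored after every draw and the procedure can be stopped at any time, which is the property the abstract calls anytime-validity. The choice `p_alt=0.01` is arbitrary here; a different value, or a mixture over values, changes how quickly the wealth grows.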