In a Monte-Carlo test, the observed dataset is fixed, and several resampled or permuted versions of the dataset are generated in order to test the null hypothesis that the original dataset is exchangeable with the resampled/permuted ones. Sequential Monte-Carlo tests aim to save computational resources by generating these additional datasets one by one and potentially stopping early. While earlier tests yield valid inference at a particular prespecified stopping rule, our work develops a new anytime-valid Monte-Carlo test that can be continuously monitored, yielding a p-value or e-value at any stopping time, possibly not specified in advance. Despite the added flexibility, it significantly outperforms the well-known method by Besag and Clifford, stopping earlier under both the null and the alternative without compromising power. The core technical advance is the development of new test martingales (nonnegative martingales with initial value one) for testing exchangeability against a very particular alternative. These test martingales are constructed using new and simple betting strategies that smartly bet on the relative ranks of the generated test statistics. The betting strategies are guided by the derivation of a simple log-optimal betting strategy; they have closed-form expressions for the wealth process, come with provable guarantees on resampling risk, and display excellent power in practice.
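To make the idea of a test martingale with a closed-form wealth process concrete, here is a minimal sketch of one possible construction. It is an illustration under stated assumptions, not necessarily the strategy developed in the paper. It assumes the test statistics have no ties, so that under the null the number of generated statistics that are at least as large as the observed one after t draws is uniform on {0, ..., t}. Betting against this with a fixed Bernoulli(p) alternative gives the likelihood-ratio wealth (t+1) · C(t, L_t) · p^{L_t} · (1-p)^{t-L_t}, where L_t counts those "losses". The names `log_wealth`, `sequential_mc_test`, and `draw_null_stat`, and the parameters `p` and `max_draws`, are hypothetical.

```python
import math
import random


def log_wealth(t, losses, p):
    """Log of the closed-form wealth (t+1) * C(t, losses) * p^losses * (1-p)^(t-losses).

    Assumption: this is the likelihood ratio of an i.i.d. Bernoulli(p) alternative
    against the null, under which each loss sequence with `losses` ones among
    t draws has probability 1 / ((t+1) * C(t, losses)).
    """
    log_binom = (math.lgamma(t + 1) - math.lgamma(losses + 1)
                 - math.lgamma(t - losses + 1))
    return (math.log(t + 1) + log_binom
            + losses * math.log(p) + (t - losses) * math.log(1 - p))


def sequential_mc_test(observed, draw_null_stat, alpha=0.05, p=0.01,
                       max_draws=10_000):
    """Generate statistics one by one; reject once the wealth reaches 1/alpha.

    `draw_null_stat` is a hypothetical callable returning the test statistic of
    one freshly resampled or permuted dataset. `p` must lie strictly in (0, 1).
    Returns (rejected, number_of_draws, final_wealth).
    """
    losses = 0
    wealth = 1.0  # a test martingale starts at one
    for t in range(1, max_draws + 1):
        if draw_null_stat() >= observed:
            losses += 1  # the generated statistic beat the observed one
        wealth = math.exp(log_wealth(t, losses, p))
        if wealth >= 1 / alpha:  # Ville's inequality bounds the type-I error by alpha
            return True, t, wealth
    return False, max_draws, wealth


if __name__ == "__main__":
    random.seed(0)
    # Toy example: observed statistic 3.0 against standard normal null draws.
    print(sequential_mc_test(3.0, lambda: random.gauss(0.0, 1.0)))
```

Because the wealth is a nonnegative martingale under the null, the rejection rule stays valid at any stopping time, which is what allows continuous monitoring. The fixed choice of `p` is a design simplification; smaller values bet more aggressively on the observed statistic being extreme.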