In a Monte-Carlo test, the observed dataset is fixed, and several resampled or permuted versions of it are generated in order to test the null hypothesis that the original dataset is exchangeable with the resampled or permuted ones. Sequential Monte-Carlo tests aim to save computational resources by generating these additional datasets one at a time and potentially stopping early. While earlier tests yield valid inference at a particular prespecified stopping rule, our work develops a new anytime-valid Monte-Carlo test that can be continuously monitored, yielding a p-value or e-value at any stopping time, possibly not specified in advance. Despite the added flexibility, it significantly outperforms the well-known method of Besag and Clifford, stopping earlier under both the null and the alternative without compromising power. The core technical advance is the development of new test martingales (nonnegative martingales with initial value one) for testing exchangeability against a very particular alternative. These test martingales are constructed using new and simple betting strategies that bet on the relative ranks of the generated test statistics. The betting strategies are guided by the derivation of a simple log-optimal betting strategy, have closed-form expressions for the wealth process, come with provable guarantees on resampling risk, and display excellent power in practice.
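To make the setting concrete, the following is a minimal sketch of the sequential baseline that the abstract compares against, namely the Besag and Clifford stopping rule, applied to a two-sample permutation test. It is not the anytime-valid test developed in this work. The function names, the absolute difference in means as the test statistic, and the defaults `h=10` and `max_draws=999` are illustrative assumptions rather than anything specified in the source.

```python
import random


def besag_clifford_pvalue(observed_stat, draw_null_stat, h=10, max_draws=999):
    """Sequential Monte-Carlo p-value under the Besag-Clifford stopping rule.

    Null statistics are drawn one at a time. Sampling stops as soon as h of
    them are at least as large as the observed statistic, or after max_draws
    draws, whichever comes first. Returns (p_value, number_of_draws).
    """
    exceed = 0
    for n_draws in range(1, max_draws + 1):
        if draw_null_stat() >= observed_stat:
            exceed += 1
            if exceed == h:
                # Early stop: h exceedances were reached after n_draws draws.
                return exceed / n_draws, n_draws
    # Budget exhausted: usual Monte-Carlo p-value, counting the observed one.
    return (exceed + 1) / (max_draws + 1), max_draws


def mean_diff(x, y):
    """Illustrative test statistic: absolute difference in sample means."""
    return abs(sum(x) / len(x) - sum(y) / len(y))


def make_permutation_sampler(x, y, rng):
    """Return a function that draws one permuted test statistic per call."""
    pooled = list(x) + list(y)

    def draw():
        rng.shuffle(pooled)  # random relabelling of the pooled sample
        return mean_diff(pooled[:len(x)], pooled[len(x):])

    return draw


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(30)]
    y = [rng.gauss(0.8, 1.0) for _ in range(30)]
    sampler = make_permutation_sampler(x, y, rng)
    p_value, draws = besag_clifford_pvalue(mean_diff(x, y), sampler)
    print(f"p-value = {p_value:.4f} after {draws} permutations")
```

Under this rule the p-value is valid only at the prespecified stopping time determined by `h` and `max_draws`. The test described in the abstract instead allows stopping at any time, including times not fixed in advance.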