Testing exchangeability by pairwise betting

In this paper, we address the problem of testing exchangeability of a sequence of random variables, $X_1, X_2, \ldots$. This problem has been studied under the recently popular framework of testing by betting. But the mapping from testing problems to games is not one-to-one: many games can be designed for the same test. Past work established that it is futile to play a single game that bets on every observation: test martingales in the data filtration are powerless. Two avenues have been explored to circumvent this impossibility: betting in a reduced filtration (wealth is a test martingale in a coarsened filtration), or playing many games in parallel (wealth is an e-process in the data filtration). The former has proved difficult to analyze theoretically, while the latter only works for binary or discrete observation spaces. Here, we introduce a different approach that avoids both drawbacks. We design a new (yet simple) game in which we observe the data sequence in pairs. Although betting on individual observations is futile, we show that betting on pairs of observations is not. To elaborate, we prove that our game leads to a nontrivial test martingale, which is interesting because it is obtained by shrinking the filtration only very slightly. We show that our test controls type-1 error despite continuous monitoring, and achieves power one for both binary and continuous observations, under a broad class of alternatives. Due to the shrunk filtration, optional stopping is only allowed at even stopping times, not at odd ones: a relatively minor price. We provide a wide array of simulations that align with our theoretical findings.
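
To make the idea of betting on pairs concrete, here is a minimal illustrative sketch in Python. It is an assumption-laden toy and not necessarily the game analyzed in the paper: the bet is placed on the sign of $X_{2t} - X_{2t-1}$, which is conditionally symmetric about zero given the earlier pairs whenever the sequence is exchangeable. The function name `pairwise_betting_wealth`, the running-mean betting fraction `lam`, and the threshold $1/\alpha$ (justified by Ville's inequality for nonnegative martingales started at one) are choices made here purely for illustration.

```python
import numpy as np

def pairwise_betting_wealth(x, alpha=0.05):
    """Toy pairwise betting test (illustrative; not the paper's exact game).

    Returns the wealth after each pair and the (even) sample index at which
    the wealth first reaches 1/alpha, or None if it never does.
    """
    x = np.asarray(x, dtype=float)
    n_pairs = len(x) // 2          # a trailing unpaired observation is ignored
    wealth = 1.0
    wealth_path = []
    sum_signs = 0.0                # running sum of past pair signs
    rejected_at = None
    for t in range(n_pairs):
        first, second = x[2 * t], x[2 * t + 1]
        # Bet fraction uses only earlier pairs, so it is predictable.
        # |sum_signs| <= t, hence |lam| < 0.5 and wealth stays positive.
        lam = 0.5 * sum_signs / (t + 1)
        # Sign of the within-pair difference: +1, -1, or 0 for a tie.
        # Under exchangeability its conditional mean is zero.
        s = np.sign(second - first)
        wealth *= 1.0 + lam * s
        wealth_path.append(wealth)
        sum_signs += s
        # Only check the threshold after a complete pair (even sample index).
        if rejected_at is None and wealth >= 1.0 / alpha:
            rejected_at = 2 * (t + 1)
    return np.array(wealth_path), rejected_at

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    iid = rng.normal(size=2000)                               # exchangeable
    drift = rng.normal(size=2000) + 0.05 * np.arange(2000)    # not exchangeable
    print("i.i.d. data, rejected at:", pairwise_betting_wealth(iid)[1])
    print("drifting data, rejected at:", pairwise_betting_wealth(drift)[1])
```

In this sketch the wealth is updated, and the rejection threshold checked, only after each complete pair, which mirrors the restriction to even stopping times described above. The running-mean bet is deliberately simple; it only gains against alternatives in which the second element of a pair tends to be systematically larger or smaller than the first.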
