Improving the (approximate) sequential probability ratio test by avoiding overshoot

The sequential probability ratio test (SPRT) of Wald (1945) is a cornerstone of sequential analysis. Given desired type I and type II error levels $\alpha, \beta \in (0,1)$, it stops when the likelihood ratio statistic crosses an upper or a lower threshold, which guarantees optimality of the expected sample size. However, these thresholds are not available in closed form, so the test is often applied with the approximate thresholds $(1-\beta)/\alpha$ and $\beta/(1-\alpha)$ (the approximate SPRT). When $\beta > 0$, the approximate SPRT guarantees neither type I and type II error control at levels $\alpha$ and $\beta$ nor optimality. When $\beta=0$ (the power-one SPRT), it guarantees type I error control at level $\alpha$, but this control is in general conservative and thus not optimal. The looseness in both cases is caused by overshoot: the test statistic overshoots the threshold at the stopping time. One standard remedy is to compute the exact thresholds numerically, but many papers and software packages do not do this. In this paper, we describe a different way to improve the approximate SPRT: we change the test statistic to avoid overshoot. Our technique uniformly improves power-one SPRTs ($\beta=0$) for simple nulls and alternatives, and for one-sided nulls and alternatives in exponential families. When $\beta > 0$, our technique provides valid type I and type II error guarantees while needing fewer samples than the approximate SPRT with Wald's thresholds in all considered simulations. These improved sequential tests can also be used to derive tighter parametric confidence sequences, and they can be extended to nontrivial settings such as sampling without replacement and conformal martingales.
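
To make the overshoot concrete, the following is a minimal sketch of the standard approximate SPRT with Wald's thresholds. It is not the improved test proposed in this paper. The Gaussian model (simple null mean `mu0` against simple alternative mean `mu1`, known `sigma`) and the function names `wald_log_thresholds` and `approximate_sprt` are assumptions chosen for illustration only. The function reports how far the log-likelihood ratio lies beyond the crossed threshold at the stopping time.

```python
import math
from typing import Iterable, Tuple


def wald_log_thresholds(alpha: float, beta: float) -> Tuple[float, float]:
    """Logs of Wald's approximate thresholds (1-beta)/alpha and beta/(1-alpha)."""
    log_upper = math.log((1.0 - beta) / alpha)
    # With beta = 0 (power-one SPRT) the lower threshold is 0, so the test
    # never stops to accept the null; represent its log as -infinity.
    log_lower = math.log(beta / (1.0 - alpha)) if beta > 0 else -math.inf
    return log_upper, log_lower


def approximate_sprt(
    xs: Iterable[float],
    alpha: float,
    beta: float,
    mu0: float = 0.0,   # illustrative null mean (assumption)
    mu1: float = 1.0,   # illustrative alternative mean (assumption)
    sigma: float = 1.0,  # illustrative known standard deviation (assumption)
) -> Tuple[str, int, float]:
    """Run the approximate SPRT on the observations xs.

    Returns (decision, number of samples used, overshoot), where overshoot is
    the distance of the log-likelihood ratio beyond the crossed log-threshold.
    """
    log_upper, log_lower = wald_log_thresholds(alpha, beta)
    log_lr = 0.0
    n = 0
    for n, x in enumerate(xs, start=1):
        # Log-likelihood ratio increment of N(mu1, sigma^2) against N(mu0, sigma^2).
        log_lr += ((x - mu0) ** 2 - (x - mu1) ** 2) / (2.0 * sigma ** 2)
        if log_lr >= log_upper:
            return "reject H0", n, log_lr - log_upper
        if log_lr <= log_lower:
            return "accept H0", n, log_lower - log_lr
    # Data ran out before either threshold was crossed.
    return "continue", n, 0.0


if __name__ == "__main__":
    decision, n, overshoot = approximate_sprt([2.0, 2.0, 2.0], alpha=0.05, beta=0.2)
    print(decision, n, overshoot)
```

In this run the log-likelihood ratio moves in steps of 1.5 and the upper log-threshold is $\log 16 \approx 2.77$, so the test rejects after two observations with the statistic at 3.0, strictly beyond the threshold. That gap is the overshoot the abstract refers to.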
