Sequential tests and their implied confidence sequences, which are valid at arbitrary stopping times, promise flexible statistical inference and on-the-fly decision making. However, strong guarantees are limited to parametric sequential tests that under-cover in practice or concentration-bound-based sequences that over-cover and have suboptimal rejection times. In this work, we consider classic delayed-start normal-mixture sequential probability ratio tests, and we provide the first asymptotic type-I-error and expected-rejection-time guarantees under general non-parametric data generating processes, where the asymptotics are indexed by the test's burn-in time. The type-I-error results primarily leverage a martingale strong invariance principle and establish that these tests (and their implied confidence sequences) have type-I error rates asymptotically equivalent to the desired (possibly varying) $\alpha$-level. The expected-rejection-time results primarily leverage an identity inspired by It\^o's lemma and imply that, in certain asymptotic regimes, the expected rejection time is asymptotically equivalent to the minimum possible among $\alpha$-level tests. We show how to apply our results to sequential inference on parameters defined by estimating equations, such as average treatment effects. Together, our results establish these (ostensibly parametric) tests as general-purpose, non-parametric, and near-optimal. We illustrate this via numerical simulations and a real-data application to A/B testing at Netflix.
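As a rough illustration of the kind of test discussed above, the following is a minimal sketch of a delayed-start normal-mixture sequential probability ratio test for a mean of zero. It is not the calibrated procedure analyzed in this work: it assumes unit-variance observations, a $N(0, 1/\rho)$ mixing distribution over the alternative mean, and the conservative rejection threshold $1/\alpha$ from Ville's inequality rather than a boundary tuned to the burn-in time. The function names and the parameters `rho` and `burn_in` are hypothetical.

```python
import math


def mixture_likelihood_ratio(s, n, rho=1.0):
    """Normal-mixture likelihood ratio for H0: mean = 0.

    Assumes unit-variance Gaussian observations with running sum `s`
    after `n` samples, and a N(0, 1/rho) mixing distribution over the
    alternative mean (an illustrative choice, not taken from the paper).
    """
    return math.sqrt(rho / (n + rho)) * math.exp(s * s / (2.0 * (n + rho)))


def delayed_start_msprt(xs, alpha=0.05, burn_in=10, rho=1.0):
    """Return the first n >= burn_in at which the test rejects, else None.

    Uses the conservative threshold 1/alpha; the burn-in only delays
    monitoring and does not change the threshold in this sketch.
    """
    s = 0.0
    for n, x in enumerate(xs, start=1):
        s += x
        if n >= burn_in and mixture_likelihood_ratio(s, n, rho) >= 1.0 / alpha:
            return n
    return None


# Usage: a stream with a clear positive mean is rejected once monitoring starts.
rejection_time = delayed_start_msprt([1.0] * 100, alpha=0.05, burn_in=20)
```

In this sketch the burn-in simply suppresses rejections before `burn_in` samples have been observed; the asymptotic guarantees described in the abstract concern how such tests behave as that burn-in time grows.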