We consider the problem of constructing sequential power-one tests in which the null and alternative classes are specified indirectly through historical or offline data. More specifically, given an offline dataset consisting of observations from $L+1$ distributions $\{P_0, P_1, \ldots, P_L\}$ and a new unlabeled data stream $\{X_t: t \geq 1\} \overset{\text{i.i.d.}}{\sim} P_\theta$, the goal is to test the null $H_0: \theta = 0$ against the alternative $H_1: \theta \in [L] := \{1, \ldots, L\}$. Our main methodological contribution is a general approach for designing a level-$\alpha$ power-one test for this problem using a multi-class classifier trained on the given offline dataset. Working under a mild "separability" condition on the distributions and the trained classifier, we obtain an upper bound on the expected stopping time of our proposed level-$\alpha$ test and show that this bound cannot be improved in general. In addition to rejecting the null, our procedure can also identify the true underlying distribution almost surely. We then establish a sufficient condition that ensures the required separability of the classifier, and provide converse results that clarify the role of the size of the offline dataset and of the family of classifiers among classifier-based tests satisfying the level-$\alpha$ power-one criterion. Finally, we extend our analysis to the case of a mismatch between the training and testing distributions and illustrate an application to sequential change detection. Empirical results on both synthetic and real data support our theoretical results.
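The sketch below is for illustration only and is not the test proposed in this work. It shows one generic way to turn classifier scores into a level-$\alpha$ power-one test by betting and Ville's inequality. The sketch rests on several assumptions, all hypothetical:

- Each class $j \in [L]$ has a score $s_j(x) \in [0, 1]$ that stands in for a trained classifier's output.
- A known constant $\mu_{0,j} \geq \mathbb{E}_{P_0}[s_j(X)]$ is available for each class.
- The betting fraction $c \in (0, 1)$ is fixed.
- The names `classifier_power_one_test` and `sigmoid` are illustrative.

For each class, the wealth $K_t^{(j)} = \prod_{i \le t} \bigl(1 + c\,(s_j(X_i)/\mu_{0,j} - 1)\bigr)$ is a nonnegative $P_0$-supermartingale. Ville's inequality therefore gives $P_0(\exists t: K_t^{(j)} \ge L/\alpha) \le \alpha/L$, and a union bound over the $L$ classes yields level $\alpha$. Power one requires a further condition. If $\mathbb{E}_{P_\theta}[\log(1 + c\,(s_\theta(X)/\mu_{0,\theta} - 1))] > 0$, a crude stand-in for separability, then the log-wealth of class $\theta$ drifts to $+\infty$ and the test stops almost surely. When it stops, it reports the class whose process crossed the threshold.

```python
import math
import random
from typing import Callable, Iterable, Optional, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (hypothetical score building block).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classifier_power_one_test(
    stream: Iterable[float],
    scores: Sequence[Callable[[float], float]],
    null_means: Sequence[float],
    alpha: float = 0.05,
    bet: float = 0.5,
) -> Tuple[Optional[int], Optional[int]]:
    """Hypothetical betting-based sketch, not the paper's construction.

    scores[j - 1](x) in [0, 1] plays the role of a classifier score for class j;
    null_means[j - 1] is ASSUMED to upper-bound E_{P_0}[scores[j - 1](X)].
    Returns (stopping time, reported class in 1..L), or (None, None) if the
    finite stream ends without rejecting H_0.
    """
    if not 0.0 < bet < 1.0:
        raise ValueError("bet must lie in (0, 1)")
    L = len(scores)
    log_threshold = math.log(L / alpha)  # each process runs at level alpha / L
    log_wealth = [0.0] * L
    for t, x in enumerate(stream, start=1):
        for j in range(L):
            s = scores[j](x)
            # Payoff 1 + bet * (s / mu0 - 1) >= 1 - bet > 0, and its P_0-mean is <= 1,
            # so each wealth process is a nonnegative P_0-supermartingale.
            log_wealth[j] += math.log1p(bet * (s / null_means[j] - 1.0))
        j_star = max(range(L), key=lambda j: log_wealth[j])
        if log_wealth[j_star] >= log_threshold:
            return t, j_star + 1  # reject H_0 and report the crossing class
    return None, None


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical setup: P_0 = N(0, 1), P_1 = N(2, 1), P_2 = N(-2, 1).
    scores = [sigmoid, lambda x: sigmoid(-x)]
    null_means = [0.5, 0.5]  # exact under N(0, 1) since sigmoid(x) + sigmoid(-x) = 1
    stream = (rng.gauss(2.0, 1.0) for _ in range(1000))  # data drawn from P_1
    print(classifier_power_one_test(stream, scores, null_means, alpha=0.05))
```

The fixed betting fraction keeps the sketch short. An adaptive (predictable) choice of $c$ would usually be preferred in practice, and it does not change the validity argument above.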