A Unified Study on Sequentiality in Universal Classification with Empirically Observed Statistics

In the binary hypothesis testing problem, it is well known that sequentiality in taking samples eradicates the trade-off between the two error exponents, yet implementing the optimal test requires knowledge of the underlying distributions, say $P_0$ and $P_1$. When this knowledge is replaced by empirically observed statistics from the respective distributions, the gain of sequentiality is less well understood, particularly under universality constraints over all possible $P_0,P_1$. In this work, this gap is addressed by a unified study of sequentiality in the universal binary classification problem, where universality constraints are imposed on the expected stopping time as well as on the type-I error exponent. Specifically, the type-I error exponent is required to meet a pre-set, distribution-dependent constraint $\lambda(P_0,P_1)$ for all $P_0,P_1$. Within the proposed framework, several sequential setups are investigated so that fair comparisons can be made with the fixed-length counterpart. By viewing these sequential classification problems as special cases of a general sequential composite hypothesis testing problem, the optimal type-II error exponents are characterized. For the general sequential composite hypothesis testing problem subject to universality constraints, upper and lower bounds on the type-II error exponent are proved, and a sufficient condition under which the bounds coincide is given. The results for the sequential classification problems then follow. With the optimal error exponents characterized, the benefit of sequentiality is demonstrated both analytically and numerically by comparing the sequential and fixed-length cases for representative choices of the type-I exponent constraint $\lambda$.
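As background for the first sentence, the following is a minimal sketch of the classical known-distribution setting: Wald's sequential probability ratio test (SPRT), which accumulates the log-likelihood ratio $\log(P_1(x)/P_0(x))$ and stops at the first threshold crossing. It assumes Bernoulli $P_0$ and $P_1$ (an illustrative choice) and sets the thresholds to $n D(P_1\|P_0)$ and $-n D(P_0\|P_1)$, so that the expected stopping time is roughly $n$ (up to overshoot), while Wald's inequality bounds the type-I error by $e^{-n D(P_1\|P_0)}$ and the type-II error by $e^{-n D(P_0\|P_1)}$. Both exponents are thus achieved simultaneously, which is the sense in which sequentiality removes the trade-off. This is the textbook test that needs $P_0,P_1$ exactly, not the universal test studied in this work; the function names are hypothetical.

```python
import math
import random


def bernoulli_llr(x: int, p0: float, p1: float) -> float:
    """Log-likelihood ratio log(P1(x) / P0(x)) for one Bernoulli sample."""
    return math.log(p1 / p0) if x == 1 else math.log((1 - p1) / (1 - p0))


def kl_bernoulli(p: float, q: float) -> float:
    """KL divergence D(Ber(p) || Ber(q)) in nats (assumes 0 < p, q < 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def sprt(samples, p0: float, p1: float, n: int):
    """Wald's SPRT with thresholds chosen so the expected stopping time is about n.

    Returns (decision, stopping_time), where decision 1 means "declare H1".
    Assumes P0 and P1 are known exactly (hypothetical illustration).
    """
    upper = n * kl_bernoulli(p1, p0)   # crossing it declares H1
    lower = -n * kl_bernoulli(p0, p1)  # crossing it declares H0
    llr = 0.0
    t = 0
    for t, x in enumerate(samples, start=1):
        llr += bernoulli_llr(x, p0, p1)
        if llr >= upper:
            return 1, t
        if llr <= lower:
            return 0, t
    # Only reached if a finite sample list runs out before any crossing.
    return (1 if llr > 0 else 0), t


def bernoulli_stream(p: float, rng: random.Random):
    """Endless i.i.d. Bernoulli(p) samples."""
    while True:
        yield 1 if rng.random() < p else 0
```

A usage example under $H_0$: with `p0, p1, n = 0.3, 0.6, 20`, running `sprt(bernoulli_stream(0.3, rng), 0.3, 0.6, 20)` over many trials should give an empirical type-I error below $e^{-20 D(P_1\|P_0)} \approx 0.02$ and an average stopping time slightly above 20 because of threshold overshoot.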
