In hypothesis testing, taking samples sequentially and stopping opportunistically to make the inference greatly enhances reliability. The design of the stopping and inference policy, however, relies critically on knowledge of the underlying distribution under each hypothesis. When knowledge of the distributions, say $P_0$ and $P_1$ in the binary-hypothesis case, is replaced by empirically observed statistics from the respective distributions, the gain from sequentiality under universality constraints is less well understood. This work closes that gap with a unified study of sequentiality in the universal binary classification problem. We propose a unified framework in which universality constraints are imposed on the expected stopping time as well as on the type-I error exponent. The type-I error exponent is required to meet a pre-set distribution-dependent constraint $\lambda(P_0,P_1)$ for all $P_0,P_1$. The framework is used to investigate a semi-sequential and a fully-sequential setup, so that a fair comparison can be made with the fixed-length setup. The optimal type-II error exponents in the different setups are characterized when the function $\lambda$ satisfies mild continuity conditions. The benefit of sequentiality is shown by comparing the semi-sequential, fully-sequential, and fixed-length cases for representative examples of $\lambda$. Conditions under which sequentiality eradicates the trade-off between error exponents are also derived.