We revisit the $M$-ary classification problem of Gutman (TIT 1989), where the task is to determine whether a testing sequence is generated from the same distribution as one of the $M$ training sequences. Our main result is a two-phase test, together with its theoretical analysis and an optimality guarantee. Specifically, our two-phase test is a special case of a sequential test with only two decision time points: the first phase is a fixed-length test with a reject option; the second phase proceeds only if the reject option is chosen in the first phase, and it does \emph{not} allow a reject option. To provide theoretical guarantees for our test, we derive achievable error exponents using the method of types, and we derive a converse result for the optimal sequential test using the techniques recently proposed by Hsu, Li and Wang (ITW, 2022) for binary classification. Analytically and numerically, we show that our two-phase test achieves the performance of an optimal sequential test with a proper choice of test parameters. In particular, like the optimal sequential test, our test does not need a final reject option to achieve the optimal error exponent region, whereas an optimal fixed-length test needs a reject option to achieve the same region. Finally, we specialize our results to binary classification, where $M=2$, and to $M$-ary hypothesis testing, where the ratio of the length of the training sequences to that of the testing sequence tends to infinity so that the generating distributions can be estimated perfectly.
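The two-phase structure described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the test statistic or the decision rule, so everything below is an assumption made for illustration: the score is a Gutman-style generalized Jensen-Shannon divergence between empirical distributions (types), phase one accepts the best-scoring hypothesis only if every other score exceeds a hypothetical `threshold`, and phase two picks the minimum score with no reject option. The names `two_phase_test`, `n1`, `n2`, and `threshold` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def empirical(seq, alphabet):
    """Empirical distribution (type) of seq over a finite alphabet."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in alphabet]


def kl(p, q):
    """KL divergence D(p || q) in nats; terms with p_i = 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def gjs(p, q, alpha):
    """Generalized Jensen-Shannon divergence with weight alpha (assumed statistic)."""
    mix = [(alpha * pi + qi) / (1 + alpha) for pi, qi in zip(p, q)]
    return alpha * kl(p, mix) + kl(q, mix)


def scores(train_seqs, test_seq, alphabet):
    """One score per training sequence; a small score suggests a matching distribution."""
    q = empirical(test_seq, alphabet)
    return [
        gjs(empirical(x, alphabet), q, len(x) / len(test_seq))
        for x in train_seqs
    ]


def two_phase_test(train_seqs, test_seq, n1, n2, threshold, alphabet):
    """Return (decision_index, phase_used) under the assumed rules.

    For simplicity, both phases use the full training sequences.
    """
    # Phase 1: fixed-length test on the first n1 test samples, with a reject option.
    s1 = scores(train_seqs, test_seq[:n1], alphabet)
    best = min(range(len(s1)), key=s1.__getitem__)
    if all(s > threshold for j, s in enumerate(s1) if j != best):
        return best, 1
    # Phase 2: reached only after a phase-1 reject; uses n2 samples and never rejects.
    s2 = scores(train_seqs, test_seq[:n2], alphabet)
    return min(range(len(s2)), key=s2.__getitem__), 2
```

In this sketch, a small `threshold` makes phase one decisive more often, while a large one defers more decisions to the longer second phase; how the threshold and the two lengths should be chosen to reach the optimal exponents is the subject of the paper's analysis and is not captured here.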