In multiple classification, one aims to determine whether a testing sequence is generated from the same distribution as one of M training sequences, and if so, which one. Unlike most existing studies, which focus on discrete-valued sequences with a perfect distribution match, we study multiple classification for continuous sequences with distribution uncertainty. Here, the generating distributions of the testing and training sequences deviate from each other even under the true hypothesis. In particular, we propose distribution-free tests and prove that their error probabilities decay exponentially fast under three test designs: fixed-length, sequential, and two-phase tests. We first consider the simpler case without the null hypothesis, where the testing sequence is known to be generated from a distribution close to the generating distribution of one of the training sequences. We then generalize our results to the case with the null hypothesis, which allows the testing sequence to be generated from a distribution that differs substantially from the generating distributions of all training sequences.
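To make the fixed-length design concrete, the following is a minimal sketch under explicit assumptions. The abstract does not state which statistic the tests use. Purely for illustration, this sketch assumes a kernel maximum mean discrepancy (MMD) between the testing sequence and each training sequence. The rule assigns the testing sequence to the closest training sequence. When the null hypothesis is allowed, it declares the null (returns None) if even the closest MMD exceeds a threshold. The helpers gaussian_kernel, mmd2, and classify_fixed_length, as well as the bandwidth and threshold parameters, are all hypothetical. The sequences are also taken to be scalar-valued for simplicity. This is not necessarily the paper's construction.

```python
import math
import random
from typing import List, Optional, Sequence


def gaussian_kernel(x: float, y: float, bandwidth: float) -> float:
    """Gaussian (RBF) kernel on scalars; the bandwidth is an assumed tuning parameter."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2(xs: Sequence[float], ys: Sequence[float], bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(a: Sequence[float], b: Sequence[float]) -> float:
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def classify_fixed_length(
    test: Sequence[float],
    trainings: List[Sequence[float]],
    threshold: Optional[float] = None,
    bandwidth: float = 1.0,
) -> Optional[int]:
    """Hypothetical fixed-length rule: index of the closest training sequence.

    threshold=None models the case without the null hypothesis; otherwise,
    None is returned (null hypothesis) when every training sequence is far away.
    """
    scores = [mmd2(test, tr, bandwidth) for tr in trainings]
    best = min(range(len(scores)), key=lambda i: scores[i])
    if threshold is not None and scores[best] > threshold:
        return None  # testing sequence is far from all training sequences
    return best


# Usage: M = 3 training sequences; the matched testing sequence is drawn from a
# slightly shifted distribution (mean 3.1 instead of 3.0), mimicking distribution uncertainty.
rng = random.Random(0)
n = 100
trainings = [[rng.gauss(mu, 1.0) for _ in range(n)] for mu in (0.0, 3.0, 6.0)]
matched = [rng.gauss(3.1, 1.0) for _ in range(n)]
outlier = [rng.gauss(20.0, 1.0) for _ in range(n)]
print(classify_fixed_length(matched, trainings, threshold=0.1))  # expected: 1
print(classify_fixed_length(outlier, trainings, threshold=0.1))  # expected: None
```

The threshold sets the trade-off between misclassifying a sequence that is close to some training sequence and missing a true null hypothesis. The sequential and two-phase designs mentioned in the abstract are not covered by this sketch.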