We introduce a powerful deep classifier two-sample test for high-dimensional data based on E-values, called the E-value Classifier Two-Sample Test (E-C2ST). Our test combines ideas from existing work on split likelihood ratio tests and predictive independence tests. The resulting E-values are suitable for anytime-valid sequential two-sample tests, which allows the data to be used more effectively when constructing the test statistic. Through simulations and real-data applications, we empirically demonstrate that E-C2ST achieves higher statistical power by partitioning the dataset into multiple batches, rather than using the conventional two-way split (training and testing) of standard classifier two-sample tests. This strategy increases the power of the test while keeping the type I error well below the desired significance level.
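The abstract does not spell out the construction. The sketch below shows one way a classifier can yield E-values for a two-sample test in the split likelihood ratio / predictive independence spirit, and how multi-batch splitting and anytime-valid stopping fit together. It makes several assumptions that are not taken from the paper. Labels (0 for sample P, 1 for sample Q) are treated as i.i.d. Bernoulli(π) with known π = 1/2. A plain logistic regression fitted by gradient descent stands in for the deep classifier. All function names (`fit_logistic`, `batch_log_e_value`, `sequential_test`, `make_batches`) are hypothetical. For each batch k, the classifier is trained only on batches 1..k-1 and then scored on batch k. Each held-out point contributes the factor q̂(y | x) / π(y), whose expectation under H0 (P = Q) is 1. Multiplying these factors across batches gives a test supermartingale, so rejecting as soon as it reaches 1/α controls the type I error at α at any stopping time (Ville's inequality).

```python
import numpy as np


def fit_logistic(x, y, lr=0.1, steps=500):
    """Logistic regression fitted by full-batch gradient descent.

    Hypothetical stand-in for the deep classifier; any model giving P(y=1 | x) works.
    """
    xb = np.hstack([x, np.ones((len(x), 1))])  # append an intercept column
    w = np.zeros(xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-xb @ w))
        w -= lr * xb.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w


def predict_proba(w, x):
    xb = np.hstack([x, np.ones((len(x), 1))])
    return 1.0 / (1.0 + np.exp(-xb @ w))


def batch_log_e_value(w, x, y, pi=0.5, eps=1e-6):
    """log of prod_i q(y_i | x_i) / pi(y_i) on one held-out batch.

    Assumption: under H0 the labels are i.i.d. Bernoulli(pi) and independent of x,
    so each factor has conditional expectation 1 and the product is an E-value.
    """
    p1 = np.clip(predict_proba(w, x), eps, 1 - eps)
    q = np.where(y == 1, p1, 1 - p1)      # predicted probability of the observed label
    prior = np.where(y == 1, pi, 1 - pi)  # null probability of the observed label
    return float(np.sum(np.log(q) - np.log(prior)))


def sequential_test(batches, alpha=0.05):
    """Train on batches before k, score batch k, multiply E-values, stop at 1/alpha.

    Works in log space to avoid overflow.
    Returns (rejected, log of the running E-value, number of batches used).
    """
    log_e, threshold = 0.0, np.log(1.0 / alpha)
    x_seen, y_seen = batches[0]  # the first batch is used only for training
    for k in range(1, len(batches)):
        w = fit_logistic(x_seen, y_seen)
        x_k, y_k = batches[k]
        log_e += batch_log_e_value(w, x_k, y_k)
        if log_e >= threshold:  # Ville: P_H0(running E-value ever >= 1/alpha) <= alpha
            return True, log_e, k + 1
        x_seen = np.vstack([x_seen, x_k])
        y_seen = np.concatenate([y_seen, y_k])
    return False, log_e, len(batches)


def make_batches(rng, n_batches=10, n=200, dim=5, shift=0.0):
    """Hypothetical toy data: y ~ Bernoulli(1/2), x ~ N(shift * y, I); shift = 0 means P = Q."""
    out = []
    for _ in range(n_batches):
        y = rng.integers(0, 2, size=n)
        x = rng.standard_normal((n, dim)) + shift * y[:, None]
        out.append((x, y))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(sequential_test(make_batches(rng, shift=1.0)))  # P != Q: mean shift
    print(sequential_test(make_batches(rng, shift=0.0)))  # P = Q: rejects with prob. <= alpha
```

Compared with a single train/test split, this design lets every batch after the first contribute evidence. The stopping rule also stays valid no matter when the analyst looks at the running E-value. How E-C2ST itself trains its networks and forms its E-values should be taken from the paper, not from this sketch.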