How many different binary classification problems can a single learning algorithm solve on a fixed data set with exactly zero, or with at most a given number of, cross-validation errors? While the number in the former case is known to be limited by the no-free-lunch theorem, we show that the exact answers are given by the theory of error-detecting codes. As a case study, we focus on the AUC performance measure and leave-pair-out cross-validation (LPOCV), in which each possible pair of data points with different class labels is held out in turn. We show that the maximal number of classification problems with a fixed class proportion for which a learning algorithm can achieve zero LPOCV error equals the maximal number of code words in a constant weight code (CWC) with certain technical properties. We then generalize CWCs by introducing light CWCs, and prove an analogous result relating nonzero LPOCV errors to light CWCs. Moreover, we prove both upper and lower bounds on the maximal number of code words in light CWCs. Finally, as an immediate practical application, we develop new LPOCV-based randomization tests for learning algorithms that generalize the classical Wilcoxon-Mann-Whitney U test.
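To make the LPOCV procedure concrete, the following is a minimal Python sketch of a leave-pair-out AUC estimate. It is not taken from the paper. The function names `lpocv_auc`, `mean_difference_scorer` and `fixed_scorer`, the one-dimensional toy data, and the convention of counting tied scores as half an error are all assumptions made for illustration.

```python
from itertools import product


def lpocv_auc(xs, ys, fit_scorer):
    """Leave-pair-out cross-validation estimate of AUC.

    xs: list of feature values, ys: list of 0/1 labels.
    fit_scorer(train_xs, train_ys) must return a function x -> score.
    Assumes each class keeps at least one point after a pair is held out.
    """
    pos = [i for i, y in enumerate(ys) if y == 1]
    neg = [i for i, y in enumerate(ys) if y == 0]
    total = 0.0
    # Hold out every (positive, negative) pair in turn.
    for i, j in product(pos, neg):
        train = [k for k in range(len(xs)) if k != i and k != j]
        score = fit_scorer([xs[k] for k in train], [ys[k] for k in train])
        s_pos, s_neg = score(xs[i]), score(xs[j])
        # Correct ordering counts 1, a tie counts 0.5 (assumed convention).
        total += 1.0 if s_pos > s_neg else 0.5 if s_pos == s_neg else 0.0
    return total / (len(pos) * len(neg))


def mean_difference_scorer(train_xs, train_ys):
    """Hypothetical toy learner: orient the feature by the class-mean gap."""
    pos = [x for x, y in zip(train_xs, train_ys) if y == 1]
    neg = [x for x, y in zip(train_xs, train_ys) if y == 0]
    direction = sum(pos) / len(pos) - sum(neg) / len(neg)
    return lambda x: direction * x


def fixed_scorer(train_xs, train_ys):
    """Ignores the training data, so no learning takes place."""
    return lambda x: x


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
separable = [0, 0, 0, 1, 1, 1]    # one labeling of the fixed data
alternating = [0, 1, 0, 1, 0, 1]  # another labeling with the same class proportion

auc_separable = lpocv_auc(xs, separable, mean_difference_scorer)
auc_fixed = lpocv_auc(xs, alternating, fixed_scorer)
print(auc_separable, auc_fixed)
```

On the separable labeling the toy learner orders every held-out pair correctly, so the LPOCV error is zero. With `fixed_scorer` nothing depends on the training set, and the estimate reduces to the Wilcoxon-Mann-Whitney U statistic divided by the number of positive-negative pairs, which illustrates the sense in which an LPOCV-based test can be viewed as extending the U test to learned scoring functions.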