How many different binary classification problems can a single learning algorithm solve on a fixed data set with exactly zero, or at most a given number of, cross-validation errors? While the number in the former case is known to be limited by the no-free-lunch theorem, we show that the exact answers are given by the theory of error-detecting codes. As a case study, we focus on the AUC performance measure and leave-pair-out cross-validation (LPOCV), in which each possible pair of data points with different class labels is held out in turn. We show that the maximal number of classification problems with fixed class proportion for which a learning algorithm can achieve zero LPOCV error equals the maximal number of code words in a constant weight code (CWC) with certain technical properties. We then generalize CWCs by introducing light CWCs and prove an analogous result relating nonzero LPOCV errors to light CWCs. Moreover, we prove both upper and lower bounds on the maximal number of code words in light CWCs. Finally, as an immediate practical application, we develop new LPOCV-based randomization tests for learning algorithms that generalize the classical Wilcoxon-Mann-Whitney U test.
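To make the LPOCV procedure concrete, the following is a minimal sketch, not the implementation used in this work. It holds out each positive-negative pair in turn, retrains on the remaining points, and counts a pair as an error when the held-out positive point is not scored above the held-out negative one (ties count as half an error, an assumed convention). The function names `lpocv`, `centroid_scorer`, and `identity_scorer`, the toy one-dimensional data, and both learners are hypothetical and chosen only for illustration.

```python
from itertools import product
from statistics import mean


def centroid_scorer(train_x, train_y):
    """Hypothetical toy learner: score a point by how much closer it lies
    to the positive-class mean than to the negative-class mean."""
    pos_mean = mean(x for x, y in zip(train_x, train_y) if y == 1)
    neg_mean = mean(x for x, y in zip(train_x, train_y) if y == 0)
    return lambda x: abs(x - neg_mean) - abs(x - pos_mean)


def identity_scorer(train_x, train_y):
    """Hypothetical 'learner' that ignores the training data entirely."""
    return lambda x: x


def lpocv(xs, ys, learner):
    """Return (number of LPOCV errors, LPOCV estimate of the AUC)."""
    pos = [i for i, y in enumerate(ys) if y == 1]
    neg = [i for i, y in enumerate(ys) if y == 0]
    errors = 0.0
    for i, j in product(pos, neg):
        # Hold out one positive and one negative point; train on the rest.
        keep = [k for k in range(len(xs)) if k not in (i, j)]
        score = learner([xs[k] for k in keep], [ys[k] for k in keep])
        s_pos, s_neg = score(xs[i]), score(xs[j])
        if s_pos < s_neg:
            errors += 1.0   # held-out pair ranked in the wrong order
        elif s_pos == s_neg:
            errors += 0.5   # assumed tie convention
    return errors, 1.0 - errors / (len(pos) * len(neg))


xs = [0.0, 1.0, 2.0, 5.0, 6.0, 7.0]
separable = [0, 0, 0, 1, 1, 1]
interleaved = [0, 1, 0, 1, 0, 1]

print(lpocv(xs, separable, centroid_scorer))
print(lpocv(xs, interleaved, identity_scorer))
```

With a scorer that ignores the training data, such as `identity_scorer`, the number of correctly ordered pairs is the Wilcoxon-Mann-Whitney U statistic of the scores, so the LPOCV AUC reduces to U divided by the number of positive-negative pairs. This is the special case that the LPOCV-based randomization tests mentioned above generalize.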