Consider a collection of m competing machine learning algorithms. Given their performance on a benchmark of datasets, we would like to identify the best-performing algorithm: specifically, the algorithm that is most likely to ``win'' (rank highest) on a future, unseen dataset. The standard maximum likelihood approach suggests counting the number of wins for each algorithm. In this work, we argue that the complete rankings carry much more information, namely the number of times each algorithm finished second, third, and so forth. Yet it is not entirely clear how to use this information effectively for our purpose. We introduce a novel conceptual framework for estimating the win probability of each of the m algorithms, given their complete rankings over a benchmark of datasets. Our proposed framework significantly improves upon currently known methods in synthetic and real-world examples.
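
To make the baseline concrete, the following is a minimal sketch of the win-counting (maximum likelihood) estimate mentioned above, together with the full table of rank counts that it ignores. It is not the proposed framework. The function names `win_frequencies` and `rank_counts`, the input layout (one best-first list of algorithm indices per dataset, with no ties), and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def win_frequencies(rankings, m):
    """Baseline estimate: fraction of datasets on which each algorithm ranks first.

    rankings: list of rankings, one per dataset; each ranking is a list of
    algorithm indices (0..m-1) ordered from best to worst (assumed layout).
    """
    n = len(rankings)
    wins = Counter(r[0] for r in rankings)  # r[0] is the winner on that dataset
    return [wins[a] / n for a in range(m)]


def rank_counts(rankings, m):
    """counts[a][k] = number of datasets on which algorithm a finished in position k."""
    counts = [[0] * m for _ in range(m)]
    for r in rankings:
        for position, a in enumerate(r):
            counts[a][position] += 1
    return counts


# Hypothetical toy benchmark: 3 algorithms ranked on 4 datasets.
toy_rankings = [[0, 1, 2], [1, 0, 2], [0, 2, 1], [0, 1, 2]]
baseline = win_frequencies(toy_rankings, 3)
table = rank_counts(toy_rankings, 3)
```

In this toy example the baseline assigns algorithm 2 a win probability of zero because it never ranked first, even though the rank-count table shows it finished second once. This is the kind of information that win counting alone discards.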