Evaluating Bayes Error Estimators on Real-World Datasets with FeeBee

The Bayes error rate (BER) is a fundamental concept in machine learning: it is the lowest error any classifier can achieve on a fixed probability distribution, and so it bounds the best possible accuracy. Despite years of research on estimators of lower and upper bounds for the BER, these estimators have usually been compared only on synthetic datasets with known probability distributions, leaving two key questions unanswered: (1) how well do they perform on real-world datasets, and (2) how practical are they? Answering these questions is not trivial. Apart from the obvious difficulty that the BER of a real-world dataset is unknown, any BER estimator must overcome two main challenges to be applicable in real-world settings: (1) its computational and sample complexity, and (2) its sensitivity to, and the selection of, hyper-parameters. In this work, we propose FeeBee, the first principled framework for analyzing and comparing BER estimators on any modern real-world dataset with an unknown probability distribution. We achieve this by injecting a controlled amount of label noise and evaluating each estimator at a series of noise levels, supported by a theoretical result that allows us to draw conclusions about how the BER evolves with the noise. By implementing and analyzing 7 multi-class BER estimators on 6 commonly used datasets from the computer vision and NLP domains, FeeBee enables a thorough study of these estimators, clearly identifying the strengths and weaknesses of each, while remaining easy to apply to any future BER estimator.
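To illustrate the noise-injection idea, the following is a minimal sketch under one specific assumption that the abstract itself does not state: each label is, with probability rho, replaced by a label drawn uniformly from all C classes. Under that assumption the noisy class posterior is (1 - rho) p(y|x) + rho / C, so the BER of the noisy distribution is BER + rho (1 - 1/C - BER), a linear function of rho. The function names and parameters are hypothetical and are not taken from the FeeBee code base.

```python
import numpy as np


def inject_uniform_label_noise(labels, rho, num_classes, rng):
    """Replace each label, with probability rho, by a uniformly random class.

    Assumed noise model (not specified in the abstract): the replacement is
    drawn from all classes, so it may coincide with the original label.
    """
    labels = np.asarray(labels)
    flip = rng.random(labels.shape[0]) < rho
    random_labels = rng.integers(0, num_classes, size=labels.shape[0])
    return np.where(flip, random_labels, labels)


def ber_after_noise(ber, rho, num_classes):
    """BER of the noisy distribution under the uniform noise model above."""
    return ber + rho * (1.0 - 1.0 / num_classes - ber)


# Example: how an (unknown) clean BER would evolve across noise levels.
rng = np.random.default_rng(0)
clean_labels = rng.integers(0, 10, size=1000)
for rho in (0.0, 0.25, 0.5, 1.0):
    noisy_labels = inject_uniform_label_noise(clean_labels, rho, 10, rng)
    changed = float(np.mean(noisy_labels != clean_labels))
    print(rho, changed, ber_after_noise(0.05, rho, 10))
```

Because the noisy BER is a known function of the clean BER and rho in this model, an estimator's outputs across noise levels can be checked for consistency with that line even though the clean BER itself is unknown.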
