This work leverages the notion of $f$-divergence to derive a novel upper bound on the Bayes error rate of a general classification task. We show that the proposed bound can be computed by sampling from the output of a parameterized model. Building on this practical interpretation, we introduce the Bayes optimal learning threshold (BOLT) loss, whose minimization drives a classification model toward the Bayes error rate. We validate the proposed loss on image and text classification tasks using the MNIST, Fashion-MNIST, CIFAR-10, and IMDb datasets. Numerical experiments demonstrate that models trained with BOLT achieve performance on par with, or exceeding, that of cross-entropy, particularly on challenging datasets, highlighting the potential of BOLT for improving generalization.
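As background for the quantities involved (a standard identity, not the paper's $f$-divergence bound), the Bayes error rate of a classification task with posterior $p(y \mid x)$ admits the plug-in, sampling-based approximation below, where $\hat p_\theta$ denotes the output of a parameterized model evaluated on $N$ samples (notation assumed here for illustration):
$$
\varepsilon_{\mathrm{Bayes}} \;=\; \mathbb{E}_{x}\!\left[\,1 - \max_{y} p(y \mid x)\right]
\;\approx\; \frac{1}{N}\sum_{i=1}^{N}\Bigl(1 - \max_{y}\,\hat p_\theta(y \mid x_i)\Bigr).
$$
The proposed bound is an upper bound on this quantity that can likewise be estimated from sampled model outputs, which is what makes it usable as a training objective.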