Class imbalance poses a significant challenge in classification tasks, where traditional approaches often lead to biased models and unreliable predictions. Undersampling and oversampling techniques are commonly employed to address this issue, yet their simplicity carries inherent limitations: undersampling discards potentially useful information, while oversampling introduces additional bias. In this paper, we propose a novel framework that leverages learning theory and concentration inequalities to overcome the shortcomings of these traditional solutions. We focus on quantifying uncertainty in a class-dependent manner, as captured by confidence bounds that we embed directly into the learning process. By incorporating class-dependent estimates, our method adapts to the varying degrees of imbalance across classes, yielding more robust and reliable classification outcomes. We empirically show that our framework provides a promising direction for handling imbalanced data in classification tasks, offering practitioners a valuable tool for building more accurate and trustworthy models.
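To make the central idea concrete, the sketch below illustrates one way class-dependent confidence bounds can be derived from a concentration inequality and embedded into a training objective. It uses Hoeffding's inequality: for losses bounded in [0, 1], with probability at least 1 − δ the true risk of class c exceeds its empirical risk by at most sqrt(log(2/δ) / (2 n_c)), where n_c is the number of examples in class c. The weighting scheme, normalization, and function names here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def hoeffding_margin(n_c: int, delta: float = 0.05) -> float:
    """Hoeffding deviation term: for losses bounded in [0, 1], with
    probability >= 1 - delta the true per-class risk lies within this
    margin of the empirical risk estimated from n_c samples."""
    return float(np.sqrt(np.log(2.0 / delta) / (2.0 * n_c)))

def class_dependent_weights(y: np.ndarray, delta: float = 0.05) -> dict:
    """Illustrative weighting (an assumption, not the paper's method):
    weight each class by its Hoeffding margin, so classes with fewer
    samples -- and hence looser confidence bounds -- count more."""
    classes, counts = np.unique(y, return_counts=True)
    margins = {c: hoeffding_margin(n, delta) for c, n in zip(classes, counts)}
    mean_margin = np.mean(list(margins.values()))
    # Normalize so the weights average to 1 across classes.
    return {c: m / mean_margin for c, m in margins.items()}

def weighted_empirical_risk(losses: np.ndarray, y: np.ndarray,
                            weights: dict) -> float:
    """Empirical risk with the class-dependent confidence-bound
    weights embedded into the training objective."""
    w = np.array([weights[label] for label in y])
    return float(np.mean(w * losses))

# Toy usage: 95/5 imbalanced labels with placeholder per-sample losses.
rng = np.random.default_rng(0)
y = np.array([0] * 950 + [1] * 50)
losses = rng.random(1000)
weights = class_dependent_weights(y)
print(weights)                                    # minority class weighted up
print(weighted_empirical_risk(losses, y, weights))
```

Under this scheme, the minority class in the 950/50 split receives a weight roughly sqrt(950/50) ≈ 4.4 times that of the majority class, so its looser confidence bound translates directly into greater influence on the objective.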