Class imbalance poses a significant challenge in classification tasks, where traditional approaches often produce biased models and unreliable predictions. Undersampling and oversampling are commonly employed to address this issue, yet each suffers from an inherent limitation: the former discards information, while the latter introduces additional bias. In this paper, we propose a novel framework that leverages learning theory and concentration inequalities to overcome these shortcomings. We focus on understanding uncertainty in a class-dependent manner, as captured by confidence bounds that we embed directly into the learning process. By incorporating class-dependent estimates, our method adapts to the varying degrees of imbalance across classes, yielding more robust and reliable classification outcomes. We show empirically that our framework offers a promising direction for handling imbalanced data in classification tasks, giving practitioners a valuable tool for building more accurate and trustworthy models.
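The abstract does not specify which concentration inequality is used or how the bounds enter the objective, so the following is only a minimal illustrative sketch, not the authors' method: it assumes Hoeffding-style per-class confidence bounds on the empirical risk, so that classes with fewer examples receive wider (more pessimistic) bounds. The function names and the choice of bound are hypothetical.

```python
import numpy as np

def hoeffding_width(n, delta=0.05):
    # Hoeffding confidence half-width for the mean of n losses bounded in [0, 1]:
    # sqrt(log(2/delta) / (2n)). Shrinks as n grows, so rare classes get wider bounds.
    return np.sqrt(np.log(2.0 / delta) / (2.0 * max(n, 1)))

def class_dependent_upper_bounds(y, losses, delta=0.05):
    """For each class c, return an upper confidence bound on its risk:
    empirical mean loss over class-c examples plus the Hoeffding half-width.
    Minority classes (small n_c) end up with larger, more cautious bounds,
    which a learner could then minimize instead of the plain empirical risk."""
    bounds = {}
    for c in np.unique(y):
        mask = (y == c)
        n_c = int(mask.sum())
        emp_risk = float(losses[mask].mean())
        bounds[int(c)] = emp_risk + hoeffding_width(n_c, delta)
    return bounds

# Toy example: class 1 is rare (100 of 1000 examples), so its bound is wider.
rng = np.random.default_rng(0)
y = np.array([0] * 900 + [1] * 100)
losses = rng.uniform(0.0, 0.2, size=1000)  # placeholder per-example losses in [0, 1]
ub = class_dependent_upper_bounds(y, losses)
```

In this sketch the bound widths depend only on the per-class sample sizes, which is the simplest way a class-dependent uncertainty estimate can adapt to imbalance; a real instantiation would plug such bounds into the training objective itself.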