ART: Adaptive Resampling-based Training for Imbalanced Classification

Traditional resampling methods for handling class imbalance typically use fixed distributions, undersampling the majority class or oversampling the minority class. These static strategies ignore changes in class-wise learning difficulty during training, which can limit the overall performance of the model. This paper proposes an Adaptive Resampling-based Training (ART) method that periodically updates the distribution of the training data based on the class-wise performance of the model. Specifically, ART uses class-wise macro F1 scores, computed at fixed intervals, to determine the degree of resampling to be performed. Unlike instance-level difficulty modeling, which is noisy and sensitive to outliers, ART adapts at the class level. This allows the model to incrementally shift its attention towards underperforming classes in a way that better aligns with the optimization objective. Results on diverse benchmarks, including the Pima Indians Diabetes and Yeast datasets, demonstrate that ART consistently outperforms both resampling-based and algorithm-level methods, including Synthetic Minority Oversampling Technique (SMOTE), NearMiss Undersampling, and Cost-sensitive Learning, on binary as well as multi-class classification tasks with varying degrees of imbalance. In most settings, these improvements are statistically significant. On tabular datasets, gains are significant under paired t-tests and Wilcoxon tests (p < 0.05), while results on text and image tasks remain favorable. Compared to training on the original imbalanced data, ART improves macro F1 by an average of 2.64 percentage points across all tested tabular datasets. Unlike existing methods, whose performance varies by task, ART consistently delivers the strongest macro F1, making it a reliable choice for imbalanced classification.
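The class-level adaptation described above can be illustrated with a minimal sketch. The abstract does not state the exact update rule, so everything below is an assumption made for illustration: the weighting `(1 - F1) + floor`, the `floor` parameter, and the helper names `per_class_f1`, `class_sampling_probs`, and `resample_indices` are hypothetical and are not taken from the paper.

```python
import random


def per_class_f1(y_true, y_pred):
    """Return {class: F1}, computed one-vs-rest for each class in y_true."""
    scores = {}
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def class_sampling_probs(f1_by_class, floor=0.05):
    """Assumed rule (not from the paper): a class is sampled with probability
    proportional to (1 - F1) + floor, so weaker classes get more weight and
    no class ever drops to zero probability."""
    raw = {c: (1.0 - f1) + floor for c, f1 in f1_by_class.items()}
    total = sum(raw.values())
    return {c: w / total for c, w in raw.items()}


def resample_indices(y_train, class_probs, n_samples, rng):
    """Draw training indices with replacement: first pick a class according
    to class_probs, then pick a uniformly random example of that class."""
    by_class = {}
    for i, label in enumerate(y_train):
        by_class.setdefault(label, []).append(i)
    classes = list(class_probs)
    weights = [class_probs[c] for c in classes]
    drawn = rng.choices(classes, weights=weights, k=n_samples)
    return [rng.choice(by_class[c]) for c in drawn]


if __name__ == "__main__":
    # Toy data: 90 majority examples, 10 minority examples.
    y_train = [0] * 90 + [1] * 10
    # Validation predictions from a model that mostly misses class 1.
    y_val_true = [0, 0, 0, 0, 1, 1, 1, 1]
    y_val_pred = [0, 0, 0, 0, 0, 0, 0, 1]

    f1 = per_class_f1(y_val_true, y_val_pred)
    probs = class_sampling_probs(f1)
    idx = resample_indices(y_train, probs, 1000, random.Random(0))
    print(f1, probs, sum(y_train[i] for i in idx) / len(idx))
```

In a training loop, a step like this would be repeated at fixed intervals: evaluate class-wise F1, recompute the class probabilities, and redraw the training set for the next interval. As a class improves, its weight under this assumed rule shrinks, so the sampling distribution drifts back towards the remaining weak classes.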
