Data imbalance is a common problem in machine learning that can have a critical effect on the performance of a model. Various solutions exist, but their impact on the convergence of the learning dynamics is poorly understood. Here, we elucidate the significant negative impact of data imbalance on learning, showing that the learning curves for minority and majority classes follow sub-optimal trajectories when training with a gradient-based optimizer. This slowdown is related to the imbalance ratio and can be traced back to a competition between the optimization of different classes. Our main contribution is the analysis of the convergence of full-batch gradient descent (GD) and stochastic gradient descent (SGD), and of variants that renormalize the contribution of each per-class gradient. We find that GD is not guaranteed to decrease the loss for each class, but that this problem can be addressed by performing a per-class normalization of the gradient. With SGD, class imbalance has an additional effect on the direction of the gradients: the minority class suffers from higher directional noise, which reduces the effectiveness of the per-class gradient normalization. Our findings allow us to understand not only the potential and limitations of strategies involving the per-class gradients, but also why previously used solutions for class imbalance, such as oversampling, are effective.
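The per-class normalization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binary logistic-regression model on synthetic two-dimensional data, and it assumes that "per-class normalization" means dividing each class's full-batch gradient by its Euclidean norm before summing. The function names (`per_class_gradients`, `gd_direction`, `normalized_direction`), the sample sizes, and the small constant `eps` are hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def per_class_gradients(w, X, y):
    """Gradient of the summed logistic loss, computed separately per class."""
    grads = {}
    for c in np.unique(y):
        mask = y == c
        residual = sigmoid(X[mask] @ w) - y[mask]  # d(loss)/d(logit) per sample
        grads[int(c)] = X[mask].T @ residual
    return grads


def gd_direction(grads):
    """Plain full-batch gradient: the per-class gradients simply add up."""
    return sum(grads.values())


def normalized_direction(grads, eps=1e-12):
    """Assumed variant: each class gradient is rescaled to unit norm first."""
    return sum(g / (np.linalg.norm(g) + eps) for g in grads.values())


# Synthetic imbalanced data (illustrative): 950 majority vs. 50 minority samples.
rng = np.random.default_rng(0)
n_major, n_minor = 950, 50
X = np.vstack([
    rng.normal(-1.0, 1.0, size=(n_major, 2)),
    rng.normal(+1.0, 1.0, size=(n_minor, 2)),
])
y = np.concatenate([np.zeros(n_major), np.ones(n_minor)])
w = np.zeros(2)

grads = per_class_gradients(w, X, y)
d_gd = gd_direction(grads)
d_norm = normalized_direction(grads)

for c, g in grads.items():
    # A positive dot product means a small step along -d lowers this class's
    # loss to first order.
    print(f"class {c}: |g|={np.linalg.norm(g):.2f}, "
          f"g.d_gd={float(g @ d_gd):.2f}, g.d_norm={float(g @ d_norm):.2f}")
```

In the unnormalized sum, the majority-class gradient tends to dominate because it accumulates over many more samples. With two classes, the normalized direction d = g_0/|g_0| + g_1/|g_1| satisfies g_c · d = |g_c|(1 + cos θ) ≥ 0, where θ is the angle between the two class gradients, so neither class's loss increases to first order. This is a first-order argument for the full-batch case only; it says nothing about the directional noise that the abstract reports for SGD.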