The conventional wisdom behind learning deep classification models is to focus on poorly classified examples and ignore well-classified examples that are far from the decision boundary. For instance, when training with cross-entropy loss, examples with higher likelihoods (i.e., well-classified examples) contribute smaller gradients in back-propagation. However, we theoretically show that this common practice hinders representation learning, energy optimization, and margin growth. To counteract this deficiency, we propose to reward well-classified examples with additive bonuses to revive their contribution to the learning process. This counterexample to the conventional practice theoretically addresses all three issues. We empirically support this claim either by directly verifying the theoretical results or by demonstrating significant performance improvements with our counterexample on diverse tasks, including image classification, graph classification, and machine translation. Furthermore, we show that, because our idea addresses these three issues, it can also handle complex scenarios such as imbalanced classification, out-of-distribution (OOD) detection, and applications under adversarial attacks. Code is available at: https://github.com/lancopku/well-classified-examples-are-underestimated
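
To make the gradient argument concrete, the following minimal sketch compares the gradient of the cross-entropy loss with respect to the target logit against the gradient obtained after adding a bonus term. The abstract does not specify the form of the bonus; the term `log(1 - p)` used here is an assumption chosen for illustration, and the function names are hypothetical rather than taken from the released code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ce_loss(p):
    """Cross-entropy loss for target-class probability p."""
    return -math.log(p)


def ce_grad(p):
    """d(-log p)/dz_y for a softmax output: p - 1.

    The magnitude shrinks toward 0 as p approaches 1, so
    well-classified examples contribute little gradient.
    """
    return p - 1.0


def bonus_loss(p):
    """Cross-entropy plus an ASSUMED additive bonus log(1 - p).

    The bonus becomes more negative as p grows, rewarding
    well-classified examples. It is undefined at p == 1, so a
    practical version would need clipping.
    """
    return -math.log(p) + math.log(1.0 - p)


def bonus_grad(p):
    """d(-log p + log(1 - p))/dz_y for a softmax output.

    Since dp/dz_y = p(1 - p), the two terms give (p - 1) and -p,
    which sum to -1 regardless of p.
    """
    return (p - 1.0) + (-p)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        print(f"p={p}: |grad CE|={abs(ce_grad(p)):.4f}, "
              f"|grad CE+bonus|={abs(bonus_grad(p)):.4f}")
```

Under this assumed bonus, the gradient magnitude with respect to the target logit stays constant as the likelihood increases, whereas the plain cross-entropy gradient vanishes. This is one way to read "reviving the contribution" of well-classified examples; the authors' exact formulation may differ.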