In this paper, we present a simple yet effective provable method, named ABSGD, for addressing data imbalance or label noise in deep learning. Our method is a simple modification of momentum SGD in which each sample in the mini-batch is assigned an individual importance weight. The weight of each sampled example is proportional to the exponential of its scaled loss value, where the scaling factor is interpreted as the regularization parameter in the framework of distributionally robust optimization (DRO). Depending on whether the scaling factor is positive or negative, ABSGD is guaranteed to converge to a stationary point of an information-regularized min-max or min-min DRO problem, respectively. Compared with existing class-level weighting schemes, our method captures the diversity among individual examples within each class. Compared with existing individual-level weighting methods based on meta-learning, which require three backward propagations to compute mini-batch stochastic gradients, our method is more efficient: like standard deep learning methods, it needs only one backward propagation per iteration. ABSGD is flexible enough to be combined with other robust losses at no additional cost. Our empirical studies on several benchmark datasets demonstrate the effectiveness of the proposed method.\footnote{Code is available at \url{https://github.com/qiqi-helloworld/ABSGD/}.}
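To make the weighting rule concrete, here is a minimal NumPy sketch that applies it to a toy logistic-regression model. It is an illustration under our own assumptions, not the released implementation: the helper names (`absgd_weights`, `logistic_loss_and_grad`, `absgd_step`) are hypothetical, the weights are normalized with a softmax over the current mini-batch, `lam` stands for the DRO regularization parameter so that the scaled loss is `loss / lam`, and momentum uses the heavy-ball form `v = beta * v + g`. Per-sample gradients come from a single analytic pass, mirroring the one-backward-propagation cost described above.

```python
import numpy as np


def absgd_weights(losses: np.ndarray, lam: float) -> np.ndarray:
    """Weights proportional to exp(loss / lam), normalized over the mini-batch.

    Assumption: softmax normalization within the current mini-batch.
    lam > 0 up-weights high-loss samples (min-max DRO);
    lam < 0 down-weights them (min-min DRO).
    """
    scaled = losses / lam
    scaled = scaled - scaled.max()  # shift for numerical stability; ratios unchanged
    w = np.exp(scaled)
    return w / w.sum()


def logistic_loss_and_grad(theta: np.ndarray, X: np.ndarray, y: np.ndarray):
    """Per-sample logistic loss and gradient (hypothetical toy model, labels in {0, 1})."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    eps = 1e-12  # avoid log(0)
    losses = -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    grads = (p - y)[:, None] * X  # shape (batch, dim): one gradient per sample
    return losses, grads


def absgd_step(theta, velocity, X, y, lam, lr=0.05, beta=0.9):
    """One momentum-SGD step with per-sample importance weights (hypothetical helper)."""
    losses, grads = logistic_loss_and_grad(theta, X, y)
    weights = absgd_weights(losses, lam)
    g = weights @ grads             # weighted mini-batch gradient, shape (dim,)
    velocity = beta * velocity + g  # heavy-ball momentum buffer
    theta = theta - lr * velocity
    return theta, velocity, weights
```

In this sketch, a very large positive `lam` makes the weights nearly uniform, so the update reduces to plain momentum SGD on the mean gradient. A small positive `lam` concentrates the update on high-loss examples (for instance, minority-class samples), while a negative `lam` shifts weight toward low-loss examples, which is the intended behavior when high losses signal noisy labels.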