We develop a re-weighted gradient descent technique for improving the performance of deep neural networks. The algorithm applies importance weights to the data points at each optimization step. The approach is inspired by distributionally robust optimization with $f$-divergences, which is known to yield models with improved generalization guarantees. The re-weighting scheme is simple, computationally efficient, and can be combined with any popular optimization algorithm, such as SGD or Adam. Empirically, we demonstrate the approach's superiority on a range of tasks, including vanilla classification, classification with label imbalance, classification with noisy labels, domain adaptation, and tabular representation learning. Notably, we obtain improvements of +0.7% and +1.44% over the state of the art (SOTA) on the DomainBed and Tabular benchmarks, respectively. Moreover, our algorithm improves the performance of BERT on the GLUE benchmarks by +1.94% and of ViT on ImageNet-1K by +0.9%. These results demonstrate the effectiveness of the proposed approach and indicate its potential for improving performance across diverse domains.
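The abstract does not state which $f$-divergence or weight function is used, so the following is only a minimal sketch under one assumption: the KL-divergence case, where the worst-case distribution puts weight proportional to `exp(loss / temperature)` on each example. The names `kl_dro_weights`, `reweighted_sgd_step`, `temperature`, and `clip` are hypothetical and chosen for illustration, not taken from the paper.

```python
import math


def kl_dro_weights(losses, temperature=1.0, clip=5.0):
    """Map per-example losses to importance weights that sum to 1.

    Assumption: weights proportional to exp(loss / temperature), the form
    that arises for KL-divergence DRO. `clip` bounds the exponent so that a
    single very large loss cannot overflow or fully dominate the batch.
    """
    scaled = [min(max(loss / temperature, -clip), clip) for loss in losses]
    shift = max(scaled)  # subtract the max for numerical stability
    unnormalized = [math.exp(s - shift) for s in scaled]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def reweighted_sgd_step(params, per_example_grads, losses, lr=0.1, temperature=1.0):
    """One plain SGD step on the importance-weighted average gradient.

    `per_example_grads[i]` is the gradient of example i's loss with respect
    to `params`, given as a list of the same length as `params`.
    """
    weights = kl_dro_weights(losses, temperature)
    new_params = list(params)
    for weight, grad in zip(weights, per_example_grads):
        for j, g in enumerate(grad):
            new_params[j] -= lr * weight * g
    return new_params
```

With equal losses the weights are uniform and the step reduces to ordinary mini-batch SGD; as losses diverge, harder examples contribute more to the update. Because only the per-example weighting changes, the same weights could in principle be applied before handing the gradient to another optimizer such as Adam.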