We develop a re-weighted gradient descent technique that improves the performance of deep neural networks by importance-weighting data points at each optimization step. Our approach is inspired by distributionally robust optimization (DRO) with f-divergences, which is known to yield models with improved generalization guarantees. Our re-weighting scheme is simple, computationally efficient, and can be combined with many popular optimization algorithms such as SGD and Adam. Empirically, we demonstrate the superiority of our approach on a range of tasks, including supervised learning and domain adaptation. Notably, we obtain improvements of +0.7% and +1.44% over the state of the art (SOTA) on DomainBed and tabular classification benchmarks, respectively. Moreover, our algorithm boosts the performance of BERT on the GLUE benchmarks by +1.94% and of ViT on ImageNet-1K by +1.01%. These results demonstrate the effectiveness of the proposed approach and indicate its potential for improving performance across diverse domains.
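To make the idea of per-step importance weighting concrete, here is a minimal sketch under stated assumptions. The abstract does not give the weighting formula, so the code assumes the KL-divergence special case of f-divergence DRO, where the worst-case reweighting of a batch is proportional to exp(loss / temperature). The names `kl_dro_weights`, `reweighted_sgd_step`, and `temperature` are hypothetical, and the toy model is a one-dimensional least-squares fit rather than a deep network. It should be read as an illustration of the general technique, not as the authors' implementation.

```python
import math

def kl_dro_weights(losses, temperature=1.0):
    """Assumed KL-DRO weighting: softmax of losses / temperature.

    Weights sum to 1 and are larger for higher-loss examples.
    """
    m = max(losses)  # subtract the max for numerical stability
    exps = [math.exp((l - m) / temperature) for l in losses]
    total = sum(exps)
    return [e / total for e in exps]

def reweighted_sgd_step(w, b, xs, ys, lr=0.05, temperature=1.0):
    """One re-weighted gradient step for the toy model y ~ w * x + b."""
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    losses = [0.5 * r * r for r in residuals]
    # Weights are treated as constants: no gradient flows through them.
    q = kl_dro_weights(losses, temperature)
    grad_w = sum(qi * r * x for qi, r, x in zip(q, residuals, xs))
    grad_b = sum(qi * r for qi, r in zip(q, residuals))
    return w - lr * grad_w, b - lr * grad_b

# Toy usage: noiseless data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = 0.0, 0.0
for _ in range(2000):
    w, b = reweighted_sgd_step(w, b, xs, ys)
print(w, b)
```

In this sketch a large `temperature` pushes the weights toward uniform, which recovers ordinary averaged-gradient descent, while a small `temperature` concentrates the update on the highest-loss examples. The same weights could in principle scale per-example losses before any optimizer update, which is one way to read the claim that the scheme combines with SGD and Adam.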