We present Re-weighted Gradient Descent (RGD), a novel optimization technique that improves the performance of deep neural networks through dynamic sample importance weighting. The method is grounded in distributionally robust optimization (DRO) with a Kullback-Leibler (KL) divergence. RGD is simple to implement, computationally efficient, and compatible with widely used optimizers such as SGD and Adam. RGD achieves state-of-the-art results on diverse benchmarks, demonstrating its broad applicability, with improvements of +0.7% on DomainBed, +1.44% on tabular classification, +1.94% on GLUE with BERT, and +1.01% on ImageNet-1K with ViT.
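The abstract does not state the update rule. One standard reading of KL-based DRO re-weighting goes like this. The inner problem max over q of Σ q_i ℓ_i − τ·KL(q ‖ uniform) on the probability simplex has the closed-form solution q_i = exp(ℓ_i/τ) / Σ_j exp(ℓ_j/τ). Each gradient step then averages per-sample gradients using these softmax weights instead of uniform ones. The sketch below assumes exactly this form. The temperature τ, the function names (`kl_dro_weights`, `reweighted_step`, `fit_toy_regression`), and the toy linear-regression model are hypothetical illustrations, not the paper's implementation. The actual RGD weighting may differ, for example through loss clipping or a different normalization.

```python
import math
from typing import List, Sequence, Tuple


def kl_dro_weights(losses: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Return q_i proportional to exp(loss_i / temperature), normalized to sum to 1.

    This is the closed-form maximizer of sum_i q_i * loss_i - temperature * KL(q || uniform)
    over the probability simplex, i.e. a softmax of the scaled losses.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [loss / temperature for loss in losses]
    shift = max(scaled)  # subtract the max before exp() to avoid overflow
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def reweighted_step(
    w: float, b: float, batch: Sequence[Tuple[float, float]], lr: float, temperature: float
) -> Tuple[float, float]:
    """One re-weighted gradient step for a toy 1-D model y ~ w * x + b with squared loss."""
    residuals = [w * x + b - y for x, y in batch]
    losses = [r * r for r in residuals]
    # Weights are computed from the current losses and used as fixed coefficients,
    # so the update is a weighted (not uniform) average of per-sample gradients.
    q = kl_dro_weights(losses, temperature)
    grad_w = sum(qi * 2.0 * r * x for qi, r, (x, _) in zip(q, residuals, batch))
    grad_b = sum(qi * 2.0 * r for qi, r in zip(q, residuals))
    return w - lr * grad_w, b - lr * grad_b


def fit_toy_regression(
    steps: int = 5000, lr: float = 0.01, temperature: float = 1.0
) -> Tuple[float, float]:
    """Fit noiseless data generated by y = 2x + 1 with full-batch re-weighted steps."""
    batch = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
    w, b = 0.0, 0.0
    for _ in range(steps):
        w, b = reweighted_step(w, b, batch, lr, temperature)
    return w, b
```

As the temperature grows, the weights approach uniform and the step reduces to ordinary gradient descent on the mean loss. As it shrinks, the step concentrates on the highest-loss samples. In a deep-learning framework the same weights would plausibly be computed from detached per-sample losses and multiplied into the loss before the optimizer step. That would explain how such re-weighting can sit in front of SGD or Adam, but this integration is likewise an assumption rather than a description of the paper's code.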