We present Re-weighted Gradient Descent (RGD), a novel optimization technique that improves the performance of deep neural networks through dynamic sample importance weighting. Our method is grounded in the principles of distributionally robust optimization (DRO) with Kullback-Leibler divergence. RGD is simple to implement, computationally efficient, and compatible with widely used optimizers such as SGD and Adam. We demonstrate the broad applicability and impact of RGD by achieving state-of-the-art results on diverse benchmarks, including improvements of +0.7% (DomainBed), +1.44% (tabular classification), +1.94% (GLUE with BERT), and +1.01% (ImageNet-1K with ViT).
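The sketch below illustrates the weighting idea under stated assumptions. It is not the authors' reference implementation. It relies on a well-known property of KL-regularized DRO: the inner maximization over sample distributions has a closed form in which each sample's weight is proportional to exp(ℓ_i/τ), normalized over the batch.

Everything else in the example is assumed for illustration:

- The names `kl_dro_weights` and `rgd_step` are hypothetical.
- The temperature `tau` is a hypothetical hyperparameter.
- The model is a toy logistic regression trained with plain full-batch gradient steps.
- No loss clipping or other stabilization from the paper is reproduced.

```python
import numpy as np

def kl_dro_weights(losses, tau=1.0):
    """Hypothetical helper: KL-DRO closed-form weights, w_i ∝ exp(loss_i / tau).

    `tau` is an assumed temperature; larger values push weights toward uniform.
    """
    z = losses / tau
    z = z - z.max()              # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()           # normalize so the weights sum to 1

def rgd_step(theta, X, y, lr=0.1, tau=1.0):
    """Hypothetical re-weighted gradient step for logistic regression.

    Returns the updated parameters plus the pre-update per-sample losses
    and weights. The weights are treated as constants (no gradient
    flows through them).
    """
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))      # predicted probabilities
    eps = 1e-12
    # Per-sample binary cross-entropy losses
    losses = -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    w = kl_dro_weights(losses, tau)
    # Per-sample BCE gradient is (p_i - y_i) * x_i; combine with weights w_i
    grad = X.T @ (w * (p - y))
    return theta - lr * grad, losses, w

# Toy demo on synthetic, linearly separable data (assumed setup)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)
theta = np.zeros(3)
for _ in range(200):
    theta, losses, w = rgd_step(theta, X, y, lr=0.5, tau=1.0)
print(f"mean loss after training: {losses.mean():.4f}")
```

With uniform weights (w_i = 1/n), the step reduces to ordinary mean-loss gradient descent. The re-weighting only shifts emphasis toward higher-loss samples.

In an autodiff framework, the same idea would amount to multiplying per-sample losses by detached weights before summing. That is one plausible reading of how such a scheme could remain compatible with optimizers like SGD or Adam.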