In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics along flat and sharp directions: it boosts the sharpness reduction along flat directions while maintaining training stability along sharp directions. We show that IRE can be practically combined with {\em generic base optimizers} without introducing significant computational overhead. Experiments show that IRE consistently improves generalization on image classification tasks across a variety of benchmark datasets (CIFAR-10/100, ImageNet) and models (ResNets and ViTs). Surprisingly, IRE also achieves a $2\times$ {\em speed-up} over AdamW in the pre-training of Llama models (of sizes ranging from 60M to 229M) on datasets including Wikitext-103, Minipile, and Openwebtext. Moreover, we provide theoretical guarantees, showing that IRE can substantially accelerate the convergence towards flat minima in Sharpness-Aware Minimization (SAM).
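To make the decoupling idea concrete, the following is a minimal coordinate-wise sketch of an IRE-style update, not the paper's exact projection: flat directions are approximated (as a hypothetical proxy) by the fraction of coordinates with the smallest gradient magnitudes, and the update along those coordinates is amplified by a factor $1+\kappa$ while sharp coordinates keep the base step. The function `ire_step`, the hyperparameters `kappa` and `flat_frac`, and the toy quadratic objective are all illustrative assumptions.

```python
import numpy as np

def ire_step(theta, grad, lr=0.05, kappa=2.0, flat_frac=0.5):
    """One hypothetical IRE-style SGD step.

    Flat directions are approximated coordinate-wise as the flat_frac
    fraction of coordinates with the smallest |grad| -- an illustrative
    proxy for a flat-subspace projection, not the paper's method.
    The update along those coordinates is boosted by (1 + kappa);
    sharp coordinates keep the base learning rate, preserving stability.
    """
    k = int(len(grad) * flat_frac)
    flat_idx = np.argsort(np.abs(grad))[:k]  # smallest-magnitude coords
    mask = np.zeros_like(grad)
    mask[flat_idx] = 1.0
    # base update plus an extra boost restricted to flat directions
    return theta - lr * (grad + kappa * mask * grad)

# Toy quadratic loss 0.5 * theta^T H theta with one sharp (10.0)
# and one flat (0.1) curvature direction; gradient is H @ theta.
H = np.diag([10.0, 0.1])
theta = np.array([1.0, 1.0])
for _ in range(50):
    theta = ire_step(theta, H @ theta)
```

On this toy problem the flat coordinate contracts roughly three times faster than it would under plain gradient descent with the same base step, while the sharp coordinate remains stable, which is the qualitative behavior the framework targets.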