In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics along flat and sharp directions, boosting sharpness reduction along flat directions while maintaining training stability along sharp ones. We show that IRE can be practically combined with {\em generic base optimizers} without incurring significant computational overhead. Experiments show that IRE consistently improves generalization on image classification tasks across a variety of benchmark datasets (CIFAR-10/100, ImageNet) and models (ResNets and ViTs). Surprisingly, IRE also achieves a $2\times$ {\em speed-up} over AdamW in the pre-training of Llama models (with sizes ranging from 60M to 229M) on datasets including Wikitext-103, Minipile, and Openwebtext. Moreover, we provide theoretical guarantees showing that IRE can substantially accelerate convergence toward flat minima in Sharpness-Aware Minimization (SAM).
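To make the decoupling idea concrete, the following is a minimal illustrative sketch, not the authors' exact algorithm: given a per-coordinate curvature proxy (e.g., an Adam-style second-moment estimate), coordinates with the smallest curvature are treated as "flat," and the base optimizer's update is amplified there by a factor $(1+\kappa)$, while sharp coordinates keep the unmodified update. The function name, the quantile-based flatness rule, and the parameters `kappa` and `flat_frac` are all hypothetical choices for illustration.

```python
import numpy as np

def ire_update(theta, base_update, curvature_est, kappa=1.0, flat_frac=0.5):
    """One IRE-style step (illustrative sketch, not the paper's exact method).

    Coordinates whose curvature estimate falls in the lowest `flat_frac`
    quantile are treated as "flat"; the base optimizer's update is
    amplified by (1 + kappa) along those coordinates, while sharp
    coordinates (and hence training stability) are left untouched.
    """
    threshold = np.quantile(curvature_est, flat_frac)
    flat_mask = curvature_est <= threshold  # hypothetical flatness proxy
    update = base_update * (1.0 + kappa * flat_mask)
    return theta - update
```

For example, with `kappa=1.0` the step along flat coordinates is doubled relative to the base optimizer, which is the sense in which IRE "boosts" sharpness reduction without modifying the dynamics in sharp directions.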