In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving both generalization and convergence. Specifically, IRE decouples the dynamics along flat and sharp directions: it boosts the sharpness reduction along flat directions while maintaining training stability along sharp directions. We show that IRE can be practically incorporated into {\em generic base optimizers} without introducing significant computational overhead. Experiments show that IRE consistently improves generalization performance on image classification tasks across a variety of benchmark datasets (CIFAR-10/100, ImageNet) and models (ResNets and ViTs). Surprisingly, IRE also achieves a $2\times$ {\em speed-up} over AdamW in the pre-training of Llama models (with sizes ranging from 60M to 229M) on the Wikitext-103, Minipile, and Openwebtext datasets. Moreover, we provide theoretical guarantees showing that IRE can substantially accelerate convergence towards flat minima in Sharpness-aware Minimization (SAM).
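To make the decoupling concrete, the following is a minimal sketch of how an IRE-style update could be layered on top of a base optimizer such as SGD. Everything here is an illustrative assumption rather than the paper's implementation: the diagonal squared-gradient EMA used as a curvature proxy, the quantile-based flat/sharp split, and the names \texttt{ire\_sgd\_step}, \texttt{kappa}, and \texttt{flat\_frac} are all hypothetical.

\begin{verbatim}
import torch

@torch.no_grad()
def ire_sgd_step(params, state, lr=0.1, kappa=2.0,
                 flat_frac=0.5, beta=0.99):
    # One IRE-style step on top of plain SGD (illustrative sketch).
    # Per-coordinate curvature is approximated by an EMA of squared
    # gradients; the flat_frac fraction of coordinates with the
    # smallest proxy values are treated as flat directions and their
    # update is boosted by a factor (1 + kappa), while sharp
    # directions keep the plain SGD update for stability.
    for p in params:
        if p.grad is None:
            continue
        g = p.grad
        v = state.setdefault(id(p), torch.zeros_like(p))
        v.mul_(beta).add_(g * g, alpha=1.0 - beta)       # curvature proxy EMA
        thresh = torch.quantile(v.flatten(), flat_frac)  # flat/sharp cutoff
        flat = (v <= thresh).to(g.dtype)                 # mask of flat coords
        p.add_(g + kappa * flat * g, alpha=-lr)          # boost flat component

# Usage sketch: state = {}; after each loss.backward(), call
# ire_sgd_step(model.parameters(), state, lr=0.1).
\end{verbatim}

The point of boosting only the flat component is that the sharp-direction dynamics, and hence the stable learning-rate range of the base optimizer, are left unchanged; this is the decoupling of flat and sharp directions described above.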