In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence. Specifically, IRE decouples the dynamics along flat and sharp directions, which boosts sharpness reduction along flat directions while maintaining training stability along sharp directions. We show that IRE can be practically incorporated into {\em generic base optimizers} without introducing significant computational overhead. Experiments show that IRE consistently improves generalization performance on image classification tasks across a variety of benchmark datasets (CIFAR-10/100, ImageNet) and models (ResNets and ViTs). Surprisingly, IRE also achieves a $2\times$ {\em speed-up} compared to AdamW in the pre-training of Llama models (with 60M to 229M parameters) on datasets including Wikitext-103, Minipile, and Openwebtext. Moreover, we provide theoretical guarantees showing that IRE can substantially accelerate convergence towards flat minima in Sharpness-Aware Minimization (SAM).
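The decoupling of flat and sharp directions can be illustrated on a toy problem. The sketch below is a hypothetical illustration, not the authors' implementation. It makes the following assumptions:

- The loss is a diagonal quadratic whose per-coordinate curvatures are known exactly.
- The coordinates with the smallest curvature are treated as the flat directions.
- The update has the form $\theta \leftarrow \theta - \eta\,(g + \kappa P_{\text{flat}}\, g)$, where $P_{\text{flat}}$ is a 0/1 mask over the flat coordinates and $\kappa \ge 0$ is an amplification factor.

The names `ire_like_step`, `flat_mask`, `kappa`, and `flat_fraction` are invented for this example. The sketch shows only the decoupling: a larger effective step along the flat coordinate and an unchanged step along the sharp one. It does not model sharpness reduction itself.

```python
# Hypothetical toy illustration (not the authors' algorithm): on a diagonal
# quadratic L(theta) = 0.5 * sum_i h_i * theta_i**2, the curvature h_i of each
# coordinate is known exactly. The coordinates with the smallest curvature are
# treated as "flat", and only their gradient component is amplified by kappa.


def grad(theta, h):
    """Gradient of the diagonal quadratic: dL/dtheta_i = h_i * theta_i."""
    return [hi * ti for hi, ti in zip(h, theta)]


def flat_mask(h, flat_fraction):
    """Return 1.0 for the flat_fraction of coordinates with the smallest curvature, else 0.0."""
    k = int(len(h) * flat_fraction)
    flat_idx = set(sorted(range(len(h)), key=lambda i: h[i])[:k])
    return [1.0 if i in flat_idx else 0.0 for i in range(len(h))]


def ire_like_step(theta, h, lr, kappa, flat_fraction):
    """One step of theta <- theta - lr * (g + kappa * P_flat g), with plain GD as the base optimizer."""
    g = grad(theta, h)
    mask = flat_mask(h, flat_fraction)
    return [t - lr * (gi + kappa * m * gi) for t, gi, m in zip(theta, g, mask)]


def run(steps, lr, kappa, flat_fraction=0.5):
    """Run the toy update from theta = (1, 1) with one sharp and one flat coordinate."""
    h = [10.0, 0.01]  # h[0]: sharp coordinate, h[1]: flat coordinate
    theta = [1.0, 1.0]
    for _ in range(steps):
        theta = ire_like_step(theta, h, lr, kappa, flat_fraction)
    return theta


if __name__ == "__main__":
    print("plain GD          :", run(100, lr=0.15, kappa=0.0))
    print("flat-only boost   :", run(100, lr=0.15, kappa=50.0))
    print("uniform amplifying:", run(100, lr=0.15, kappa=50.0, flat_fraction=1.0))
```

With `lr=0.15`, plain gradient descent contracts the sharp coordinate (curvature 10) by a factor of about 0.5 per step. After 100 steps, it leaves the flat coordinate (curvature 0.01) at about 0.86. Amplifying only the flat coordinate with `kappa=50.0` drives it to roughly 3.5e-4, while the sharp coordinate follows exactly the same trajectory. Setting `flat_fraction=1.0` amplifies every coordinate and makes the sharp one diverge. This is why the amplification is restricted to flat directions in this sketch.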