Adaptive gradient methods have been increasingly adopted by the deep learning community due to their fast convergence and reduced sensitivity to hyper-parameters. However, these methods come with limitations, such as increased memory requirements for elements like moving averages and a poorly understood convergence theory. To overcome these challenges, we introduce F-CMA, a Fast-Controlled Mini-batch Algorithm with a random reshuffling method featuring a sufficient decrease condition and a line-search procedure to ensure loss reduction per epoch, together with a deterministic proof of its global convergence to a stationary point. To evaluate F-CMA, we integrate it into conventional training protocols for classification tasks involving both convolutional neural networks and vision transformer models, allowing for a direct comparison with popular optimizers. Computational tests show significant improvements, including a decrease in overall training time of up to 68%, an increase in per-epoch efficiency of up to 20%, and an increase in model accuracy of up to 5%.
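The abstract's "sufficient decrease condition" and "line-search procedure" follow the classical backtracking (Armijo) pattern. The sketch below is a minimal, generic illustration of that pattern on a toy quadratic loss, not the authors' F-CMA procedure; the function name and default constants (`alpha0`, `beta`, `c`) are illustrative assumptions.

```python
def armijo_line_search(loss, grad, x, direction, alpha0=1.0, beta=0.5, c=1e-4, max_iters=30):
    # Backtracking line search: shrink the step alpha until the Armijo
    # sufficient-decrease condition holds:
    #   loss(x + alpha*d) <= loss(x) + c * alpha * <grad(x), d>
    f0 = loss(x)
    slope = sum(g * d for g, d in zip(grad(x), direction))  # directional derivative (< 0 for descent)
    alpha = alpha0
    for _ in range(max_iters):
        trial = [xi + alpha * di for xi, di in zip(x, direction)]
        if loss(trial) <= f0 + c * alpha * slope:
            return alpha
        alpha *= beta  # step rejected: shrink and retry
    return alpha

# Toy quadratic loss f(x) = sum(x_i^2); gradient is 2x, descent direction -grad(x).
loss = lambda x: sum(xi * xi for xi in x)
grad = lambda x: [2.0 * xi for xi in x]
x = [3.0, -4.0]
d = [-g for g in grad(x)]
step = armijo_line_search(loss, grad, x, d)
new_x = [xi + step * di for xi, di in zip(x, d)]
```

In a mini-batch setting such as the one the abstract describes, an analogous test would compare the epoch-level loss before and after the update to guarantee per-epoch decrease.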