The stochastic gradient descent (SGD) algorithm has achieved remarkable success in training deep learning models. However, it has several limitations, including susceptibility to vanishing gradients, sensitivity to input data, and a lack of robust theoretical guarantees. In recent years, alternating minimization (AM) methods have emerged as a promising alternative for model training, employing gradient-free approaches to iteratively update model parameters. Despite their potential, these methods often exhibit slow convergence. To address this challenge, we propose a novel Triple-Inertial Accelerated Alternating Minimization (TIAM) framework for neural network training. TIAM combines a triple-inertial acceleration strategy with a specialized approximation method, enabling targeted acceleration of different terms in each sub-problem optimization. This integration improves convergence efficiency, achieving superior performance with fewer iterations. We also provide a convergence analysis of the TIAM algorithm, covering both its global convergence properties and its convergence rate. Extensive experiments validate the effectiveness of TIAM, showing significant improvements in generalization capability and computational efficiency over existing approaches, particularly when applied to the rectified linear unit (ReLU) and its variants.
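To make the general mechanism concrete, the following is a minimal, hedged sketch of inertial-accelerated alternating minimization on a toy two-block quadratic problem. It is not the TIAM algorithm itself: the toy objective, the single momentum coefficient `beta`, and the proximal weight `rho` are illustrative assumptions, chosen only to show how gradient-free closed-form block updates can be combined with inertial extrapolation.

```python
# Illustrative sketch (assumed toy problem, NOT the paper's TIAM update rules):
# two-block alternating minimization with inertial extrapolation and
# closed-form (gradient-free) sub-problem solves.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
c = rng.standard_normal(30)
rho, beta = 1.0, 0.5  # proximal weight and inertial coefficient (both assumed)

def objective(x, y):
    # F(x, y) = 0.5*||A x - y||^2 + 0.5*||y - c||^2
    return 0.5 * np.sum((A @ x - y) ** 2) + 0.5 * np.sum((y - c) ** 2)

x = x_prev = np.zeros(10)
y = y_prev = np.zeros(30)

for k in range(100):
    # Inertial extrapolation: push each block along its previous direction of travel.
    x_hat = x + beta * (x - x_prev)
    y_hat = y + beta * (y - y_prev)
    x_prev, y_prev = x, y

    # x-subproblem, solved exactly in closed form (no gradient steps):
    #   argmin_x 0.5*||A x - y||^2 + (rho/2)*||x - x_hat||^2
    x = np.linalg.solve(A.T @ A + rho * np.eye(10), A.T @ y + rho * x_hat)

    # y-subproblem, also in closed form:
    #   argmin_y 0.5*||A x - y||^2 + 0.5*||y - c||^2 + (rho/2)*||y - y_hat||^2
    y = (A @ x + c + rho * y_hat) / (2.0 + rho)

print("final objective:", objective(x, y))
```

In AM-style neural network training the blocks would instead be per-layer weights and auxiliary activation variables, and TIAM applies its acceleration to different terms of each sub-problem; the sketch above only conveys the extrapolate-then-solve pattern.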