In current deep learning research, widely used optimization algorithms such as Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (Adam) still fall short in coping with fluctuating learning efficiency, the demands of complex models, and non-convex optimization problems. These shortcomings stem largely from the algorithms' limitations when handling complex data structures and models, such as the difficulty of selecting an appropriate learning rate, escaping local optima, and navigating high-dimensional spaces. To address these issues, this paper introduces a novel optimization algorithm, DWMGrad. Building on traditional methods, DWMGrad incorporates a guidance mechanism that draws on historical information to dynamically update both the momentum and the learning rate, allowing the optimizer to flexibly adjust how much it relies on past information and to adapt to different training scenarios. This strategy not only helps the optimizer cope with changing environments and task complexities; extensive experiments further show that DWMGrad achieves faster convergence and higher accuracy across a wide range of scenarios.
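The abstract describes the dynamic guidance mechanism only at a high level. The minimal Python sketch below is an assumption-laden illustration of the general idea, rescaling a momentum coefficient and learning rate by how well the current gradient agrees with recent gradient history. It is not the actual DWMGrad update; the class name, the agreement formula, the scaling rule, and the window size are all hypothetical and stand in for the rules defined in the paper.

```python
class DWMGradSketch:
    """Illustrative sketch ONLY (scalar parameters, hypothetical formulas):
    one way an optimizer could use recent gradient history to modulate its
    momentum coefficient and learning rate. The true DWMGrad update rules
    are those given in the paper, not these."""

    def __init__(self, params, lr=1e-3, beta=0.9, history=10, eps=1e-8):
        self.params = list(params)                   # parameters (plain floats here)
        self.lr = lr                                 # base learning rate
        self.beta = beta                             # base momentum coefficient
        self.history = history                       # number of gradients to remember
        self.eps = eps
        self.m = [0.0] * len(self.params)            # momentum buffers
        self.hist = [[] for _ in self.params]        # recent gradient history

    def step(self, grads):
        for i, g in enumerate(grads):
            h = self.hist[i]
            h.append(g)
            if len(h) > self.history:
                h.pop(0)
            # Hypothetical guidance signal: agreement between the current
            # gradient and the historical mean. High agreement -> trust the
            # past more (heavier momentum, larger step); disagreement ->
            # lean on the fresh gradient and shrink the step.
            mean = sum(h) / len(h)
            agree = (g * mean) / (abs(g) * abs(mean) + self.eps)  # roughly in [-1, 1]
            w = 0.5 + 0.5 * max(agree, 0.0)                       # in [0.5, 1]
            beta_t = self.beta * w
            lr_t = self.lr * w
            # Standard momentum update, but with dynamically chosen weights.
            self.m[i] = beta_t * self.m[i] + (1.0 - beta_t) * g
            self.params[i] -= lr_t * self.m[i]
        return self.params


# Toy usage on f(x) = x^2 (gradient 2x): the parameter should move toward 0.
opt = DWMGradSketch([5.0], lr=0.1)
for _ in range(50):
    x = opt.params[0]
    opt.step([2.0 * x])
print(opt.params)
```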