We lay the theoretical foundation for automating optimizer design in gradient-based learning. Based on the greedy principle, we formulate the problem of designing optimizers and their hyperparameters as maximizing the instantaneous decrease in loss. By treating an optimizer as a function that translates loss gradient signals into parameter motions, the problem reduces to a family of convex optimization problems over the space of optimizers. Solving these problems under various constraints not only recovers a wide range of popular optimizers as closed-form solutions, but also produces the optimal hyperparameters of these optimizers with respect to the problems at hand. This enables a systematic approach to design optimizers and tune their hyperparameters dynamically according to the gradient statistics that are collected during the training process. Furthermore, this optimization of optimization can be performed dynamically during training.
翻译:我们为基于梯度的学习中自动化优化器设计奠定了理论基础。基于贪心原则,我们将设计优化器及其超参数的问题形式化为最大化损失的瞬时下降。通过将优化器视为将损失梯度信号转换为参数运动的函数,该问题可简化为在优化器空间上的一系列凸优化问题。在不同约束条件下求解这些问题,不仅能够以闭式解的形式恢复多种流行优化器,还能针对当前问题得出这些优化器的最优超参数。这为实现系统化设计优化器,并根据训练过程中收集的梯度统计信息动态调整其超参数提供了方法。此外,这种对优化的优化可在训练过程中动态执行。