Optimization is an integral part of modern deep learning. Learned optimizers have recently emerged as a way to accelerate this process by replacing traditional, hand-crafted algorithms with meta-learned functions. Despite promising initial results, these methods still suffer from stability and generalization issues that limit their practical use. Moreover, their inner workings and their behavior under different conditions are not yet fully understood, which makes them difficult to improve. Our work therefore examines their optimization trajectories from the perspective of network architecture symmetries and parameter update distributions. Furthermore, by contrasting learned optimizers with their manually designed counterparts, we identify several key insights into how each approach can benefit from the strengths of the other.