We propose a fresh approach to understanding the mechanisms of neural networks: analyzing the rich structure of the parameters contained within their optimization trajectories. To this end, we introduce several natural notions of the complexity of optimization trajectories, both qualitative and quantitative. These notions reveal the inherent nuances of, and interplay between, various optimization choices such as momentum, weight decay, and batch size. We use them to identify key hallmarks of optimization in deep neural networks: when it goes right, and when it finds itself in a dead end. Further, thanks to our trajectory perspective, we uncover an intertwined behaviour of momentum and weight decay that promotes directional exploration, as well as a directional regularization behaviour of some other optimization choices. We perform experiments over large-scale vision and language settings, including large language models (LLMs) with up to 12 billion parameters, to demonstrate the value of our approach.
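The abstract does not define its trajectory-complexity measures. The sketch below is purely illustrative. It assumes each checkpoint along the trajectory is flattened into a plain list of floats. It then computes one plausible quantitative notion: the matrix of pairwise cosine similarities between checkpoints and its average off-diagonal value. The helper names `trajectory_map` and `mean_directional_similarity` are hypothetical labels chosen here and are not necessarily the measures used in this work.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two flattened parameter vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (nu * nv)


def trajectory_map(checkpoints: List[Sequence[float]]) -> List[List[float]]:
    """Hypothetical qualitative view: pairwise cosine similarities of checkpoints."""
    return [[cosine(p, q) for q in checkpoints] for p in checkpoints]


def mean_directional_similarity(checkpoints: List[Sequence[float]]) -> float:
    """Hypothetical scalar summary: mean off-diagonal entry of the trajectory map.

    Values near 1 mean the checkpoints all point in nearly the same direction
    (little directional exploration); lower values mean the trajectory
    explores more directions in parameter space.
    """
    n = len(checkpoints)
    if n < 2:
        raise ValueError("need at least two checkpoints")
    m = trajectory_map(checkpoints)
    total = sum(m[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


if __name__ == "__main__":
    # A trajectory moving along a single ray has similarity 1 everywhere.
    straight = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
    # A trajectory that turns by 90 degrees has similarity 0.
    turning = [[1.0, 0.0], [0.0, 1.0]]
    print(mean_directional_similarity(straight))
    print(mean_directional_similarity(turning))
```

In practice, the checkpoints would be the flattened network weights saved at intervals during training. Under these assumptions, comparing the scalar across runs with different momentum, weight decay, or batch size is one simple way to quantify how much directional exploration each setting induces.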