We observe a novel `multiple-descent' phenomenon while training long short-term memory (LSTM) recurrent neural networks on a real-world task: after the model is overtrained, its performance goes through multiple long cycles of upward and downward trends. By carrying out an asymptotic stability analysis of the models, we find that these cycles in performance -- as indicated by the loss function on the test data -- are closely associated with the phase transition between order and chaos in the model, and that the locally optimal training steps lie consistently at the critical transition point between the two phases. More importantly, the overall optimal point of the model usually occurs at the first transition from order to chaos, where the `width' of the `edge of chaos' is often the widest, allowing the best exploration of weight configurations for learning.