Gradient Flossing: Improving Gradient Descent through Dynamic Control of Jacobians

Training recurrent neural networks (RNNs) remains a challenge due to the instability of gradients across long time horizons, which can lead to exploding and vanishing gradients. Recent research has linked these problems to the Lyapunov exponents of the forward dynamics, which describe the growth or shrinkage of infinitesimal perturbations. Here, we propose gradient flossing, a novel approach to tackling gradient instability by pushing Lyapunov exponents of the forward dynamics toward zero during learning. We achieve this by regularizing Lyapunov exponents through backpropagation using differentiable linear algebra. This enables us to "floss" the gradients, stabilizing them and thus improving network training. We demonstrate that gradient flossing controls not only the gradient norm but also the condition number of the long-term Jacobian, facilitating multidimensional error feedback propagation. We find that applying gradient flossing prior to training enhances both the success rate and convergence speed for tasks involving long time horizons. For challenging tasks, we show that gradient flossing during training can further increase the time horizon that can be bridged by backpropagation through time. Moreover, we demonstrate the effectiveness of our approach on various RNN architectures and tasks of variable temporal complexity. Additionally, we provide a simple implementation of our gradient flossing algorithm that can be used in practice. Our results indicate that gradient flossing via regularizing Lyapunov exponents can significantly enhance the effectiveness of RNN training and mitigate the exploding and vanishing gradient problem.
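The idea of regularizing Lyapunov exponents can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a vanilla input-driven tanh RNN, h_{t+1} = tanh(W h_t + x_t). It estimates the k leading Lyapunov exponents with the standard QR re-orthonormalization of the one-step Jacobians, and it takes the sum of squared exponents as the flossing loss, which is an assumed form. The abstract obtains the gradient by backpropagating through the QR decompositions. To avoid an autodiff dependency, this sketch swaps in central finite differences instead, which is only practical for tiny networks. The names lyapunov_exponents, flossing_loss, finite_difference_grad and floss, and the parameters k, lr and steps, are hypothetical.

```python
import numpy as np


def lyapunov_exponents(W, h0, inputs, k):
    """Estimate the k leading Lyapunov exponents (per time step) of the
    input-driven RNN h_{t+1} = tanh(W h_t + x_t) via repeated QR steps.
    No transient is discarded, to keep the sketch short."""
    h = h0.copy()
    Q = np.eye(W.shape[0])[:, :k]              # orthonormal basis of k perturbation directions
    log_growth = np.zeros(k)
    for x in inputs:
        h = np.tanh(W @ h + x)
        D = (1.0 - h ** 2)[:, None] * W        # one-step Jacobian dh_{t+1}/dh_t
        Q, R = np.linalg.qr(D @ Q)             # re-orthonormalize the evolved directions
        log_growth += np.log(np.abs(np.diag(R)))  # local stretching factors
    return log_growth / len(inputs)


def flossing_loss(W, h0, inputs, k):
    """Assumed flossing objective: sum of squared Lyapunov exponents,
    which is minimized when the exponents sit near zero."""
    lam = lyapunov_exponents(W, h0, inputs, k)
    return float(np.sum(lam ** 2))


def finite_difference_grad(f, W, eps=1e-5):
    """Central finite differences; a stand-in for backprop through the QR steps."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        grad[idx] = (f(Wp) - f(Wm)) / (2 * eps)
    return grad


def floss(W, h0, inputs, k, lr=0.01, steps=8):
    """Plain gradient descent on the flossing loss, e.g. before task training."""
    f = lambda M: flossing_loss(M, h0, inputs, k)
    for _ in range(steps):
        W = W - lr * finite_difference_grad(f, W)
    return W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, k, T = 6, 3, 60
    A = rng.standard_normal((N, N))
    W0 = 0.5 * A / np.linalg.norm(A, 2)        # spectral norm 0.5: all exponents < log(0.5)
    h0 = np.zeros(N)
    inputs = 0.5 * rng.standard_normal((T, N))  # frozen input sequence
    W1 = floss(W0, h0, inputs, k)
    print("exponents before:", lyapunov_exponents(W0, h0, inputs, k))
    print("exponents after: ", lyapunov_exponents(W1, h0, inputs, k))
    print("loss before/after:", flossing_loss(W0, h0, inputs, k), flossing_loss(W1, h0, inputs, k))
```

Because the example starts from a weight matrix with spectral norm 0.5, every one-step Jacobian contracts and all estimated exponents lie below log(0.5). A few flossing steps should then lower the loss by pulling the exponents toward zero. In the setting the abstract describes, such steps would run before task training or be interleaved with it. The gradient would come from differentiating through the QR decompositions rather than from finite differences.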

翻译：训练循环神经网络（RNN）仍面临挑战，原因是跨长时间尺度的梯度不稳定性会导致梯度爆炸和梯度消失。近期研究将这些现象与前向动力学的李雅普诺夫指数联系起来，该指数描述了无穷小扰动的增长或收缩。本文提出梯度清洁（gradient flossing），一种通过在学习过程中将前向动力学的李雅普诺夫指数推向零来应对梯度不稳定性的新方法。我们利用可微线性代数通过反向传播正则化李雅普诺夫指数，从而实现这一目标。这使得我们能够"清洁"梯度，稳定梯度并改善网络训练。我们证明梯度清洁不仅控制梯度范数，还能控制长期雅可比矩阵的条件数，促进多维误差反馈传播。我们发现，在训练前应用梯度清洁可提升涉及长时间尺度任务的成功率和收敛速度。对于具有挑战性的任务，我们展示训练期间的梯度清洁能进一步增加可通过时间反向传播连接的时间跨度。此外，我们在多种RNN架构和不同时间复杂度的任务上验证了该方法有效性。同时，我们提供了梯度清洁算法的简洁实现方案。研究结果表明，通过正则化李雅普诺夫指数的梯度清洁能显著提升RNN训练效果，缓解梯度爆炸与梯度消失问题。