Conventional nonlinear RNNs are not naturally parallelizable across the sequence length, unlike transformers and linear RNNs. Lim et al. (2024) therefore tackle parallelized evaluation of nonlinear RNNs, posing it as a fixed-point problem solved with Newton's method. By deriving and applying a parallelized form of Newton's method, they achieve large speedups over sequential evaluation. However, their approach inherits cubic computational complexity and numerical instability. We tackle these weaknesses. To reduce the computational complexity, we apply quasi-Newton approximations and show that they converge comparably to full Newton while using less memory and running faster. To stabilize Newton's method, we leverage a connection between trust-region-damped Newton's method and Kalman smoothing. This connection lets us stabilize the iteration via the trust region while using efficient parallelized Kalman algorithms to retain performance. We compare these methods empirically and highlight use cases where each algorithm excels.
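To make the fixed-point formulation concrete, here is a minimal NumPy sketch for a hypothetical tanh RNN. It treats the entire hidden-state trajectory H as the unknown of the equation H = F(H), where F applies one recurrence step to every position at once. For simplicity the sketch solves the fixed point with plain Picard iteration (each sweep is parallel across timesteps) rather than the parallelized Newton or quasi-Newton iterations discussed above; the RNN, its weights, and all variable names are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 16, 4                        # sequence length, hidden size (toy values)
W = 0.3 * rng.normal(size=(d, d))   # recurrent weights, scaled for stability
U = rng.normal(size=(d, d))         # input weights
x = rng.normal(size=(T, d))         # input sequence
h0 = np.zeros(d)                    # initial hidden state

# Sequential (ground-truth) evaluation: T dependent steps, not parallelizable.
H_seq = np.zeros((T, d))
h = h0
for t in range(T):
    h = np.tanh(W @ h + U @ x[t])
    H_seq[t] = h

# Fixed-point view: the trajectory H solves H = F(H), where F applies one
# recurrence step at every position using the previous iterate's states.
# Each application of F is parallel across all timesteps.
def F(H):
    prev = np.vstack([h0, H[:-1]])        # states shifted by one step
    return np.tanh(prev @ W.T + x @ U.T)  # all timesteps at once

H = np.zeros((T, d))
for _ in range(50):                       # Picard iteration to the fixed point
    H_new = F(H)
    if np.max(np.abs(H_new - H)) < 1e-10:
        break
    H = H_new

print(np.allclose(H, H_seq, atol=1e-8))   # True: matches sequential evaluation
```

Each Picard sweep propagates exact information forward by at least one timestep, so at most T sweeps are needed; Newton-type iterations, as in the work above, can converge in far fewer sweeps at the cost of solving a linear system per iteration.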