Sequential models, such as Recurrent Neural Networks and Neural Ordinary Differential Equations, have long suffered from slow training due to their inherently sequential nature. This bottleneck has persisted for many years because sequential models were widely thought to be impossible to parallelize. We challenge this long-held belief with a parallel algorithm that accelerates GPU evaluation of sequential models by up to 3 orders of magnitude without compromising output accuracy. The algorithm does not require any special structure in the model, making it applicable to a wide range of sequential architectures. With our method, training sequential models can be more than 10 times faster than with the common sequential method, with no meaningful difference in the training results. Leveraging this accelerated training, we found the Gated Recurrent Unit to be effective on a long time series classification problem with 17k time samples. By overcoming the training bottleneck, our work serves as the first step toward unlocking the potential of non-linear sequential models for long sequence problems.
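The abstract does not describe how the algorithm works, so the following is only a minimal sketch of the general idea that a non-linear recurrence can be evaluated without a strict left-to-right loop. It is not the authors' method. It assumes a hypothetical scalar cell `step` and uses a plain Jacobi-style fixed-point iteration: every sweep updates all time steps from the previous iterate, so the updates within a sweep are independent of one another and could be computed in parallel. The names `step`, `eval_sequential`, and `eval_fixed_point`, and the parameters `w`, `u`, `b`, are all illustrative assumptions.

```python
import math


def step(h_prev, x, w=0.5, u=1.0, b=0.1):
    """One step of a toy scalar non-linear recurrence (hypothetical cell)."""
    return math.tanh(w * h_prev + u * x + b)


def eval_sequential(xs, h0=0.0):
    """Reference evaluation: the usual left-to-right loop."""
    hs, h = [], h0
    for x in xs:
        h = step(h, x)
        hs.append(h)
    return hs


def eval_fixed_point(xs, h0=0.0, tol=1e-12):
    """Jacobi-style fixed-point iteration over the whole trajectory.

    Each sweep computes h_t = step(h_{t-1}, x_t) for every t using only the
    previous iterate, so the T updates in a sweep do not depend on each other.
    After k sweeps the first k states are exact, so at most T sweeps are needed.
    Returns the trajectory and the number of sweeps performed.
    """
    T = len(xs)
    hs = [0.0] * T  # initial guess for every hidden state
    for k in range(T):
        prev = [h0] + hs[:-1]  # previous iterate, shifted by one time step
        new_hs = [step(p, x) for p, x in zip(prev, xs)]  # independent per t
        delta = max(abs(a - b) for a, b in zip(new_hs, hs))
        hs = new_hs
        if delta < tol:  # stop early once the trajectory stops changing
            return hs, k + 1
    return hs, T


if __name__ == "__main__":
    xs = [math.sin(0.1 * t) for t in range(200)]
    reference = eval_sequential(xs)
    parallel, sweeps = eval_fixed_point(xs)
    err = max(abs(a - b) for a, b in zip(reference, parallel))
    print(f"sweeps={sweeps}, max abs error={err:.2e}")
```

In this toy setting the recurrence is a contraction, so the iteration settles in far fewer sweeps than there are time steps. A plain fixed-point scheme like this gives no such guarantee in general, which is one reason it should be read as an illustration rather than a practical algorithm.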