Two of the many trends in neural network research of the past few years have been (i) the learning of dynamical systems, especially with recurrent neural networks such as long short-term memory networks (LSTMs), and (ii) the introduction of transformer neural networks for natural language processing (NLP) tasks. While some work has been performed at the intersection of these two trends, those efforts were largely limited to using the vanilla transformer directly, without adjusting its architecture for the setting of a physical system. In this work we develop a transformer-inspired neural network and use it to learn a dynamical system. For the first time, we change the activation function of the attention layer to imbue the transformer with structure-preserving properties, improving long-term stability. This proves to be of great advantage when applying the neural network to learning the trajectory of a rigid body.
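The core idea, replacing the attention layer's usual softmax activation with a structure-preserving alternative, can be sketched as follows. The abstract does not specify the activation, so the Cayley-transform choice below, along with the function names and shapes, is an illustrative assumption: the Cayley transform of a skew-symmetric matrix is orthogonal, hence volume-preserving, which is one way to obtain a structure-preserving reweighting.

```python
import numpy as np

def softmax_columns(A):
    # Standard attention activation: column-wise softmax.
    E = np.exp(A - A.max(axis=0, keepdims=True))
    return E / E.sum(axis=0, keepdims=True)

def cayley(A):
    # Hypothetical structure-preserving alternative: the Cayley
    # transform of the skew-symmetric part of A is an orthogonal
    # matrix with determinant 1, i.e. a volume-preserving map.
    S = 0.5 * (A - A.T)              # skew-symmetric part of A
    I = np.eye(A.shape[0])
    return np.linalg.solve(I + S, I - S)   # (I + S)^{-1} (I - S)

def attention(Z, W_Q, W_K, activation):
    # Z: (d, T) array holding T snapshots of a d-dimensional state.
    # The activation applied to the (T, T) score matrix is the only
    # piece that changes between the vanilla and modified layers.
    A = (W_Q @ Z).T @ (W_K @ Z) / np.sqrt(Z.shape[0])
    return Z @ activation(A)

rng = np.random.default_rng(0)
d, T = 3, 4
Z = rng.standard_normal((d, T))
W_Q = rng.standard_normal((d, d))
W_K = rng.standard_normal((d, d))

out_soft = attention(Z, W_Q, W_K, softmax_columns)    # vanilla layer
out_cayley = attention(Z, W_Q, W_K, cayley)           # structure-preserving variant
```

Because both variants share the same signature, the activation is a drop-in choice, which is what makes this kind of architectural modification cheap to experiment with.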