To improve the robustness of transformer neural networks used for temporal-dynamics prediction of chaotic systems, we propose a novel attention mechanism called easy attention, which we demonstrate in time-series reconstruction and prediction. While standard self attention relies only on the inner product of queries and keys, we demonstrate that the keys, queries and softmax are not necessary for obtaining the attention score required to capture long-term dependencies in temporal sequences. Applying singular-value decomposition (SVD) to the softmax attention score, we further observe that self attention compresses the contributions from both queries and keys in the space spanned by the attention score. Our proposed easy-attention method therefore treats the attention scores directly as learnable parameters. This approach produces excellent results when reconstructing and predicting the temporal dynamics of chaotic systems, exhibiting more robustness and less complexity than self attention or the widely used long short-term memory (LSTM) network. We show the improved performance of the easy-attention method on the Lorenz system, a turbulent shear flow and a model of a nuclear reactor.
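The contrast drawn in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the single-head layout, the scaling by the square root of the key dimension, the toy sizes `T` and `d`, and all function and variable names (`self_attention`, `easy_attention`, `alpha`, `w_q`, `w_k`, `w_v`) are assumptions made for illustration. Here `alpha` stands in for the learnable score matrix and is simply drawn at random rather than trained.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def self_attention(x, w_q, w_k, w_v):
    """Self attention: scores are computed from queries and keys via softmax."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    logits = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    scores = softmax_rows(logits)
    return matmul(scores, v), scores


def easy_attention(x, alpha, w_v):
    """Easy attention (sketch): the T x T score matrix alpha is a parameter itself."""
    return matmul(alpha, matmul(x, w_v))


def random_matrix(rows, cols, rng):
    """Hypothetical initializer: independent standard-normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T, d = 6, 3  # assumed toy sequence length and embedding size
x = random_matrix(T, d, rng)
w_q, w_k, w_v = (random_matrix(d, d, rng) for _ in range(3))
alpha = random_matrix(T, T, rng)  # would be learned by gradient descent in practice

out_self, scores = self_attention(x, w_q, w_k, w_v)
out_easy = easy_attention(x, alpha, w_v)
```

In this sketch, setting `alpha` equal to the softmax `scores` reproduces the self-attention output exactly. This shows that the easy-attention form can represent any fixed score matrix without queries, keys or softmax. Unlike self-attention scores, however, `alpha` does not depend on the input `x`.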