We study a fast local-global window-based attention method to accelerate Informer for long sequence time-series forecasting. While the locality of window attention brings considerable computational savings, it cannot capture global token information; a subsequent Fourier transform block compensates for this limitation. Our method, named FWin, does not rely on the query sparsity hypothesis or the empirical approximation underlying the ProbSparse attention of Informer. Through experiments on univariate and multivariate datasets, we show that FWin transformers improve the overall prediction accuracy of Informer while accelerating its inference speed by a factor of 1.6 to 2. We also give a mathematical definition of FWin attention and prove that it is equivalent to canonical full attention under a block diagonal invertibility (BDI) condition on the attention matrix. We show experimentally that BDI holds with high probability on typical benchmark datasets.
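To make the local-global structure concrete, below is a minimal sketch of an FWin-style layer: full attention restricted to non-overlapping windows (local step), followed by a Fourier transform block that mixes information across all windows (global step). The class and parameter names (`FWinBlock`, `window_size`), the placement of residuals and layer norms, and the choice of a real-part FFT mixer are illustrative assumptions, not the authors' implementation.

```python
# A hedged sketch of a local-global FWin-style block, assuming:
#  - non-overlapping window attention for the local step, and
#  - an FNet-style real-part FFT over the sequence dimension for global mixing.
import torch
import torch.nn as nn

class FWinBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int, window_size: int):
        super().__init__()
        self.window_size = window_size
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); seq_len assumed divisible by window_size.
        B, L, D = x.shape
        w = self.window_size
        # Local step: full attention inside each non-overlapping window,
        # so cost scales with L * w instead of L^2.
        xw = x.reshape(B * (L // w), w, D)
        attn_out, _ = self.attn(xw, xw, xw)
        x = self.norm1(x + attn_out.reshape(B, L, D))
        # Global step: the Fourier transform couples tokens across all
        # windows; keep the real part so the output stays real-valued.
        x = self.norm2(x + torch.fft.fft(x, dim=1).real)
        return x

# Usage: a length-96 sequence split into windows of 24 tokens.
y = FWinBlock(d_model=64, n_heads=4, window_size=24)(torch.randn(2, 96, 64))
print(y.shape)  # torch.Size([2, 96, 64])
```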
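For concreteness, one plausible reading of the BDI condition follows (the paper's formal statement may differ): partition the $n \times n$ attention matrix $A$ into blocks aligned with the $w$-token windows, and require every diagonal block to be invertible,
$$
A = \begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1m} \\
A_{21} & A_{22} & \cdots & A_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
A_{m1} & A_{m2} & \cdots & A_{mm}
\end{pmatrix},
\qquad
\det A_{ii} \neq 0 \quad \text{for } i = 1, \dots, m,
$$
where $m = n / w$ and each diagonal block $A_{ii} \in \mathbb{R}^{w \times w}$ is the attention sub-matrix among the tokens of window $i$, i.e. exactly the part of $A$ that window attention retains.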