Real-world time-series data typically exhibit long-range dependencies and are observed at non-uniform intervals, a combination with which traditional recurrent sequence models struggle. To address this, researchers often replace recurrent architectures with Neural ODE-based models to handle irregularly sampled data, and use Transformer-based architectures to capture long-range dependencies. Despite their success, both approaches incur very high computational costs even for input sequences of moderate length. To address this challenge, we introduce the Rough Transformer, a variation of the Transformer model that operates on continuous-time representations of input sequences and incurs significantly lower computational costs. In particular, we propose \textit{multi-view signature attention}, which uses path signatures to augment vanilla attention and to capture both local and global (multi-scale) dependencies in the input data, while remaining robust to changes in sequence length and sampling frequency and yielding improved spatial processing. We find that, on a variety of time-series-related tasks, Rough Transformers consistently outperform their vanilla attention counterparts while obtaining the representational benefits of Neural ODE-based models, using only a fraction of the computational time and memory.
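To make the role of path signatures concrete, the following is a minimal sketch, not the authors' implementation. It assumes the input sequence is interpolated piecewise-linearly and that the signature is truncated at depth 2; the function names `signature_depth2` and `multi_view_features` and the parameter `window_ends` are hypothetical and chosen only for illustration. For each window end, the sketch pairs a global signature (from the start of the path up to that point) with a local signature (over the most recent window), which is one plausible reading of the "multi-view" idea described above.

```python
def signature_depth2(points):
    """Depth-2 signature of the piecewise-linear path through `points`.

    Returns (s1, s2): s1 is the total increment (level 1) and s2 is the
    matrix of second-order iterated integrals (level 2). Segments are
    combined with Chen's identity.
    """
    d = len(points[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, curr in zip(points, points[1:]):
        delta = [c - p for p, c in zip(prev, curr)]
        # Chen's identity for appending one linear segment:
        # new level 2 = old level 2 + (old level 1) x delta + delta x delta / 2
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            s1[i] += delta[i]
    return s1, s2


def multi_view_features(points, window_ends):
    """Pair a global and a local signature for each index in `window_ends`.

    Hypothetical helper: the global view covers points[0..k], the local
    view covers the points from the previous window end up to k.
    """
    features = []
    prev = 0
    for k in window_ends:
        global_sig = signature_depth2(points[: k + 1])
        local_sig = signature_depth2(points[prev : k + 1])
        features.append((global_sig, local_sig))
        prev = k
    return features
```

Because the signature depends on the path rather than on how often it is sampled, a straight line observed at two points and the same line observed at several irregularly spaced points yield the same features in this sketch. This illustrates the robustness to sampling frequency mentioned above. The number of feature vectors passed to attention is set by `window_ends` rather than by the raw sequence length.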