Time-series data in real-world medical settings typically exhibit long-range dependencies and are observed at non-uniform intervals. In such contexts, traditional sequence-based recurrent models struggle. To overcome this, researchers replace recurrent architectures with Neural ODE-based models to handle irregularly sampled data and use Transformer-based architectures to account for long-range dependencies. Despite their success, both approaches incur very high computational costs for input sequences of moderate length and longer. To mitigate this, we introduce the Rough Transformer, a variation of the Transformer model that operates on continuous-time representations of input sequences and incurs significantly reduced computational costs, which is critical for addressing the long-range dependencies common in medical contexts. In particular, we propose multi-view signature attention, which uses path signatures to augment vanilla attention and to capture both local and global dependencies in the input data, while remaining robust to changes in sequence length and sampling frequency. We find that, on synthetic and real-world time-series tasks, Rough Transformers consistently outperform their vanilla attention counterparts while obtaining the benefits of Neural ODE-based models using a fraction of the computational time and memory resources.
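To make the role of path signatures more concrete, the following is a minimal sketch, not the implementation used in this work. It assumes a piecewise-linear interpolation of the observations, truncates the signature at depth 2, and splits the path into evenly spaced windows; the names `signature_depth2`, `flatten`, and `multi_view_features` are hypothetical. Each window yields a global view (the signature from the start of the path to the end of the window) and a local view (the signature over the window alone), which could then be fed to standard attention in place of the raw sequence.

```python
import numpy as np


def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path with shape (n_points, dim)."""
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)  # one row per linear segment
    level1 = increments.sum(axis=0)     # total displacement
    # Displacement accumulated before each segment starts.
    prefix = np.cumsum(increments, axis=0) - increments
    # Chen's identity for linear pieces: cross terms between earlier and later
    # segments, plus the within-segment term 0.5 * (delta outer delta).
    level2 = prefix.T @ increments + 0.5 * (increments.T @ increments)
    return level1, level2


def flatten(sig):
    """Stack both signature levels into one feature vector."""
    level1, level2 = sig
    return np.concatenate([level1, level2.ravel()])


def multi_view_features(path, n_windows):
    """Hypothetical multi-view features: one row per window, global then local view."""
    path = np.asarray(path, dtype=float)
    # Evenly spaced window boundaries (an assumption made for this sketch).
    bounds = np.linspace(0, len(path) - 1, n_windows + 1).astype(int)
    rows = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        global_view = flatten(signature_depth2(path[: end + 1]))
        local_view = flatten(signature_depth2(path[start : end + 1]))
        rows.append(np.concatenate([global_view, local_view]))
    return np.stack(rows)


# The same curve observed at two sampling rates.
coarse = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
refined = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [1.0, 0.5], [1.0, 1.0]])

sig_coarse = flatten(signature_depth2(coarse))
sig_refined = flatten(signature_depth2(refined))
print(np.allclose(sig_coarse, sig_refined))
print(multi_view_features(refined, n_windows=2).shape)
```

In this sketch the number of feature rows depends on the number of windows rather than on the number of observations, and the signature of a piecewise-linear path does not change when extra samples are added along its segments. Both properties illustrate, under the stated assumptions, why a signature-based representation can be insensitive to sequence length and sampling frequency.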