Transformers excel at time series modelling through attention mechanisms that capture long-term temporal patterns. However, they assume uniform time intervals and therefore struggle with irregular time series. Neural Ordinary Differential Equations (NODEs) handle irregular time series effectively by modelling hidden states as continuously evolving trajectories. ContiFormer (arXiv:2402.10635) combines NODEs with Transformers but inherits the computational bottleneck of NODEs: reliance on heavy numerical solvers. This bottleneck could be removed by a closed-form solution of the underlying dynamical system, but such solutions are intractable in general. We sidestep this by replacing NODEs with a novel linear damped harmonic oscillator analogy, which admits a known closed-form solution. We model keys and values as damped, driven oscillators and expand the query in a sinusoidal basis up to a suitable number of modes. This analogy naturally captures the query-key coupling fundamental to any Transformer architecture by modelling attention as a resonance phenomenon. Our closed-form solution eliminates the computational overhead of numerical ODE solvers while preserving expressivity. We prove that this oscillator-based parameterisation maintains the universal approximation property of continuous-time attention; specifically, any discrete attention matrix realisable by ContiFormer's continuous keys can be approximated arbitrarily well by our fixed oscillator modes. Our approach delivers both theoretical guarantees and scalability, achieving state-of-the-art performance on irregular time series benchmarks while being orders of magnitude faster.
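To make the closed-form claim concrete, the sketch below evaluates the exact solution of a single linear damped, driven oscillator, x'' + 2ζωx' + ω²x = A·sin(Ωt), at an arbitrary query time without any numerical ODE stepping. This is a generic illustration of the underlying mathematics, not the paper's implementation: the parameters ω, ζ, A, Ω stand in for quantities that would be learned per key/value channel, and the function names are hypothetical.

```python
import math

def driven_damped_oscillator(t, omega=2.0, zeta=0.1, A=1.0, Omega=1.5,
                             x0=0.0, v0=0.0):
    """Closed-form state x(t) of  x'' + 2*zeta*omega*x' + omega**2*x = A*sin(Omega*t),
    underdamped case (0 <= zeta < 1), with initial conditions x(0)=x0, x'(0)=v0.
    Evaluating this at irregular timestamps needs no numerical solver."""
    # Steady-state (particular) response to the sinusoidal drive:
    # amplitude and phase lag from the standard resonance formula.
    denom = math.hypot(omega**2 - Omega**2, 2 * zeta * omega * Omega)
    amp = A / denom
    phi = math.atan2(2 * zeta * omega * Omega, omega**2 - Omega**2)
    xp = amp * math.sin(Omega * t - phi)
    # Transient (homogeneous) response, with constants matched to the
    # initial conditions so the total solution satisfies x(0)=x0, x'(0)=v0.
    wd = omega * math.sqrt(1 - zeta**2)          # damped natural frequency
    xp0 = amp * math.sin(-phi)                   # particular part at t=0
    vp0 = amp * Omega * math.cos(-phi)           # its derivative at t=0
    c1 = x0 - xp0
    c2 = (v0 - vp0 + zeta * omega * c1) / wd
    xh = math.exp(-zeta * omega * t) * (c1 * math.cos(wd * t)
                                        + c2 * math.sin(wd * t))
    return xh + xp
```

A quick sanity check is to verify the ODE residual by central finite differences: for small h, (x(t+h) - 2x(t) + x(t-h))/h² + 2ζω·(x(t+h) - x(t-h))/(2h) + ω²x(t) should equal A·sin(Ωt) up to O(h²). Because the expression is exact, the per-timestamp cost is O(1), which is the source of the speedup over iterative ODE solvers.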