Linear-time attention and State Space Models (SSMs) promise to resolve the quadratic-cost bottleneck of softmax attention in long-context language models. We introduce Error-Free Linear Attention (EFLA), a numerically stable, fully parallel, and generalized formulation of the delta rule. Specifically, we formulate the online learning update as a continuous-time dynamical system and prove that its exact solution is not only attainable but also computable in linear time with full parallelism. By leveraging the rank-1 structure of the dynamics matrix, we directly derive the exact closed-form solution, which effectively corresponds to the infinite-order Runge-Kutta method. The resulting attention mechanism is theoretically free from error accumulation, capturing the continuous dynamics exactly while preserving linear-time complexity. Through an extensive suite of experiments, we show that EFLA remains robust in noisy environments, achieving lower language-modeling perplexity and better downstream benchmark performance than DeltaNet without introducing additional parameters. Our work provides a new theoretical foundation for building high-fidelity, scalable linear-time attention models.
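To make the rank-1 argument concrete, the sketch below illustrates it on a generic DeltaNet-style update S ← S + β (v − S k) kᵀ, interpreted as the gradient flow dS/dτ = β (v − S(τ) k) kᵀ on τ ∈ [0, 1]. Because the dynamics matrix −β k kᵀ is rank-1, its matrix exponential collapses to a scalar correction and the flow admits a closed-form, error-free step. This is only a minimal illustration under that assumed parameterization; the abstract does not give EFLA's exact formulation, and the function names below are hypothetical, not the paper's code.

```python
# Illustrative sketch (assumed DeltaNet-style parameterization, not EFLA's official code):
# compare a single first-order delta-rule step with the exact solution of the
# continuous-time gradient flow dS/dtau = beta * (v - S(tau) k) k^T on [0, 1].
import numpy as np

def delta_rule_step(S, k, v, beta):
    """First-order (Euler) delta-rule update: S + beta * (v - S k) k^T."""
    return S + beta * np.outer(v - S @ k, k)

def exact_flow_step(S, k, v, beta):
    """Exact flow solution at tau = 1 under the rank-1 dynamics -beta * k k^T.

    Summing the full Taylor series of the matrix exponential (the
    "infinite-order" limit) reduces to rescaling the step size to
    (1 - exp(-beta * ||k||^2)) / ||k||^2.
    """
    knorm2 = float(k @ k)
    if knorm2 == 0.0:
        return S.copy()
    beta_eff = (1.0 - np.exp(-beta * knorm2)) / knorm2
    return S + beta_eff * np.outer(v - S @ k, k)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    S = rng.standard_normal((d, d))
    k = rng.standard_normal(d)
    v = rng.standard_normal(d)
    beta = 0.7

    # Reference: integrate the flow with many tiny Euler steps.
    S_ref, n = S.copy(), 100_000
    for _ in range(n):
        S_ref = delta_rule_step(S_ref, k, v, beta / n)

    print("one Euler step vs flow:", np.abs(delta_rule_step(S, k, v, beta) - S_ref).max())
    print("closed form vs flow   :", np.abs(exact_flow_step(S, k, v, beta) - S_ref).max())
```

In this toy setting the closed-form step matches the finely integrated flow to numerical precision, while the single Euler step carries the discretization error that the abstract refers to as error accumulation; the exact step costs the same rank-1 outer product as the ordinary delta rule, consistent with the claimed linear-time complexity.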