The Rotary Position Embedding (RoPE) mechanism has become a powerful enhancement to the Transformer architecture, enabling models to capture token relationships while encoding positional information. However, RoPE makes the attention computation more complicated, rendering efficient algorithms challenging to design. Earlier research introduced an almost linear time algorithm (i.e., running in time $n^{1+o(1)}$, where $n$ is the number of input tokens) for the forward computation under a bounded-entry assumption, but did not address the backward computation. In this work, we develop the first almost linear time algorithm for the backward computation of RoPE-based attention under bounded entries. Our approach builds on recent advances in fast RoPE attention, combining the polynomial method with the Fast Fourier Transform in a novel way. Furthermore, using lower bounds derived from the Strong Exponential Time Hypothesis (SETH), we show that the bounded-entry condition is necessary for subquadratic performance.
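To make the setting concrete, the following is a minimal NumPy sketch (an illustration, not the paper's algorithm) of the RoPE rotation applied to query/key vectors. Each coordinate pair is rotated by a position-dependent angle, and the defining property is that the inner product of a rotated query at position $m$ and a rotated key at position $n$ depends only on the relative offset $m - n$:

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Apply Rotary Position Embedding to vectors x of shape (n, d).

    Each coordinate pair (x[2i], x[2i+1]) is rotated by the angle
    pos * theta_i, where theta_i = base**(-2i/d)."""
    n, d = x.shape
    assert d % 2 == 0
    i = np.arange(d // 2)
    theta = base ** (-2.0 * i / d)                 # (d/2,)
    angles = positions[:, None] * theta[None, :]   # (n, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

# Relative-position property: <rot(q, m), rot(k, n)> depends only on m - n.
d = 8
rng = np.random.default_rng(0)
q, k = rng.standard_normal(d), rng.standard_normal(d)
a = rope_rotate(q[None, :], np.array([5]))[0] @ rope_rotate(k[None, :], np.array([2]))[0]
b = rope_rotate(q[None, :], np.array([13]))[0] @ rope_rotate(k[None, :], np.array([10]))[0]
print(np.isclose(a, b))  # both pairs have relative offset 3
```

It is precisely this position-dependent rotation inside every attention score that complicates the low-rank structure exploited by fast attention algorithms.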
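The role of the polynomial method can also be illustrated in a toy one-dimensional setting (a sketch under simplifying assumptions, not the paper's construction): truncating the Taylor series of $\exp$ at degree $g$ factors the $n \times n$ matrix of scores $\exp(q_i k_j)$ through rank-$(g{+}1)$ feature maps, so matrix-vector products cost $O(ng)$ instead of $O(n^2)$. The bounded-entry assumption is what keeps the truncation accurate:

```python
import numpy as np
from math import factorial

def poly_features(x, g):
    # Feature map phi(x) = [x^j / sqrt(j!)] for j = 0..g, so that
    # phi(q) . phi(k) = sum_j (q*k)^j / j!  ~  exp(q*k).
    return np.stack([x**j / np.sqrt(factorial(j)) for j in range(g + 1)], axis=-1)

rng = np.random.default_rng(1)
n, g = 1000, 10
q = rng.uniform(-1, 1, n)   # bounded entries keep the remainder tiny
k = rng.uniform(-1, 1, n)
v = rng.standard_normal(n)

# Exact O(n^2): row sums of exp(q_i k_j) * v_j.
exact = np.exp(np.outer(q, k)) @ v
# Fast O(n g): factor through the degree-g feature maps.
fast = poly_features(q, g) @ (poly_features(k, g).T @ v)
print(np.max(np.abs(exact - fast)))  # small truncation error
```

Without bounded entries the remainder of the truncated series blows up, which is consistent with the SETH-based lower bound showing subquadratic time is then impossible.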