Conventional self-attention mechanisms incur quadratic complexity, limiting their scalability on long sequences. We introduce FFTNet, an adaptive spectral filtering framework that leverages the Fast Fourier Transform (FFT) to achieve global token mixing in $\mathcal{O}(n\log n)$ time. By transforming inputs into the frequency domain, FFTNet exploits the orthogonality and energy preservation guaranteed by Parseval's theorem to capture long-range dependencies efficiently. A learnable spectral filter and modReLU activation dynamically emphasize salient frequency components, providing a rigorous and adaptive alternative to traditional self-attention. Experiments on the Long Range Arena and ImageNet benchmarks validate our theoretical insights and demonstrate superior performance over fixed Fourier and standard attention models.
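The spectral mixing pipeline described above (FFT, learnable frequency-domain filter, modReLU, inverse FFT) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the filter shape, bias handling, and function names (`modrelu`, `fftnet_mix`) are hypothetical assumptions for clarity.

```python
import numpy as np

def modrelu(z, b):
    # modReLU on complex inputs: threshold the magnitude with a
    # learnable bias b while preserving the phase of z.
    mag = np.abs(z)
    scale = np.maximum(mag + b, 0.0) / (mag + 1e-8)
    return z * scale

def fftnet_mix(x, filt, b):
    """Adaptive spectral token mixing (sketch).

    x    : (n, d) real-valued token embeddings
    filt : (n, d) learnable complex spectral filter (assumed per-frequency,
           per-channel; the actual parameterization may differ)
    b    : modReLU bias (scalar or broadcastable)
    """
    z = np.fft.fft(x, axis=0)            # to frequency domain: O(n log n)
    z = modrelu(z * filt, b)             # emphasize salient frequencies
    return np.fft.ifft(z, axis=0).real   # back to the token domain

# Toy usage: mix 16 tokens of dimension 4 globally in O(n log n).
rng = np.random.default_rng(0)
n, d = 16, 4
x = rng.standard_normal((n, d))
filt = rng.standard_normal((n, d)) + 1j * rng.standard_normal((n, d))
y = fftnet_mix(x, filt, b=-0.1)
```

Because the FFT touches every frequency, each output token depends on all input tokens, giving global mixing without the quadratic pairwise interactions of self-attention.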