The trade-off between performance and computational efficiency in long-sequence modeling has become a bottleneck for existing models. Inspired by the continuous state space models (SSMs) with multiple inputs and multiple outputs in control theory, we propose a new neural network called the Linear Dynamics-embedded Neural Network (LDNN). The continuous, discrete, and convolutional properties of SSMs give LDNN few parameters, flexible inference, and efficient training on long-sequence tasks. We develop two efficient strategies, diagonalization and "Disentanglement then Fast Fourier Transform (FFT)", to reduce the time complexity of convolution from $O(LNH\max\{L, N\})$ to $O(LN\max\{H, \log L\})$. We further extend LDNN with bidirectional noncausal and multi-head settings to accommodate a broader range of applications. Extensive experiments on the Long Range Arena (LRA) benchmark demonstrate the effectiveness and state-of-the-art performance of LDNN.
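To make the diagonalization and FFT ideas in the abstract concrete, here is a minimal sketch under simplifying assumptions: a single-input, single-output discrete SSM with a real diagonal state matrix, written in plain Python. It is not the LDNN implementation. The names `ssm_kernel`, `causal_conv_fft`, and `ssm_recurrence`, the recurrence convention $x_t = \lambda x_{t-1} + b u_t$, $y_t = c^\top x_t$, and the numeric values are all illustrative assumptions. The sketch checks that running the recurrence step by step and convolving the input with the kernel $K_l = \sum_n c_n \lambda_n^l b_n$ via FFT give the same output.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ssm_kernel(lam, b, c, L):
    """Convolution kernel K[l] = sum_n c[n] * lam[n]**l * b[n] for l = 0..L-1.

    With a diagonal state matrix, each state dimension contributes an
    independent geometric sequence, so no matrix powers are needed.
    """
    return [sum(cn * ln ** l * bn for ln, bn, cn in zip(lam, b, c))
            for l in range(L)]


def causal_conv_fft(k, u):
    """Causal convolution y[t] = sum_{s<=t} k[t-s] * u[s], computed via FFT."""
    L = len(u)
    size = 1
    while size < 2 * L:  # zero-pad so circular convolution equals linear
        size *= 2
    K = fft(list(k) + [0.0] * (size - L))
    U = fft(list(u) + [0.0] * (size - L))
    y = fft([a * b for a, b in zip(K, U)], inverse=True)
    return [v.real / size for v in y[:L]]  # normalize the inverse transform


def ssm_recurrence(lam, b, c, u):
    """Reference: step the diagonal recurrence one input at a time."""
    x = [0.0] * len(lam)
    ys = []
    for u_t in u:
        x = [ln * xn + bn * u_t for ln, xn, bn in zip(lam, x, b)]
        ys.append(sum(cn * xn for cn, xn in zip(c, x)))
    return ys


# Illustrative values (assumed): N = 3 states, L = 8 time steps.
lam = [0.9, 0.5, -0.3]
b = [1.0, 0.5, -1.0]
c = [0.2, -0.4, 0.7]
u = [1.0, 0.0, -2.0, 0.5, 3.0, 0.0, 1.5, -1.0]

y_rec = ssm_recurrence(lam, b, c, u)
y_fft = causal_conv_fft(ssm_kernel(lam, b, c, len(u)), u)
max_err = max(abs(p - q) for p, q in zip(y_rec, y_fft))
```

For one channel, the FFT route costs $O(L \log L)$ instead of the $O(L^2)$ of a direct convolution, which is where a $\log L$ factor of the kind quoted in the abstract typically comes from. The multi-input, multi-output case and the "Disentanglement" step are not covered by this sketch.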