Toeplitz Neural Networks (TNNs) have exhibited outstanding performance in various sequence modeling tasks. They outperform commonly used Transformer-based models while benefiting from log-linear space and time complexity. State Space Models (SSMs), on the other hand, achieve lower performance than TNNs in language modeling but offer the advantage of constant inference complexity. In this paper, we aim to combine the strengths of TNNs and SSMs by converting TNNs to SSMs during inference, thereby enabling TNNs to achieve the same constant inference complexity as SSMs. To accomplish this, we formulate the conversion process as an optimization problem and provide a closed-form solution. We show how to transform the target equation into a Vandermonde linear system, which can be solved efficiently using the Discrete Fourier Transform (DFT). Notably, our method requires no training and maintains numerical stability. It can also be applied to any LongConv-based model. To assess its effectiveness, we conduct extensive experiments on language modeling tasks across various settings. Additionally, we compare our method with gradient-descent-based solutions, highlighting the superior numerical stability of our approach. The source code is available at https://github.com/OpenNLPLab/ETSC-Exact-Toeplitz-to-SSM-Conversion.
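The conversion described above can be illustrated with a minimal NumPy sketch. It is not taken from the released code and rests on assumptions that the abstract does not state: a single-channel causal kernel t of length n, a diagonal state matrix whose eigenvalues are the n-th roots of unity, an all-ones input matrix, and a sequence no longer than n. Under those assumptions the Vandermonde system V c = t, with V[i, k] = lam[k]**i, is exactly a DFT, so the output weights c follow from a single inverse FFT. The function names `kernel_to_ssm` and `ssm_recurrent` are hypothetical.

```python
import numpy as np


def kernel_to_ssm(t):
    """Return (lam, c) with t[i] == sum_k c[k] * lam[k]**i for 0 <= i < n.

    Assumption for this sketch: the eigenvalues lam are fixed to the
    n-th roots of unity, which turns the Vandermonde matrix into the DFT matrix.
    """
    t = np.asarray(t, dtype=np.complex128)
    n = t.shape[0]
    lam = np.exp(-2j * np.pi * np.arange(n) / n)  # diagonal of the state matrix
    c = np.fft.ifft(t)  # solves V c = t because V c == np.fft.fft(c)
    return lam, c


def ssm_recurrent(lam, c, x):
    """Run the diagonal SSM step by step; each step costs O(n) in the state size."""
    state = np.zeros_like(lam)
    y = np.empty(len(x), dtype=np.complex128)
    for j, x_j in enumerate(x):
        state = lam * state + x_j  # u_j = A u_{j-1} + B x_j, with B = ones
        y[j] = np.dot(c, state)    # y_j = C u_j (np.dot does not conjugate)
    return y.real  # imaginary part is rounding noise for real t and x


# Usage: compare the recurrence with a direct causal convolution.
rng = np.random.default_rng(0)
t = rng.standard_normal(16)  # stand-in for a learned Toeplitz kernel
x = rng.standard_normal(16)  # input sequence, length <= len(t)
lam, c = kernel_to_ssm(t)
y_ssm = ssm_recurrent(lam, c, x)
y_conv = np.convolve(t, x)[: len(x)]  # y_j = sum_{m <= j} t[j - m] * x[m]
print(np.allclose(y_ssm, y_conv))
```

Because every eigenvalue in this sketch has unit modulus, powers of lam neither grow nor vanish, which is one plausible reading of the numerical-stability claim. The same choice makes the implied kernel periodic with period n, so the recurrence only matches the convolution for sequences of length at most n.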