We present the Linear Complexity Sequence Model (LCSM), a unified framework that brings together sequence modeling techniques with linear complexity, including linear attention, state space models, long convolution, and linear RNNs. The goal is to better understand these models by analyzing the impact of each component from a single, streamlined viewpoint. Specifically, we divide the modeling process of these models into three distinct stages, Expand, Oscillation, and Shrink (EOS), with each model corresponding to its own specific settings of these stages. The Expand stage projects the input signal onto a high-dimensional memory state. The Oscillation stage then performs recursive operations on that memory state. Finally, the Shrink stage projects the memory state back to a low-dimensional space. We perform comprehensive experiments to analyze the impact of different stage settings on language modeling and retrieval tasks. Our results show that data-driven methods are crucial for the effectiveness of the three stages in language modeling, whereas hand-crafted methods yield better performance in retrieval tasks.
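To make the three stages concrete, the following is a minimal sketch of one possible EOS recurrence in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the form of each stage, so the outer-product Expand, the scalar-decay Oscillation, and the contraction-based Shrink are assumed here, loosely in the spirit of linear attention. All function and parameter names (`expand`, `oscillate`, `shrink`, `eos_scan`, `decay`) are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def expand(e: Vector, x: Vector) -> Matrix:
    """Expand (assumed form): lift a d-dim input x to a k x d update via an outer product."""
    return [[e_i * x_j for x_j in x] for e_i in e]


def oscillate(memory: Matrix, update: Matrix, decay: float) -> Matrix:
    """Oscillation (assumed form): recursive update, memory <- decay * memory + update."""
    return [
        [decay * m + u for m, u in zip(m_row, u_row)]
        for m_row, u_row in zip(memory, update)
    ]


def shrink(memory: Matrix, s: Vector) -> Vector:
    """Shrink (assumed form): contract the k x d memory with a k-dim vector s to get d dims."""
    d = len(memory[0])
    return [sum(s[i] * memory[i][j] for i in range(len(s))) for j in range(d)]


def eos_scan(
    inputs: List[Vector],
    expand_states: List[Vector],
    shrink_states: List[Vector],
    decay: float,
) -> List[Vector]:
    """Run Expand -> Oscillation -> Shrink once per time step.

    The memory has a fixed k x d size, so the total cost grows linearly
    with the sequence length.
    """
    k, d = len(expand_states[0]), len(inputs[0])
    memory = [[0.0] * d for _ in range(k)]
    outputs = []
    for x, e, s in zip(inputs, expand_states, shrink_states):
        memory = oscillate(memory, expand(e, x), decay)
        outputs.append(shrink(memory, s))
    return outputs


# Toy usage: two time steps, input dimension d = 2, memory rows k = 2.
outputs = eos_scan(
    inputs=[[1.0, 2.0], [3.0, 4.0]],
    expand_states=[[1.0, 0.0], [0.0, 1.0]],
    shrink_states=[[1.0, 1.0], [1.0, 1.0]],
    decay=0.5,
)
print(outputs)
```

In this sketch the expand and shrink vectors and the decay are passed in directly. Computing them from the input would correspond to a data-driven setting, while fixing them in advance would correspond to a hand-crafted one, which is the kind of choice the experiments compare.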