Uncovering the Computational Roles of Nonlinearity in Sequence Modeling Using Almost-Linear RNNs

Sequence modeling tasks across domains such as natural language processing, time series forecasting, and control require learning complex input-output mappings. Nonlinear recurrence is theoretically required for universal approximation of sequence-to-sequence functions, yet linear recurrent models often prove surprisingly effective. This raises the question of when nonlinearity is truly needed. We present a framework to systematically dissect the functional role of nonlinearity in recurrent networks, identifying when it is computationally necessary and what mechanisms it enables. We address this using Almost-Linear Recurrent Neural Networks (AL-RNNs), which allow recurrence nonlinearity to be gradually attenuated and decompose network dynamics into analyzable linear regimes, making computational mechanisms explicit. We illustrate the framework across diverse synthetic and real-world tasks, including classic sequence modeling benchmarks, a neuroscientific stimulus-selection task, and a multi-task suite. We demonstrate how the AL-RNN's piecewise linear structure enables identification of computational primitives such as gating, rule-based integration, and memory-dependent transients, revealing that these operations emerge within predominantly linear backbones. Across tasks, sparse nonlinearity improves interpretability by reducing and localizing nonlinear computations, promotes shared representations in multi-task settings, and reduces computational cost. Moreover, sparse nonlinearity acts as a useful inductive bias: in low-data regimes or when tasks require discrete switching between linear regimes, sparsely nonlinear models often match or exceed fully nonlinear architectures. Our findings provide a principled approach for identifying where nonlinearity is functionally necessary, guiding the design of recurrent architectures that balance performance, efficiency, and interpretability.
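To make the idea of attenuated nonlinearity and piecewise linear regimes concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the model equations, so everything here is an assumption made for illustration: a recurrence in which only the last `P` of `M` latent units pass through a ReLU, a diagonal self-connection vector `A`, a coupling matrix `W`, a bias `h`, and the helper names `al_rnn_step` and `linear_regime`. Under these assumptions, `P = 0` gives a purely linear recurrence, and the sign pattern of the `P` rectified units labels the linear regime the state currently occupies (at most `2**P` regimes).

```python
import numpy as np

def al_rnn_step(z, A, W, h, P):
    """One step of a hypothetical almost-linear recurrence.

    Assumed form: z_next = A * z + W @ phi(z) + h, where phi applies
    ReLU to the last P units only and is the identity elsewhere.
    """
    phi = z.copy()
    if P > 0:  # guard: z[-0:] would select the whole vector
        phi[-P:] = np.maximum(phi[-P:], 0.0)
    return A * z + W @ phi + h

def linear_regime(z, P):
    """Sign pattern of the P rectified units, identifying the linear regime."""
    if P == 0:
        return ()
    return tuple(int(v > 0) for v in z[-P:])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M, P = 4, 2  # 4 latent units, 2 of them rectified (illustrative sizes)
    A = rng.uniform(0.5, 0.9, size=M)
    W = 0.1 * rng.standard_normal((M, M))
    h = np.zeros(M)
    z = rng.standard_normal(M)
    for t in range(5):
        print(t, linear_regime(z, P))
        z = al_rnn_step(z, A, W, h, P)
```

Sweeping `P` from `M` down to `0` in such a sketch is one way to picture gradually removing nonlinearity from the recurrence while keeping the rest of the model fixed.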
