In this note (work in progress towards a full-length paper), we show that a family of sequence models based on recurrent linear layers (including S4, S5, and the LRU) interleaved with position-wise multi-layer perceptrons (MLPs) can approximate any sufficiently regular non-linear sequence-to-sequence map arbitrarily well. The main idea behind our result is to view recurrent layers as compression algorithms that faithfully store information about the input sequence in an inner state, which is then processed by the highly expressive MLP.
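To make the "recurrent layer as compression" idea concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the construction used in the note: the input is a scalar sequence, the recurrence is diagonal with an all-ones input matrix, the hidden size equals the sequence length, and the eigenvalues `lam` are hand-picked to be distinct. The names `linear_recurrence` and `decode_matrix` are hypothetical helpers introduced only for this example. Under these assumptions the final state is a Vandermonde-type linear image of the input, so the input can be recovered from it exactly.

```python
import numpy as np


def linear_recurrence(u, lam):
    """Run x_k = lam * x_{k-1} + u_k elementwise and return the final state.

    Assumption: diagonal recurrence, input matrix B equal to all ones.
    """
    x = np.zeros_like(lam)
    for u_k in u:
        x = lam * x + u_k
    return x


def decode_matrix(lam, length):
    """Return V such that final_state = V @ u.

    Unrolling the recurrence gives V[j, k] = lam[j] ** (length - 1 - k),
    a Vandermonde matrix with reversed columns.
    """
    exponents = length - 1 - np.arange(length)
    return lam[:, None] ** exponents[None, :]


L = 8  # sequence length
N = L  # hidden size (assumed equal to L so that V is square)

# Distinct complex eigenvalues inside the unit disk (an illustrative choice).
lam = 0.95 * np.exp(2j * np.pi * np.arange(N) / N)

rng = np.random.default_rng(0)
u = rng.standard_normal(L)

state = linear_recurrence(u, lam)  # compressed summary of the whole input
V = decode_matrix(lam, L)

# Distinct eigenvalues make V invertible, so the input is recoverable.
u_rec = np.linalg.solve(V, state).real
max_error = np.max(np.abs(u_rec - u))
print("max reconstruction error:", max_error)
```

In this toy setting, decoding is a single linear solve. The point is only that no information about the input is lost in the state, so any function of the input sequence can in principle be expressed as a function of the state, which is the role the abstract assigns to the MLP.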