We prove an inverse approximation theorem for approximating nonlinear sequence-to-sequence relationships with recurrent neural networks (RNNs). This is a so-called Bernstein-type result in approximation theory, which deduces properties of a target function under the assumption that it can be effectively approximated by a hypothesis space. In particular, we show that nonlinear sequence relationships that can be stably approximated by nonlinear RNNs must have an exponentially decaying memory structure, a notion that can be made precise. This extends the curse of memory previously identified in linear RNNs to the general nonlinear setting, and it quantifies the essential limitations of the RNN architecture for learning sequential relationships with long-term memory. Based on this analysis, we propose a principled reparameterization method to overcome these limitations. Our theoretical results are confirmed by numerical experiments. The code has been released at https://github.com/radarFudan/Curse-of-memory.
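To build intuition for "exponentially decaying memory", the simplest case is a linear RNN. There, the influence of an input k steps in the past on the current output is |c^T W^k U|, which shrinks geometrically when the recurrence is stable (spectral radius of W below 1). The sketch below treats this impulse response as a stand-in for memory. That is an illustrative assumption, not the paper's precise definition of memory for nonlinear RNNs. The helper name `linear_rnn_memory` and the matrix values are hypothetical.

```python
import numpy as np

def linear_rnn_memory(W, U, c, max_lag):
    # Illustrative stand-in for "memory" (assumption, not the paper's definition):
    # impulse response of the linear RNN h_{t+1} = W h_t + U x_t, y_t = c . h_t.
    # Entry k measures how strongly an input k steps in the past still affects y_t.
    W = np.asarray(W, dtype=float)
    U = np.asarray(U, dtype=float)
    c = np.asarray(c, dtype=float)
    mem = np.empty(max_lag)
    Wk = np.eye(W.shape[0])  # W^0
    for k in range(max_lag):
        mem[k] = abs(c @ Wk @ U)  # |c^T W^k U|
        Wk = Wk @ W               # advance to W^{k+1}
    return mem

# Hypothetical stable recurrence with eigenvalues 0.9 and 0.5:
# memory is 0.9**k + 0.5**k, so it decays geometrically at rate 0.9.
mem = linear_rnn_memory([[0.9, 0.0], [0.0, 0.5]], [1.0, 1.0], [1.0, 1.0], 50)
```

The ratio `mem[k + 1] / mem[k]` approaches 0.9. By contrast, a target with polynomially decaying memory such as 1/(k+1)^2 has a ratio that approaches 1. This gap between geometric and slower-than-geometric decay is, informally, the kind of mismatch the abstract's curse of memory refers to.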