We prove an inverse approximation theorem for the approximation of nonlinear sequence-to-sequence relationships using RNNs. This is a so-called Bernstein-type result in approximation theory, which deduces properties of a target function under the assumption that it can be effectively approximated by a hypothesis space. In particular, we show that nonlinear sequence relationships, viewed as functional sequences, that can be stably approximated by RNNs with hardtanh/tanh activations must have an exponentially decaying memory structure, a notion that can be made precise. This extends the previously identified curse of memory in linear RNNs to the general nonlinear setting, and quantifies the essential limitations of the RNN architecture for learning sequential relationships with long-term memory. Based on the analysis, we propose a principled reparameterization method to overcome the limitations. Our theoretical results are confirmed by numerical experiments.
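To make "exponentially decaying memory" concrete, the sketch below illustrates only the simpler linear case that the abstract refers to as the previously identified curse of memory, not the nonlinear theorem itself. It assumes a discrete-time linear RNN `h[k+1] = W h[k] + U x[k]`, `y[k] = c^T h[k]` with a diagonal, stable `W`; the function name `linear_rnn_memory`, the dimensions, and the eigenvalue range are illustrative choices, not taken from the paper. In this setting the response to a unit impulse, `rho[k] = c^T W^k U`, is bounded by a geometric sequence.

```python
import numpy as np

def linear_rnn_memory(W, U, c, steps):
    """Impulse response rho[k] = c^T W^k U of a linear RNN, for k = 0..steps-1."""
    rho = np.empty(steps)
    state = U.copy()              # W^0 U
    for k in range(steps):
        rho[k] = c @ state        # c^T W^k U
        state = W @ state         # advance to W^(k+1) U
    return rho

# Illustrative (assumed) setup: a stable diagonal recurrent matrix.
rng = np.random.default_rng(0)
hidden_dim, steps = 8, 50
eigs = rng.uniform(0.1, 0.9, size=hidden_dim)   # all |eigenvalues| < 1
W = np.diag(eigs)
U = rng.standard_normal(hidden_dim)
c = rng.standard_normal(hidden_dim)

rho = linear_rnn_memory(W, U, c, steps)

# For diagonal W the spectral norm of W^k equals max|eig|^k, so by
# Cauchy-Schwarz |rho[k]| <= ||c|| * ||U|| * max|eig|^k: a geometric envelope.
bound = np.linalg.norm(c) * np.linalg.norm(U) * eigs.max() ** np.arange(steps)
decays_exponentially = bool(np.all(np.abs(rho) <= bound + 1e-12))
```

Under these assumptions the influence of an input on the output shrinks geometrically with the time lag, so a target whose memory decays more slowly, for example polynomially, cannot be matched by such a model over long horizons without pushing eigenvalues toward the stability boundary.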