We study the approximation properties and optimization dynamics of recurrent neural networks (RNNs) when applied to learn input-output relationships in temporal data. We consider the simple but representative setting of using continuous-time linear RNNs to learn from data generated by linear relationships. Mathematically, the latter can be understood as a sequence of linear functionals. We prove a universal approximation theorem for such linear functionals and characterize the approximation rate and its relation to memory. Moreover, we perform a fine-grained dynamical analysis of training linear RNNs, which further reveals the intricate interactions between memory and learning. A unifying theme uncovered is the non-trivial effect of memory, a notion that can be made precise in our framework, on both approximation and optimization: when the target has long-term memory, a large number of neurons is required to approximate it, and the training process suffers from slowdowns. In particular, both effects become exponentially more pronounced with the amount of memory - a phenomenon we call the "curse of memory". These analyses represent a basic step towards a concrete mathematical understanding of new phenomena that may arise when learning temporal relationships with recurrent architectures.
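For concreteness, a minimal sketch of the setting described above, using illustrative symbols ($W$, $U$, $c$, $h$, $\rho$) that are not fixed in this abstract: a continuous-time linear RNN with hidden state $h(t)$ maps an input signal $x$ to an output $\hat{y}$ via
\[
\dot{h}(t) = W\,h(t) + U\,x(t), \qquad \hat{y}(t) = c^\top h(t),
\]
while a target generated by a linear relationship can be written, by the Riesz representation of bounded linear functionals, as
\[
y(t) = H_t(x) = \int_0^\infty \rho(s)^\top x(t-s)\, ds,
\]
where the decay of the kernel $\rho$ encodes the memory of the target; approximation then amounts to matching $\rho$ by the RNN-induced kernel $c^\top e^{Ws} U$.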