Convergence of Gradient Descent for Recurrent Neural Networks: A Nonasymptotic Analysis

We analyze recurrent neural networks trained with gradient descent in the supervised learning setting for dynamical systems, and prove that gradient descent can achieve optimality \emph{without} massive overparameterization. Our nonasymptotic analysis (i) provides sharp bounds on the network size $m$ and iteration complexity $\tau$ in terms of the sequence length $T$, sample size $n$, and ambient dimension $d$, and (ii) identifies the significant impact of long-term dependencies in the dynamical system on the convergence and network width bounds, characterized by a cutoff point that depends on the Lipschitz continuity of the activation function. Remarkably, this analysis reveals that an appropriately initialized recurrent neural network trained with $n$ samples can achieve optimality with a network size $m$ that scales only logarithmically with $n$. This contrasts sharply with prior works, which require a high-order polynomial dependency of $m$ on $n$ to establish strong regularity conditions. Our results are based on an explicit characterization of the class of dynamical systems that can be approximated and learned by recurrent neural networks via norm-constrained transportation mappings, and on establishing local smoothness properties of the hidden state with respect to the learnable parameters.

