Recurrent Neural Networks (RNNs) have achieved great success in the prediction of sequential data. However, theoretical analysis of RNNs still lags behind, owing to their complex interconnected structure. In this paper, we establish a new generalization error bound for vanilla RNNs and provide a unified framework for computing the Rademacher complexity that applies to a variety of loss functions. When the ramp loss is used, we show that our bound is tighter than existing bounds under the same assumptions on the Frobenius and spectral norms of the weight matrices, together with a few mild conditions. Our numerical results show that our new generalization bound is the tightest among all existing bounds on three public datasets. Our bound improves on the second-tightest bound by 13.80% and 3.01% on average when the $\tanh$ and ReLU activation functions are used, respectively. Moreover, we derive a sharp estimation error bound for RNN-based estimators obtained through empirical risk minimization (ERM) in multi-class classification problems when the loss function satisfies a Bernstein condition.
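For reference, the ramp loss and the empirical Rademacher complexity invoked above admit the following standard definitions; the margin parameter $\gamma$, the sample $S = (x_1, \dots, x_n)$, and the function class $\mathcal{F}$ are notation introduced here for illustration, and the paper's exact conventions may differ.
% Ramp loss with margin $\gamma > 0$ (standard definition): a clipped,
% $(1/\gamma)$-Lipschitz surrogate of the 0-1 loss.
\[
  \ell_\gamma(t) \;=\; \min\!\bigl(1,\, \max(0,\, 1 - t/\gamma)\bigr),
  \qquad \gamma > 0 .
\]
% Empirical Rademacher complexity of $\mathcal{F}$ on the sample $S$,
% where $\epsilon_1, \dots, \epsilon_n$ are i.i.d. uniform signs in $\{-1,+1\}$.
\[
  \widehat{\mathfrak{R}}_S(\mathcal{F})
  \;=\;
  \mathbb{E}_{\epsilon}\Bigl[\, \sup_{f \in \mathcal{F}}
  \frac{1}{n} \sum_{i=1}^{n} \epsilon_i f(x_i) \Bigr].
\]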