In recurrent neural networks, learning long-term dependencies is the main difficulty because of the vanishing and exploding gradient problem. Many researchers have worked on this issue and proposed a variety of algorithms. Although these algorithms have achieved great success, understanding how information decays remains an open problem. In this paper, we study the dynamics of the hidden state in recurrent neural networks. We propose a new perspective for analyzing the hidden state space based on the eigendecomposition of the weight matrix. We begin the analysis with a linear state space model and explain the role activation functions play in preserving information. Based on this eigen analysis, we provide an explanation of long-term dependencies. We also show that eigenvalues behave differently in regression tasks and classification tasks. Building on observations of well-trained recurrent neural networks, we propose a new initialization method for recurrent neural networks that consistently improves performance. It can be applied to vanilla RNNs, LSTMs, and GRUs. We evaluate it on several datasets, including Tomita grammars, pixel-by-pixel MNIST, and the Multi30k machine translation dataset. On several tasks it outperforms the Xavier and Kaiming initializers, as well as RNN-specific initializers such as IRNN and sp-RNN.
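To make the eigen perspective on a linear state space model concrete, here is a minimal sketch; it is not the paper's analysis or its initializer. It assumes a linear recurrence h_t = W h_{t-1} with no input and no activation, and a diagonalizable W. Writing W = V diag(λ) V^{-1}, the hidden state's coordinate along each eigenvector is multiplied by its eigenvalue at every step, so after t steps it is scaled by λ^t. Directions with |λ| < 1 decay geometrically and directions with |λ| > 1 grow. The function `eigen_decay` and the 2×2 matrix are hypothetical, chosen only for illustration.

```python
import numpy as np


def eigen_decay(W, h0, steps):
    """Iterate h_t = W h_{t-1} (linear, no input, no activation) and return
    the eigenvalues of W plus the hidden state in W's eigenbasis at each step."""
    eigvals, V = np.linalg.eig(W)  # columns of V are eigenvectors of W
    V_inv = np.linalg.inv(V)       # maps a hidden state to eigenbasis coordinates
    h = np.asarray(h0, dtype=float)
    coords = [V_inv @ h]
    for _ in range(steps):
        h = W @ h                  # one step of the linear recurrence
        coords.append(V_inv @ h)
    return eigvals, np.array(coords)


# Hypothetical recurrent weight matrix built to have eigenvalues 0.9 and 0.5.
V_true = np.array([[1.0, 1.0], [0.0, 1.0]])
W = V_true @ np.diag([0.9, 0.5]) @ np.linalg.inv(V_true)

steps = 20
eigvals, coords = eigen_decay(W, h0=[2.0, 1.0], steps=steps)

# Fraction of the initial coordinate retained along each eigen-direction;
# it should match |lambda|**steps (about 0.12 for 0.9, about 1e-6 for 0.5).
retained = np.abs(coords[-1] / coords[0])
for lam, r in zip(eigvals, retained):
    print(f"lambda={lam:.1f}: retained fraction {r:.2e}")
```

The sketch tracks coordinates in the eigenbasis rather than the raw hidden state because that is where the decay is a simple per-direction power law. With a nonlinear activation the dynamics are no longer exactly diagonal, so this only illustrates the linear starting point of the analysis.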