Transformers have surpassed RNNs in popularity due to their superior abilities in parallel training and long-term dependency modeling. Recently, there has been renewed interest in using linear RNNs for efficient sequence modeling. These linear RNNs often apply gating mechanisms to the output of the linear recurrence layer while overlooking the importance of forget gates within the recurrence itself. In this paper, we propose a gated linear RNN model dubbed Hierarchically Gated Recurrent Neural Network (HGRN), which includes forget gates that are lower bounded by a learnable value. The lower bound increases monotonically from lower layers to upper layers. This allows the upper layers to model long-term dependencies and the lower layers to model more local, short-term dependencies. Experiments on language modeling, image classification, and the Long Range Arena benchmark showcase the efficiency and effectiveness of our proposed model. The source code is available at https://github.com/OpenNLPLab/HGRN.
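To make the idea of a layer-dependent lower bound on the forget gate concrete, the following is a minimal sketch in plain Python, not the released implementation. The abstract does not specify how the bounds are parameterized or what the recurrence looks like, so everything here is an assumption made for illustration: the function names are hypothetical, the bounds are obtained from a softmax followed by an exclusive running sum (one simple way to guarantee values in [0, 1) that never decrease with depth), the gate is rescaled as `lower_bound + (1 - lower_bound) * sigmoid(logit)`, and the recurrence is a scalar `h_t = f_t * h_{t-1} + (1 - f_t) * x_t`.

```python
import math


def layer_lower_bounds(raw_scores):
    """Map one unconstrained score per layer to lower bounds in [0, 1).

    Assumed parameterization: softmax makes the scores positive and sum
    to 1; an exclusive running sum then gives bounds that start at 0 and
    never decrease from one layer to the next.
    """
    peak = max(raw_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in raw_scores]
    total = sum(exps)
    bounds, running = [], 0.0
    for e in exps:
        bounds.append(running)
        running += e / total
    return bounds


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bounded_forget_gate(gate_logit, lower_bound):
    """Squash a logit so the gate value lies between lower_bound and 1."""
    return lower_bound + (1.0 - lower_bound) * sigmoid(gate_logit)


def gated_linear_recurrence(inputs, gate_logits, lower_bound):
    """Scalar recurrence h_t = f_t * h_{t-1} + (1 - f_t) * x_t (assumed form)."""
    h, outputs = 0.0, []
    for x, logit in zip(inputs, gate_logits):
        f = bounded_forget_gate(logit, lower_bound)
        h = f * h + (1.0 - f) * x
        outputs.append(h)
    return outputs


if __name__ == "__main__":
    bounds = layer_lower_bounds([0.0, 0.0, 0.0, 0.0])  # 4 hypothetical layers
    impulse = [1.0] + [0.0] * 9          # a single input at t = 0
    gate_logits = [-4.0] * 10            # gates that want to forget quickly
    for layer, bound in enumerate(bounds):
        final = gated_linear_recurrence(impulse, gate_logits, bound)[-1]
        print(f"layer {layer}: lower bound {bound:.2f}, "
              f"impulse remaining after 10 steps {final:.4f}")
```

Under these assumptions, a higher lower bound keeps the forget gate closer to 1 even when the data-dependent logit is strongly negative, so the state in an upper layer decays more slowly than in a lower layer. This mirrors the division of labor described above, with upper layers retaining long-range information and lower layers focusing on local context.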