Recurrent Neural Networks (RNNs) are widely recognized for their proficiency in modeling temporal dependencies, making them prevalent in sequential data processing applications. Nevertheless, vanilla RNNs suffer from the well-known vanishing and exploding gradient problem, which poses a significant challenge to learning long-range dependencies. Additionally, gated RNNs tend to be over-parameterized, resulting in poor computational efficiency and weak generalization. To address these challenges, this paper proposes a novel Delayed Memory Unit (DMU). The DMU incorporates a delay line structure together with delay gates into the vanilla RNN, thereby enhancing temporal interaction and facilitating temporal credit assignment. Specifically, the DMU directly distributes the input information to the optimal time instant in the future, rather than aggregating and redistributing it over time through intricate network dynamics. Our proposed DMU demonstrates superior temporal modeling capabilities across a broad range of sequential modeling tasks, using considerably fewer parameters than other state-of-the-art gated RNN models in applications such as speech recognition, radar gesture recognition, ECG waveform segmentation, and permuted sequential image classification.
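To make the delay-line idea concrete, the following is a minimal, hypothetical NumPy sketch of a DMU-style forward pass, not the paper's actual formulation. It assumes a fixed number of candidate delays `n_delays`, a softmax delay gate computed from the current input (weights `Wd` are an illustrative choice), and a rolling buffer that holds input contributions scheduled for future time instants; all names and shapes are assumptions for illustration.

```python
import numpy as np

def softmax(z, axis=0):
    # numerically stable softmax along the delay axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dmu_forward(x_seq, Wx, Wh, Wd, b, n_delays):
    """Sketch of a delayed-memory recurrent cell (illustrative, not the paper's exact equations).

    x_seq: (T, I) input sequence
    Wx:    (I, H) input projection
    Wh:    (H, H) recurrent weights
    Wd:    (I, n_delays * H) delay-gate weights (hypothetical parameterization)
    b:     (H,)   bias
    """
    T, _ = x_seq.shape
    H = Wh.shape[0]
    h = np.zeros(H)
    # delay line: row k holds contributions scheduled to arrive k steps ahead
    delay_line = np.zeros((n_delays, H))
    outputs = []
    for t in range(T):
        proj = x_seq[t] @ Wx  # input projection, shape (H,)
        # delay gates: per hidden unit, a softmax distribution over candidate delays
        gates = softmax((x_seq[t] @ Wd).reshape(n_delays, H), axis=0)
        # distribute the current input's contribution across future time instants
        delay_line += gates * proj
        # consume the contribution that is due now, plus the usual recurrence
        h = np.tanh(delay_line[0] + h @ Wh + b)
        # advance the delay line by one step
        delay_line = np.roll(delay_line, -1, axis=0)
        delay_line[-1] = 0.0
        outputs.append(h)
    return np.stack(outputs)
```

The key contrast with a gated RNN such as the LSTM is that information is routed forward explicitly (each input is assigned a delay at the moment it arrives) rather than being stored in a cell state and released later through learned forget/output gating.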