The benefits of depth in feedforward neural networks are well known: composing multiple layers of linear transformations with nonlinear activations enables complex computations. While similar effects are expected in recurrent neural networks (RNNs), it remains unclear how depth interacts with recurrence to shape expressive power. Here, we formally show that depth increases RNNs' memory capacity efficiently with respect to the number of parameters, thus enhancing expressivity both by enabling more complex input transformations and improving the retention of past information. We broaden our analysis to 2RNNs, a generalization of RNNs with multiplicative interactions between inputs and hidden states. Unlike RNNs, which remain linear without nonlinear activations, 2RNNs perform polynomial transformations whose maximal degree grows with depth. We further show that multiplicative interactions cannot, in general, be replaced by layerwise nonlinearities. Finally, we validate these insights empirically on synthetic and real-world tasks.
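The contrast between linear RNNs and multiplicative 2RNNs can be illustrated with a small numerical sketch. It rests on assumptions that are not taken from the paper: the 2RNN layer here is a purely bilinear toy update, h_t = A(h_{t-1}, x_t), with no additive terms and no activation, and the names `linear_rnn`, `bilinear_layer`, and `stacked_2rnn` are hypothetical. Under those assumptions, scaling the whole input sequence by a constant c scales the output of an activation-free RNN by c, the output of one bilinear layer over T steps by c^T, and the output of two stacked bilinear layers by c^(T(T+1)/2), so the polynomial degree increases with depth.

```python
import numpy as np

def linear_rnn(xs, W, U):
    """Activation-free RNN with zero initial state: h_t = W h_{t-1} + U x_t."""
    h = np.zeros(W.shape[0])
    for x in xs:
        h = W @ h + U @ x
    return h

def bilinear_layer(inputs, A, h0):
    """Toy multiplicative layer (assumed form): h_t[i] = sum_jk A[i,j,k] h_{t-1}[j] x_t[k]."""
    h, states = h0, []
    for x in inputs:
        h = np.einsum("ijk,j,k->i", A, h, x)
        states.append(h)
    return states

def stacked_2rnn(xs, tensors, h0s):
    """Stack bilinear layers: each layer reads the hidden states of the layer below."""
    seq = xs
    for A, h0 in zip(tensors, h0s):
        seq = bilinear_layer(seq, A, h0)
    return seq[-1]

rng = np.random.default_rng(0)
d, n, T, c = 3, 4, 3, 2.0            # input dim, hidden dim, sequence length, scale factor
xs = rng.normal(size=(T, d))
W, U = rng.normal(size=(n, n)), rng.normal(size=(n, d))
A1, A2 = rng.normal(size=(n, n, d)), rng.normal(size=(n, n, n))
h0 = np.ones(n)                      # fixed nonzero initial state for the bilinear layers

lin = linear_rnn(xs, W, U)
one = stacked_2rnn(xs, [A1], [h0])
two = stacked_2rnn(xs, [A1, A2], [h0, h0])

# Homogeneity check: how does each model respond when every input is multiplied by c?
lin_is_degree_1 = np.allclose(linear_rnn(c * xs, W, U), c * lin)
one_is_degree_T = np.allclose(stacked_2rnn(c * xs, [A1], [h0]), c**T * one)
two_is_degree_6 = np.allclose(
    stacked_2rnn(c * xs, [A1, A2], [h0, h0]), c ** (T * (T + 1) // 2) * two
)
```

This homogeneity test only probes the degree of the input-output map in a toy setting. It does not reproduce the memory-capacity analysis or the paper's exact 2RNN parameterization.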