Transformers have supplanted recurrent neural networks as the dominant architecture both for natural language processing tasks and, despite criticisms of cognitive implausibility, for modeling the effect of predictability on online human language comprehension. However, two recently developed recurrent neural network architectures, RWKV and Mamba, appear to perform natural language tasks comparably to, or better than, transformers of equivalent scale. In this paper, we show that contemporary recurrent models are now also able to match, and in some cases exceed, the performance of comparably sized transformers at modeling online human language comprehension. This suggests that transformer language models are not uniquely suited to this task, and it opens up new directions for debates about the extent to which architectural features of language models make them better or worse models of human language comprehension.
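The abstract does not describe the evaluation pipeline itself, but work of this kind typically extracts per-token surprisal from a language model and regresses it against reading-time measures. Below is a minimal sketch of that surprisal step, assuming a Hugging Face causal language model; the checkpoint name, function name, and example sentence are illustrative choices, not taken from the paper, and an RWKV or Mamba checkpoint could be substituted for the transformer shown here.

```python
# Minimal sketch: per-token surprisal from a causal LM (illustrative, not the
# paper's own code). Surprisal(w_i) = -log2 P(w_i | w_1..w_{i-1}).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def token_surprisals(text, model_name="gpt2"):
    """Return (token, surprisal in bits) for every token after the first."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits  # shape: (1, seq_len, vocab_size)

    # Log-probability of each token given its left context.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    token_log_probs = log_probs[torch.arange(targets.size(0)), targets]
    surprisals = (-token_log_probs / torch.log(torch.tensor(2.0))).tolist()

    tokens = tokenizer.convert_ids_to_tokens(targets)
    return list(zip(tokens, surprisals))

if __name__ == "__main__":
    for tok, s in token_surprisals("The cat sat on the mat."):
        print(f"{tok:>10s}  {s:6.2f} bits")
```

These surprisal estimates would then be entered as predictors of word-level reading times or other online comprehension measures, which is the comparison the abstract refers to.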