Language models (LMs) trained on the next-word prediction task have been shown to accurately model human behavior in word prediction and reading speed. In contrast with these findings, we present a scenario in which the performance of humans and LMs diverges. We collected a dataset of human next-word predictions for five stimuli formed by repeating spans of text. Human and GPT-2 predictions are strongly aligned on the first presentation of a text span, but their performance quickly diverges once memory (or in-context learning) begins to play a role. We traced the cause of this divergence to specific attention heads in a middle layer. Adding a power-law recency bias to these attention heads yielded a model whose performance is much closer to that of humans. We hope that this scenario will spur future work on bringing LMs closer to human behavior.
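To make the idea of a power-law recency bias concrete, here is a minimal sketch in plain Python. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the bias is taken to scale each attention weight by (d + 1)^(-alpha), where d is the distance from the query token back to the key token, and the function names and the exponent `alpha` are hypothetical. In a real model this would be applied only to the selected attention heads, not to every head.

```python
import math


def power_law_recency_bias(seq_len, alpha):
    """Additive bias on attention logits (assumed form, not taken from the paper).

    Adding -alpha * log(d + 1) to a logit multiplies the unnormalized
    attention weight by (d + 1) ** -alpha, i.e. a power-law decay in the
    distance d between query position i and key position j.
    Future positions (j > i) get -inf, which is the usual causal mask.
    """
    bias = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            if j <= i:
                row.append(-alpha * math.log(i - j + 1))
            else:
                row.append(float("-inf"))
        bias.append(row)
    return bias


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_causal_attention(scores, alpha):
    """Turn raw query-key scores into attention weights with the recency bias."""
    n = len(scores)
    bias = power_law_recency_bias(n, alpha)
    return [
        softmax([s + b for s, b in zip(scores[i], bias[i])])
        for i in range(n)
    ]


# With identical raw scores, alpha = 0 gives uniform causal attention,
# while alpha > 0 shifts weight toward the most recent tokens.
scores = [[0.0] * 4 for _ in range(4)]
weights = biased_causal_attention(scores, alpha=1.0)
```

With `alpha = 0` the sketch reduces to ordinary causal attention, so the exponent acts as a single knob controlling how strongly a head favors recent context over earlier repetitions of the same span.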