Recent work has found that contemporary language models such as transformers can become so good at next-word prediction that the probabilities they compute become worse predictors of human reading times. In this paper, we propose that this can be explained by reading times being sensitive to simple n-gram statistics rather than to the more complex statistics learned by state-of-the-art transformer language models. We demonstrate that the neural language models whose predictions correlate most strongly with n-gram probabilities are also those whose probabilities correlate most strongly with eye-tracking-based measures of reading time on naturalistic text.
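To make the comparison concrete, the following is a minimal Python sketch, not the authors' pipeline: it estimates smoothed bigram surprisal from a toy corpus and then correlates a hypothetical neural language model's per-word surprisals with both the n-gram surprisals and reading times. The corpus, the `lm_surprisal` values, and the `reading_times` values are all illustrative placeholders, not data from the paper.

```python
# Sketch of the core comparison: correlate a neural LM's per-word
# surprisal with n-gram surprisal and with reading times.
from collections import Counter
from math import log2

from scipy.stats import spearmanr

# Toy corpus for estimating bigram statistics; a real study would use
# a large training corpus and an eye-tracking corpus of naturalistic text.
corpus = "the cat sat on the mat the dog sat on the rug".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_surprisal(prev: str, word: str, alpha: float = 1.0) -> float:
    """Add-alpha smoothed bigram surprisal, -log2 P(word | prev)."""
    vocab = len(unigrams)
    p = (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab)
    return -log2(p)

# Hypothetical per-word values for a short test passage.
words = ["the", "dog", "sat", "on", "the", "mat"]
lm_surprisal = [1.2, 4.5, 2.1, 0.8, 1.0, 3.9]   # from some neural LM (placeholder)
reading_times = [210, 320, 250, 190, 205, 300]  # ms, e.g. gaze duration (placeholder)

# N-gram surprisal for each word given its predecessor.
ngram_surprisal = [bigram_surprisal(p, w) for p, w in zip(words[:-1], words[1:])]

# The paper's claim concerns these two correlations across many models:
# LMs whose surprisals track n-gram statistics also predict reading times better.
rho_ngram, _ = spearmanr(lm_surprisal[1:], ngram_surprisal)
rho_rt, _ = spearmanr(lm_surprisal[1:], reading_times[1:])
print(f"corr(LM, n-gram) = {rho_ngram:.2f}, corr(LM, RT) = {rho_rt:.2f}")
```

In the actual analysis one would repeat this for many language models and compare the two correlations across models; the sketch only shows the per-model computation.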