A fundamental result in psycholinguistics is that less predictable words take longer to process. One theoretical explanation for this finding is Surprisal Theory (Hale, 2001; Levy, 2008), which quantifies a word's predictability as its surprisal, i.e. its negative log-probability given a context. While evidence supporting the predictions of Surprisal Theory has been replicated widely, most studies have focused on a very narrow slice of data: native English speakers reading English texts. Indeed, no comprehensive multilingual analysis exists. We address this gap in the literature by investigating the relationship between surprisal and reading times in eleven languages distributed across five language families. Deriving estimates from language models trained on monolingual and multilingual corpora, we test three predictions associated with Surprisal Theory: (i) whether surprisal is predictive of reading times; (ii) whether expected surprisal, i.e. contextual entropy, is predictive of reading times; and (iii) whether the linking function between surprisal and reading times is linear. We find that all three predictions are borne out crosslinguistically. Because they rest on a more diverse set of languages, we argue that these results offer the most robust link to date between information theory and incremental language processing.
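The two quantities named above, surprisal and contextual entropy, can be illustrated with a minimal sketch. The next-word distribution below is a toy example invented for illustration, not data or model output from the study, and the choice of base-2 logarithms (bits) is an assumption, since the abstract does not specify a base.

```python
import math


def surprisal(prob: float) -> float:
    """Surprisal of a word: negative log-probability given its context (bits, by assumption)."""
    return -math.log2(prob)


def contextual_entropy(dist: dict[str, float]) -> float:
    """Expected surprisal over the next-word distribution for one context."""
    return sum(p * surprisal(p) for p in dist.values() if p > 0)


# Hypothetical next-word distribution for a single context (toy numbers).
next_word = {"dog": 0.5, "cat": 0.25, "bird": 0.25}

print(surprisal(next_word["dog"]))    # 1.0
print(surprisal(next_word["cat"]))    # 2.0
print(contextual_entropy(next_word))  # 1.5
```

In this toy case the less probable word ("cat") has higher surprisal than the more probable one ("dog"), which is the quantity Surprisal Theory links to longer processing time. In practice the probabilities would come from a trained language model rather than a hand-written table.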