We present a new perspective on how readers integrate context during real-time language comprehension. Our proposals build on surprisal theory, which posits that the processing effort of a linguistic unit (e.g., a word) is an affine function of its in-context information content. We first observe that surprisal is only one of many ways in which a contextual predictor can be derived from a language model. Another is the pointwise mutual information (PMI) between a unit and its context, which yields the same predictive power as surprisal once unigram frequency is controlled for. Moreover, both PMI and surprisal are correlated with frequency, so neither is a purely contextual predictor: each also carries information about frequency. In response, we propose projecting surprisal onto the orthogonal complement of frequency, which yields a new contextual predictor that is uncorrelated with frequency. Our experiments show that the proportion of variance in reading times explained by context is substantially smaller when context is represented by the orthogonalized predictor. From an interpretability standpoint, this indicates that previous studies may have overstated the role of context in predicting reading times.
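The projection step can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: frequency is represented as unigram surprisal, -log p(w); both predictors are mean-centered before projecting, so that orthogonality coincides with zero correlation; and the function names `orthogonalize_against_frequency`, `pmi`, and `r_squared` are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def orthogonalize_against_frequency(surprisal, frequency):
    """Return the part of (centered) surprisal orthogonal to (centered) frequency.

    Assumption: centering is applied first, so the result has zero
    sample correlation with the frequency predictor.
    """
    s = np.asarray(surprisal, dtype=float)
    f = np.asarray(frequency, dtype=float)
    s_c = s - s.mean()
    f_c = f - f.mean()
    denom = f_c @ f_c
    if denom == 0.0:
        raise ValueError("frequency predictor is constant; projection undefined")
    beta = (s_c @ f_c) / denom  # least-squares slope of surprisal on frequency
    return s_c - beta * f_c     # residual of that regression


def pmi(surprisal, unigram_surprisal):
    """PMI(w; c) = log p(w|c) - log p(w) = unigram surprisal - surprisal."""
    return np.asarray(unigram_surprisal, dtype=float) - np.asarray(surprisal, dtype=float)


def r_squared(predictors, y):
    """R^2 of an ordinary least-squares fit of y on the predictors plus an intercept."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y))] + [np.asarray(p, dtype=float) for p in predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


if __name__ == "__main__":
    # Synthetic data for illustration only; these are not results from the paper.
    rng = np.random.default_rng(0)
    freq = rng.normal(size=1000)                    # stand-in for -log p(w)
    surp = 0.6 * freq + rng.normal(size=1000)       # surprisal correlated with frequency
    rt = 2.0 * surp + freq + rng.normal(size=1000)  # stand-in reading times

    ortho = orthogonalize_against_frequency(surp, freq)
    print("corr(surprisal, frequency):", np.corrcoef(surp, freq)[0, 1])
    print("corr(orthogonalized, frequency):", np.corrcoef(ortho, freq)[0, 1])
    print("R^2 with frequency + surprisal:", r_squared([freq, surp], rt))
    print("R^2 with frequency + PMI:", r_squared([freq, pmi(surp, freq)], rt))
```

Because PMI equals unigram surprisal minus surprisal, the pairs {frequency, surprisal} and {frequency, PMI} span the same space in a linear model. This is why the two fits in the sketch agree, consistent with the abstract's observation that PMI and surprisal have the same predictive power once frequency is controlled for.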