To date, most investigations of surprisal and entropy effects in reading have been conducted at the group level, disregarding individual differences. In this work, we revisit the power of surprisal and entropy measures, estimated from a range of language models (LMs), to predict human reading times as a measure of processing effort, and we do so by incorporating information about language users' cognitive capacities. To this end, we assess the predictive power of surprisal and entropy estimated from generative LMs on reading data obtained from individuals who also completed a wide range of psychometric tests. Specifically, we investigate whether modulating surprisal and entropy relative to cognitive scores increases the prediction accuracy of reading times, and we examine whether LMs exhibit systematic biases in the prediction of reading times for cognitively high- or low-performing groups, revealing what type of psycholinguistic subject a given LM emulates. We find that, in most cases, incorporating cognitive capacities increases the predictive power of surprisal and entropy for reading times, and that, in general, high performance on the psychometric tests is associated with lower sensitivity to predictability effects. Finally, our results suggest that the analyzed LMs emulate readers with lower verbal intelligence, which implies that for a given target group (i.e., individuals with high verbal intelligence), these LMs provide less accurate predictability estimates.
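For readers unfamiliar with the two predictors, the following is a minimal sketch of how surprisal and entropy are conventionally computed from a next-word probability distribution. It uses the standard information-theoretic definitions, not necessarily the exact estimation pipeline of this study; the toy distribution and the names `next_word`, `surprisal`, and `entropy` are illustrative assumptions.

```python
import math


def surprisal(probs, target):
    """Surprisal in bits of the observed word: -log2 p(target | context)."""
    return -math.log2(probs[target])


def entropy(probs):
    """Shannon entropy in bits of the next-word distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


# Hypothetical next-word distribution given some context (toy numbers,
# not taken from any language model or from the study's data).
next_word = {"cat": 0.5, "dog": 0.25, "bird": 0.25}

s = surprisal(next_word, "dog")  # cost of the word that actually occurred
h = entropy(next_word)           # uncertainty before the word is seen

print(f"surprisal = {s} bits, entropy = {h} bits")
```

Surprisal depends on the word that was actually read, whereas entropy summarizes the uncertainty over all possible continuations before the word is encountered.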