Eye movements during reading offer insights into both the reader's cognitive processes and the characteristics of the text being read. Hence, the analysis of scanpaths in reading has attracted increasing attention across fields ranging from cognitive science and linguistics to computer science. In particular, eye-tracking-while-reading data has been argued to have the potential to make machine-learning-based language models exhibit more human-like linguistic behavior. However, one of the main challenges in modeling human scanpaths in reading is their dual-sequence nature: the words are ordered following the grammatical rules of the language, whereas the fixations are ordered chronologically. As humans do not read strictly from left to right, but rather skip or refixate words and regress to previous words, aligning the linguistic and the temporal sequence is non-trivial. In this paper, we develop Eyettention, the first dual-sequence model that simultaneously processes the sequence of words and the chronological sequence of fixations. The two sequences are aligned by a cross-sequence attention mechanism. We show that Eyettention outperforms state-of-the-art models in predicting scanpaths. We provide an extensive within- and cross-dataset evaluation on different languages. An ablation study and a qualitative analysis support an in-depth understanding of the model's behavior.
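To make the dual-sequence alignment problem concrete, the following minimal sketch shows how a chronological fixation sequence can attend over a word sequence using generic scaled dot-product cross-attention. It is an illustration only, not the Eyettention architecture: the toy sentence, the 2-dimensional embeddings, the scanpath, and the helper names `softmax` and `cross_attention` are all assumptions introduced here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product attention.

    Each query (one per fixation) attends over all keys/values
    (one per word). Returns the context vectors and attention weights.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        w = softmax(scores)
        out = [
            sum(wj * v[c] for wj, v in zip(w, values))
            for c in range(len(values[0]))
        ]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Word sequence in linguistic order, with hypothetical 2-d embeddings.
words = ["The", "cat", "sat", "down"]
word_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]

# Scanpath in chronological order (indices into `words`):
# the reader skips "cat", regresses to it, then returns to "sat".
scanpath = [0, 2, 1, 2, 3]

# Hypothetical fixation representations: simply the fixated word's embedding.
fix_vecs = [word_vecs[i] for i in scanpath]

# Queries come from the fixation sequence, keys/values from the word sequence.
context, attn = cross_attention(fix_vecs, word_vecs, word_vecs)

for step, (idx, row) in enumerate(zip(scanpath, attn)):
    print(step, words[idx], [round(a, 3) for a in row])
```

The two sequences have different lengths (five fixations, four words), and the attention matrix has one row per fixation and one column per word, which is what lets a temporal sequence be related to a linguistic one without assuming a one-to-one, left-to-right alignment.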