Eye movements during reading offer insights into both the reader's cognitive processes and the characteristics of the text being read. Hence, the analysis of scanpaths in reading has attracted increasing attention across fields ranging from cognitive science and linguistics to computer science. In particular, eye-tracking-while-reading data has been argued to have the potential to make machine-learning-based language models exhibit more human-like linguistic behavior. However, one of the main challenges in modeling human scanpaths in reading is their dual-sequence nature: the words are ordered following the grammatical rules of the language, whereas the fixations are ordered chronologically. As humans do not read strictly from left to right, but rather skip or refixate words and regress to previous words, aligning the linguistic and the temporal sequence is non-trivial. In this paper, we develop Eyettention, the first dual-sequence model that simultaneously processes the sequence of words and the chronological sequence of fixations. The two sequences are aligned by a cross-sequence attention mechanism. We show that Eyettention outperforms state-of-the-art models in predicting scanpaths. We provide an extensive within- and across-data-set evaluation on different languages. An ablation study and a qualitative analysis support an in-depth understanding of the model's behavior.
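To make the dual-sequence problem concrete, the following is a minimal sketch of generic scaled dot-product cross-attention between a chronological fixation sequence and a word sequence. It is not the Eyettention architecture: the function name `cross_attention`, the one-hot word vectors, and the toy scanpath are all assumptions made for illustration, and a real model would use learned encoders for both sequences.

```python
import math


def cross_attention(queries, keys_values):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries:     one vector per fixation, in chronological order.
    keys_values: one vector per word, in sentence order.
    Returns (contexts, weights); weights[t][i] is how strongly
    fixation t attends to word i.
    """
    d = len(keys_values[0])
    contexts, all_weights = [], []
    for q in queries:
        # Similarity of this fixation to every word in the sentence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        # Softmax over words, shifted by the max for numerical stability.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Attention-weighted average of the word vectors.
        context = [sum(w * k[j] for w, k in zip(weights, keys_values))
                   for j in range(d)]
        contexts.append(context)
        all_weights.append(weights)
    return contexts, all_weights


# Hypothetical toy data: a 5-word sentence with one-hot word vectors.
n_words = 5
word_states = [[1.0 if i == j else 0.0 for j in range(n_words)]
               for i in range(n_words)]

# Hypothetical scanpath (indices of fixated words in temporal order):
# word 1 is skipped, word 2 is refixated, and the reader regresses
# from word 3 back to word 0 before moving on to word 4.
scanpath = [0, 2, 2, 3, 0, 4]

# Stand-in for a fixation encoder: reuse the fixated word's vector.
fixation_states = [word_states[i] for i in scanpath]

contexts, weights = cross_attention(fixation_states, word_states)
most_attended = [row.index(max(row)) for row in weights]
```

In this toy setting the scanpath has six entries for five words, so the two sequences cannot be aligned position by position; the attention weights instead give each fixation a soft alignment over all words, and `most_attended` recovers the fixated word index for every time step.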