When humans read a text, their eye movements are influenced by the structural complexity of the input sentences. This cognitive phenomenon holds across languages, and recent studies indicate that multilingual language models exploit structural similarities between languages to facilitate cross-lingual transfer. We use sentence-level eye-tracking patterns as a cognitive indicator of structural complexity and show that the multilingual model XLM-RoBERTa can predict varied patterns for 13 typologically diverse languages despite being fine-tuned only on English data. We quantify the sensitivity of the model to structural complexity and distinguish a range of complexity characteristics. Our results indicate that the model develops a meaningful bias towards sentence length but also integrates cross-lingual differences. In a control experiment with randomized word order, we find that the model also appears to capture more complex structural information.
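The randomized word-order control can be illustrated with a minimal sketch. It assumes whitespace tokenization and a fixed random seed, neither of which is specified in the abstract, and the function name `shuffle_word_order` is hypothetical. Whitespace splitting would not suit languages written without spaces between words, so this is an illustration of the idea and not the procedure used in the study.

```python
import random


def shuffle_word_order(sentence: str, seed: int = 0) -> str:
    """Return the sentence with its whitespace-separated tokens in random order.

    Sentence length and vocabulary are preserved; only the order of the
    tokens, and hence the syntactic structure, is disrupted.
    """
    tokens = sentence.split()   # assumption: whitespace tokenization
    rng = random.Random(seed)   # local generator, so results are reproducible
    rng.shuffle(tokens)         # in-place shuffle of the token list
    return " ".join(tokens)


original = "The quick brown fox jumps over the lazy dog"
scrambled = shuffle_word_order(original, seed=13)
print(scrambled)
```

Because the scrambled sentence keeps the same length and the same words, any change in a model's predictions between the original and scrambled input can be attributed to word order and not to length or lexical content.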