We create WebQAmGaze, a multilingual, low-cost eye-tracking-while-reading dataset designed to support the development of fair and transparent NLP models. WebQAmGaze includes webcam eye-tracking data from 332 participants naturally reading English, Spanish, and German texts. Each participant performs two reading tasks composed of five texts: a normal reading task and an information-seeking task. After preprocessing the data, we find that fixations on relevant spans seem to indicate correctness when answering the comprehension questions. Additionally, we compare the collected data with high-quality eye-tracking data. The results show a moderate correlation between the features obtained with webcam eye tracking (ET) and those obtained with a commercial ET device. We believe this data can advance webcam-based reading studies and open the way to cheaper and more accessible data collection. WebQAmGaze is useful for studying the cognitive processes behind question answering (QA) and for applying these insights to computational models of language understanding.