We create WebQAmGaze, a multilingual, low-cost eye-tracking-while-reading dataset designed to support the development of fair and transparent NLP models. WebQAmGaze includes webcam eye-tracking data from 332 participants naturally reading English, Spanish, and German texts. Each participant performs two reading tasks of five texts each: a normal reading task and an information-seeking task. After preprocessing the data, we find that fixations on relevant spans seem to indicate correctness when answering the comprehension questions. Additionally, we compare the collected data with high-quality eye-tracking data. The results show a moderate correlation between the features obtained with webcam eye tracking and those obtained with a commercial eye-tracking device. We believe this data can advance webcam-based reading studies and open the way to cheaper and more accessible data collection. WebQAmGaze is useful for studying the cognitive processes behind question answering (QA) and for applying these insights to computational models of language understanding.