WebQAmGaze: A Multilingual Webcam Eye-Tracking-While-Reading Dataset

We present WebQAmGaze, a multilingual low-cost eye-tracking-while-reading dataset, designed as the first webcam-based eye-tracking corpus of reading to support the development of explainable computational language processing models. WebQAmGaze includes webcam eye-tracking data from 600 participants of a wide age range naturally reading English, German, Spanish, and Turkish texts. Each participant performs two reading tasks composed of five texts each, a normal reading task and an information-seeking task, followed by a comprehension question. We compare the collected webcam data to high-quality eye-tracking recordings. The results show a moderate to strong correlation between the eye movement measures obtained with the webcam and those obtained with a commercial eye-tracking device. When validating the data, we find that higher fixation duration on relevant text spans accurately indicates correctness when answering the corresponding questions. This dataset advances webcam-based reading studies and opens avenues to low-cost and diverse data collection. WebQAmGaze is useful for learning about the cognitive processes behind question answering and for applying these insights to computational models of language understanding.
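The comparison between webcam and commercial eye-tracker measures can be illustrated with a minimal sketch. It assumes that fixations have already been mapped to word indices, that the measure of interest is total fixation duration per word, and that Pearson correlation is used; the abstract does not state which measures or which correlation statistic the authors report. The function names and the toy numbers below are hypothetical and are not taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def total_fixation_duration(fixations, n_words):
    """Sum fixation durations (ms) per word.

    `fixations` is a list of (word_index, duration_ms) pairs; this input
    layout is an assumption made for the example.
    """
    totals = [0.0] * n_words
    for word_index, duration_ms in fixations:
        totals[word_index] += duration_ms
    return totals


# Made-up toy data for one five-word sentence (NOT from WebQAmGaze).
webcam_fixations = [(0, 180), (1, 240), (1, 120), (3, 300), (4, 150)]
tracker_fixations = [(0, 200), (1, 310), (2, 90), (3, 280), (4, 170)]

webcam_tfd = total_fixation_duration(webcam_fixations, 5)
tracker_tfd = total_fixation_duration(tracker_fixations, 5)

r = pearson(webcam_tfd, tracker_tfd)
print(f"Pearson r between webcam and tracker measures: {r:.2f}")
```

In practice such a correlation would be computed over many words and participants, and a rank-based statistic may be preferable when the duration distributions are skewed.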

