Eye-Tracking-while-Reading: A Living Survey of Datasets with Open Library Support

Eye-tracking-while-reading corpora are a valuable resource for many disciplines and use cases, ranging from the study of the cognitive processes underlying reading to machine-learning-based applications such as gaze-based assessment of reading comprehension. The past decades have seen an increase in the number and size of eye-tracking-while-reading datasets, as well as increasing diversity in the stimulus languages covered, the linguistic backgrounds of the participants, and the accompanying psychometric or demographic data. Because the data are spread across disciplines and the communities lack shared data standards, many existing datasets cannot easily be reused owing to a lack of interoperability. In this work, we aim to create more transparency and clarity regarding existing datasets and their features across disciplines by i) presenting an extensive overview of existing datasets, ii) simplifying the sharing of newly created datasets by publishing a living overview online, https://dili-lab.github.io/datasets.html, which presents over 45 features for each dataset, and iii) integrating all publicly available datasets into the Python package pymovements, which offers an eye-tracking dataset library. By doing so, we aim to strengthen the FAIR principles in eye-tracking-while-reading research and to promote good scientific practices, such as reproducing and replicating studies.
