DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading

The use of visually-rich documents (VRDs) in various fields has created a demand for Document AI models that can read and comprehend documents as humans do, which requires overcoming technical, linguistic, and cognitive barriers. Unfortunately, the lack of appropriate datasets has significantly hindered progress in the field. To address this issue, we introduce \textsc{DocTrack}, a VRD dataset really aligned with human eye-movement information using eye-tracking technology. This dataset can be used to investigate the barriers mentioned above. Additionally, we explore the impact of human reading order on document understanding tasks and examine what happens when a machine reads in the same order as a human. Our results suggest that although Document AI models have made significant progress, they still have a long way to go before they can read VRDs as accurately, continuously, and flexibly as humans do. These findings have potential implications for the future research and development of Document AI models. The data is available at \url{https://github.com/hint-lab/doctrack}.

