We present a new task and dataset, ScreenQA, for screen content understanding via question answering. Existing screen datasets focus either on structure and component-level understanding or on much higher-level composite tasks such as navigation and task completion. We attempt to bridge the gap between these two by annotating 86K question-answer pairs over the RICO dataset, in the hope of benchmarking screen reading comprehension capacity.