We introduce ScreenQA, a novel benchmarking dataset designed to advance screen content understanding through question answering. Existing screen datasets focus either on low-level structural and component understanding or on much higher-level composite tasks, such as navigation and task completion for autonomous agents. ScreenQA attempts to bridge this gap. By annotating 86k question-answer pairs over the RICO dataset, we aim to benchmark screen reading-comprehension capability, thereby laying the foundation for vision-based automation over screenshots. Our annotations encompass full answers, short answer phrases, and corresponding UI contents with bounding boxes, enabling four subtasks that address various application scenarios. We evaluate the dataset's efficacy using both open-weight and proprietary models in zero-shot, fine-tuned, and transfer learning settings. We further demonstrate positive transfer to web applications, highlighting the dataset's potential beyond mobile applications.
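To make the annotation format concrete, below is a minimal sketch of what a single ScreenQA record could look like, assuming a simple Python representation; the field names, types, and coordinate convention are illustrative assumptions, not the dataset's published schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of one ScreenQA annotation record.
# Field names and coordinate convention are assumptions for illustration.

@dataclass
class BoundingBox:
    # Pixel coordinates of a UI element on the screenshot.
    left: int
    top: int
    right: int
    bottom: int

@dataclass
class UIAnswer:
    # Text content of an answering UI element plus its location.
    text: str
    bbox: BoundingBox

@dataclass
class ScreenQAExample:
    screenshot_id: str    # identifier of the RICO screenshot
    question: str         # natural-language question about the screen
    full_answer: str      # complete-sentence answer
    short_answer: str     # short answer phrase
    ui_contents: List[UIAnswer] = field(default_factory=list)  # grounded UI elements

# Example instance (values invented for illustration):
example = ScreenQAExample(
    screenshot_id="12345",
    question="What is the departure time of the first flight?",
    full_answer="The first flight departs at 7:45 AM.",
    short_answer="7:45 AM",
    ui_contents=[UIAnswer(text="7:45 AM", bbox=BoundingBox(120, 340, 210, 372))],
)
```

A record of this shape would naturally support the four subtasks mentioned above: full-answer generation, short-answer extraction, UI-content selection, and bounding-box grounding.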