GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension

The Internet hosts a large number of instructional videos, which provide tutorials for completing a wide range of tasks. Existing instructional video datasets focus only on specific steps at the video level and lack experiential guidelines at the task level, so beginners may struggle to learn new tasks because they lack the relevant experience. Moreover, specific steps without guidelines are trivial and unsystematic, making it difficult to provide a clear tutorial. To address these problems, we present the GUIDE (Guideline-Guided) dataset, which contains 3.5K videos of 560 instructional tasks in 8 domains related to daily life. Specifically, we annotate each instructional task with a guideline that represents a common pattern shared by all task-related videos. On this basis, we annotate systematic specific steps, including their associated guideline steps, specific step descriptions, and timestamps. Our proposed benchmark consists of three sub-tasks that evaluate the comprehension ability of models: (1) Step Captioning: models must generate captions for specific steps from videos. (2) Guideline Summarization: models must mine the common pattern in task-related videos and summarize a guideline from them. (3) Guideline-Guided Captioning: models must generate captions for specific steps under the guidance of the guideline. We evaluate a variety of foundation models on GUIDE and perform in-depth analysis. Given the diversity and practicality of GUIDE, we believe it can serve as a better benchmark for instructional video comprehension.
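The two annotation levels described above (a task-level guideline, plus per-video specific steps that each point back to a guideline step) can be pictured with a minimal Python sketch. This is only an illustration under stated assumptions: the class names, field names, and the sample task below are hypothetical and are not taken from the released dataset or its file format.

```python
from dataclasses import dataclass, field


@dataclass
class SpecificStep:
    """One annotated step in a single video (hypothetical field names)."""
    guideline_step: int  # index of the associated step in the task guideline
    description: str     # specific step description for this video
    start_sec: float     # timestamp where the step begins
    end_sec: float       # timestamp where the step ends


@dataclass
class VideoAnnotation:
    video_id: str
    steps: list[SpecificStep] = field(default_factory=list)


@dataclass
class InstructionalTask:
    name: str
    domain: str
    guideline: list[str]  # common pattern shared by the task's videos
    videos: list[VideoAnnotation] = field(default_factory=list)

    def steps_for_guideline_step(self, index: int) -> list[SpecificStep]:
        """Collect, across all videos, the specific steps tied to one guideline step."""
        return [s for v in self.videos for s in v.steps if s.guideline_step == index]


def example_task() -> InstructionalTask:
    """Build an invented task purely for illustration; not real GUIDE data."""
    return InstructionalTask(
        name="brew tea",
        domain="cooking",
        guideline=["heat the water", "steep the tea"],
        videos=[
            VideoAnnotation("vid_a", [
                SpecificStep(0, "boil water in a kettle", 3.0, 20.0),
                SpecificStep(1, "steep a green tea bag", 21.0, 45.0),
            ]),
            VideoAnnotation("vid_b", [
                SpecificStep(0, "heat water in a saucepan", 5.0, 30.0),
            ]),
        ],
    )
```

Under this layout, Step Captioning would read only the `description` and timestamp fields of one video, Guideline Summarization would aim to recover `guideline` from several videos of the same task, and Guideline-Guided Captioning would condition on `guideline` when producing each `description`.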

