Movie101v2: Improved Movie Narration Benchmark

Automatic movie narration aims to create video-aligned plot descriptions to assist visually impaired audiences. It differs from standard video captioning in that it requires not only describing key visual details but also inferring the plots developed across multiple movie shots, thus posing unique and ongoing challenges. To advance the development of automatic movie narration systems, we first revisit the limitations of existing datasets and develop a large-scale, bilingual movie narration dataset, Movie101v2. Second, taking into account the essential difficulties in achieving applicable movie narration, we break the long-term goal into three progressive stages and tentatively focus on the initial stages, which feature understanding within individual clips. We also introduce a new narration assessment to align with our staged task goals. Third, using our new dataset, we baseline several leading large vision-language models, including GPT-4V, and conduct in-depth investigations into the challenges current models face in movie narration generation. Our findings reveal that achieving applicable movie narration generation is a fascinating goal that requires thorough research.



