Translating Text Synopses to Video Storyboards

A storyboard is a roadmap for video creation that consists of shot-by-shot images visualizing the key plot points of a text synopsis. Creating video storyboards, however, remains challenging: it requires not only associating high-level texts with images but also long-term reasoning to keep transitions across shots smooth. In this paper, we propose a new task called Text synopsis to Video Storyboard (TeViS), which aims to retrieve an ordered sequence of images that visualizes a text synopsis. We construct the MovieNet-TeViS benchmark based on the public MovieNet dataset. It contains 10K text synopses, each paired with keyframes manually selected from the corresponding movie with both relevance and cinematic coherence in mind. We also present an encoder-decoder baseline for the task. The model uses a pretrained vision-and-language model to improve high-level text-image matching. To improve coherence across long sequences of shots, we further propose pre-training the decoder on large-scale movie frames without text. Experimental results demonstrate that our proposed model significantly outperforms other models in creating text-relevant and coherent storyboards. Nevertheless, a large gap to human performance remains, suggesting room for promising future work.
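To make the task formulation concrete, the sketch below shows one simple way to turn "retrieve an ordered sequence of images" into code. It is not the authors' encoder-decoder model. It assumes that a synopsis embedding and candidate keyframe embeddings have already been produced by some pretrained vision-and-language encoder, and it swaps in a hand-tuned greedy heuristic: each step scores the remaining frames by a mix of text relevance (cosine similarity to the synopsis) and coherence (cosine similarity to the previously chosen frame). The names `greedy_storyboard`, `cosine`, and the weight `alpha` are hypothetical illustrations.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def greedy_storyboard(text_emb: Vector,
                      frame_embs: List[Vector],
                      num_shots: int,
                      alpha: float = 0.5) -> List[int]:
    """Hypothetical greedy retrieval of an ordered storyboard.

    The first shot is chosen by text relevance alone. Every later shot
    maximizes alpha * relevance + (1 - alpha) * coherence, where coherence
    is the similarity to the previously chosen frame. Each frame is used
    at most once. This is an illustrative heuristic, not the TeViS model.
    """
    if not 0 <= num_shots <= len(frame_embs):
        raise ValueError("num_shots must be between 0 and len(frame_embs)")
    chosen: List[int] = []
    remaining = set(range(len(frame_embs)))
    for _ in range(num_shots):
        def score(i: int) -> float:
            relevance = cosine(text_emb, frame_embs[i])
            if not chosen:
                return relevance
            coherence = cosine(frame_embs[chosen[-1]], frame_embs[i])
            return alpha * relevance + (1 - alpha) * coherence
        # max() keeps the first maximum, so iterating in sorted order
        # breaks ties toward the smaller index and keeps results deterministic.
        best = max(sorted(remaining), key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With toy 2-D embeddings `text = [1.0, 0.0]` and `frames = [[0.8, 0.6], [0.8, -0.6], [0.6, 0.8]]`, calling `greedy_storyboard(text, frames, 3, alpha=1.0)` orders frames purely by relevance and returns `[0, 1, 2]`, while `alpha=0.5` lets coherence with frame 0 pull frame 2 forward and returns `[0, 2, 1]`. A greedy, fixed-weight score like this cannot plan ahead across many shots, which is one reason a learned decoder pretrained on long movie-frame sequences is a more natural fit for the long-term coherence the task demands.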

翻译：故事板是视频制作的路线图，由逐镜头图像组成，以可视化文本摘要中的关键情节。然而，创建视频故事板仍然具有挑战性，这不仅需要高级文本与图像之间的关联，还需要长期推理以确保镜头之间的过渡流畅。在本文中，我们提出了一项名为“文本摘要到视频故事板”（TeViS）的新任务，旨在检索有序的图像序列以可视化文本摘要。我们基于公开的MovieNet数据集构建了MovieNet-TeViS基准，包含10K个文本摘要，每个摘要都配有从相应电影中手动选择的关键帧，这些关键帧同时考虑了相关性和电影连贯性。我们还为该任务提出了一个编码器-解码器基线模型。该模型使用预训练的视觉-语言模型来改进高级文本-图像匹配。为了提升长期镜头的连贯性，我们进一步提出在大规模无文本的电影帧上预训练解码器。实验结果表明，我们提出的模型在创建与文本相关且连贯的故事板方面显著优于其他模型。尽管如此，与人类表现相比仍存在较大差距，这为未来有前景的研究留下了空间。