SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

Recently video generation has achieved substantial progress with realistic results. Nevertheless, existing AI-generated videos are usually very short clips ("shot-level") depicting a single scene. To deliver a coherent long video ("story-level"), it is desirable to have creative transition and prediction effects across different clips. This paper presents a short-to-long video diffusion model, SEINE, that focuses on generative transition and prediction. The goal is to generate high-quality long videos with smooth and creative transitions between scenes and varying lengths of shot-level videos. Specifically, we propose a random-mask video diffusion model to automatically generate transitions based on textual descriptions. By providing the images of different scenes as inputs, combined with text-based control, our model generates transition videos that ensure coherence and visual quality. Furthermore, the model can be readily extended to various tasks such as image-to-video animation and autoregressive video prediction. To conduct a comprehensive evaluation of this new generative task, we propose three assessing criteria for smooth and creative transition: temporal consistency, semantic similarity, and video-text semantic alignment. Extensive experiments validate the effectiveness of our approach over existing methods for generative transition and prediction, enabling the creation of story-level long videos. Project page: https://vchitect.github.io/SEINE-project/ .

翻译：近期视频生成在逼真性方面取得了显著进展，但现有AI生成的视频通常为描绘单一场景的短片段（"镜头级"）。为了实现连贯的长视频（"故事级"），需要在不同片段间实现具有创造性的过渡与预测效果。本文提出一种短到长视频扩散模型SEINE，专注于生成式过渡与预测。其目标是在场间及可变长度镜头级视频间生成具有平滑且创造性过渡的高质量长视频。具体而言，我们提出一种随机掩码视频扩散模型，可基于文本描述自动生成过渡片段。通过将不同场景图像作为输入，结合文本控制，该模型可生成确保连贯性与视觉质量的过渡视频。此外，该模型可便捷扩展至图像到视频动画、自回归视频预测等多种任务。为系统评估这一新型生成任务，我们提出三项评估标准用于衡量平滑且创造性的过渡：时间一致性、语义相似性及视频-文本语义对齐。大量实验验证了本方法在生成式过渡与预测任务中相较现有方法的有效性，从而支持了故事级长视频的创作。项目页面：https://vchitect.github.io/SEINE-project/。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

13+阅读 · 2022年3月12日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日