Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models

Generative models have recently exhibited exceptional capabilities in various scenarios, such as generating images from text descriptions. In this work, we focus on the task of generating a coherent image sequence based on a given storyline, denoted as open-ended visual storytelling. We make the following three contributions: (i) to fulfill the task of visual storytelling, we introduce two modules into a pre-trained stable diffusion model and construct an auto-regressive image generator, termed StoryGen, that generates the current frame by conditioning on both a text prompt and a preceding frame; (ii) to train our proposed model, we collect paired image and text samples from various online sources, such as videos and e-books, and establish a data processing pipeline for constructing a diverse dataset, named StorySalon, with a far larger vocabulary than existing animation-specific datasets; (iii) we adopt a three-stage curriculum training strategy that enables style transfer, visual context conditioning, and human feedback alignment, respectively. Quantitative experiments and human evaluation have validated the superiority of our proposed model in terms of image quality, style consistency, content consistency, and visual-language alignment. We will make the code, model, and dataset publicly available to the research community.
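
The auto-regressive conditioning described in contribution (i) can be illustrated with a minimal sketch. This is not the StoryGen implementation: `Frame`, `toy_generator`, and `tell_story` are hypothetical names introduced here, and the toy generator only records what it was conditioned on instead of running a diffusion model. The sketch shows only the control flow, in which each frame is produced from its text prompt and the preceding frame.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Frame:
    """Toy stand-in for a generated image (hypothetical, for illustration)."""
    prompt: str             # text prompt this frame was generated from
    context: Optional[str]  # prompt of the preceding frame, or None for the first frame


# Assumed interface: (text prompt, preceding frame or None) -> new frame.
FrameGenerator = Callable[[str, Optional[Frame]], Frame]


def toy_generator(prompt: str, previous: Optional[Frame]) -> Frame:
    """Placeholder for a diffusion model; it only records its conditioning inputs."""
    return Frame(prompt=prompt, context=previous.prompt if previous else None)


def tell_story(storyline: List[str], generate: FrameGenerator) -> List[Frame]:
    """Generate one frame per prompt, feeding each frame into the next step."""
    frames: List[Frame] = []
    previous: Optional[Frame] = None
    for prompt in storyline:
        frame = generate(prompt, previous)  # condition on the text and the preceding frame
        frames.append(frame)
        previous = frame                    # the new frame becomes the next step's context
    return frames
```

Under these assumptions, a real model would replace `toy_generator` while the loop in `tell_story` stays the same; the first frame has no visual context, so it is generated from the text prompt alone.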

