Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models

Recent research shows that conditional diffusion models have considerable potential for generating consistent stories. However, current methods, which predominantly generate stories autoregressively and depend heavily on captions, often underrate the contextual consistency and relevance of frames during sequential generation. To address this, we propose Rich-contextual Conditional Diffusion Models (RCDMs), a two-stage approach designed to enhance the semantic and temporal consistency of story generation. Specifically, in the first stage, a frame-prior transformer diffusion model predicts the frame semantic embedding of the unknown clip by aligning the semantic correlations between the captions and frames of the known clip. The second stage establishes a robust model with rich contextual conditions, including reference images of the known clip, the predicted frame semantic embedding of the unknown clip, and text embeddings of all captions. By jointly injecting these rich contextual conditions at the image and feature levels, RCDMs can generate semantically and temporally consistent stories. Moreover, unlike autoregressive models, RCDMs can generate consistent stories with a single forward inference. Our qualitative and quantitative results demonstrate that the proposed RCDMs achieve superior performance in challenging scenarios. The code and model will be available at https://github.com/muzishen/RCDMs.
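To make the two-stage data flow concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. Every function name, the embedding width, and the arithmetic inside each stage are assumptions made for illustration, and frames are represented as small vectors rather than images. Only the inputs and outputs of each stage follow the abstract: stage one maps the known clip's captions and frames plus the unknown captions to predicted frame embeddings, and stage two consumes reference frames, predicted embeddings, and all caption embeddings in a single call.

```python
import random

EMB_DIM = 8  # hypothetical embedding width, chosen only for this toy example


def embed_text(caption):
    """Stand-in for a text encoder: a deterministic pseudo-embedding per caption."""
    rng = random.Random(caption)
    return [rng.uniform(-1.0, 1.0) for _ in range(EMB_DIM)]


def stage1_frame_prior(known_captions, known_frame_embs, unknown_captions):
    """Toy stand-in for the frame-prior model (assumption, not the real model).

    Estimates the mean caption-to-frame offset on the known clip and applies
    it to each unknown caption. This only mimics the stage's inputs and outputs.
    """
    known_text = [embed_text(c) for c in known_captions]
    offset = [
        sum(f[d] - t[d] for f, t in zip(known_frame_embs, known_text)) / len(known_text)
        for d in range(EMB_DIM)
    ]
    return [[t + o for t, o in zip(embed_text(c), offset)] for c in unknown_captions]


def stage2_generate(reference_frames, predicted_embs, all_caption_embs):
    """Toy stand-in for the second-stage model (assumption, not the real model).

    Receives all three kinds of context at once and returns every unknown
    frame from one call, rather than one frame per autoregressive step.
    """
    n_ref = len(reference_frames)
    ref_mean = [sum(f[d] for f in reference_frames) / n_ref for d in range(EMB_DIM)]
    unknown_text = all_caption_embs[n_ref:]
    return [
        [(r + p + t) / 3.0 for r, p, t in zip(ref_mean, pred, text)]
        for pred, text in zip(predicted_embs, unknown_text)
    ]


def generate_story(known_captions, known_frame_embs, unknown_captions):
    """Run stage one, then produce all unknown frames with one stage-two call."""
    predicted = stage1_frame_prior(known_captions, known_frame_embs, unknown_captions)
    all_text = [embed_text(c) for c in known_captions + unknown_captions]
    return stage2_generate(known_frame_embs, predicted, all_text)
```

Calling generate_story with two known captions, two known frame vectors, and three unknown captions returns three frame vectors from a single stage-two call. An autoregressive pipeline would instead call its generator once per unknown frame, feeding each output back in as context.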

