Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models

Diffusion-based models for story visualization have shown promise in generating content-coherent images for storytelling tasks. However, how to integrate new characters into existing narratives while maintaining character consistency remains an open problem, particularly with limited data. Two limitations hinder progress: (1) the absence of a suitable benchmark, owing to potential character leakage and inconsistent text labeling, and (2) the difficulty of distinguishing between new and old characters, which leads to ambiguous results. To address these challenges, we introduce the NewEpisode benchmark, comprising refined datasets designed to evaluate how well generative models adapt to generating new stories with fresh characters from just a single example story. The refined datasets use refined text prompts and eliminate character leakage. Additionally, to mitigate character confusion in the generated results, we propose EpicEvo, a method that customizes a diffusion-based visual story generation model with a single story featuring the new characters, seamlessly integrating them into established character dynamics. EpicEvo introduces a novel adversarial character alignment module that progressively aligns the generated images with exemplar images of the new characters during the diffusion process, while applying knowledge distillation to prevent forgetting of characters and background details. Our quantitative evaluation shows that EpicEvo outperforms existing baselines on the NewEpisode benchmark, and qualitative studies confirm its superior customization of visual story generation in diffusion models. In summary, EpicEvo provides an effective way to incorporate new characters using only one example story, unlocking new possibilities for applications such as serialized cartoons.
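As a rough illustration of how the three ingredients named in the abstract (the denoising objective, adversarial character alignment, and knowledge distillation) might be combined into a single training objective, consider the sketch below. The abstract does not give the actual loss formulation, so everything here is an assumption made for illustration: the function names, the weights `lambda_adv` and `lambda_kd`, the non-saturating GAN form of the adversarial term, and the use of mean squared error for distillation.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def generator_adv_loss(disc_logit_on_generated):
    """Non-saturating generator loss, -log(sigmoid(logit)) = softplus(-logit).

    Assumed form: a (hypothetical) discriminator scores how closely the
    generated image matches exemplar images of the new character.
    """
    z = -disc_logit_on_generated
    # Numerically stable softplus: max(z, 0) + log(1 + exp(-|z|)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def total_loss(pred_noise, true_noise, disc_logit,
               student_out, teacher_out,
               lambda_adv=0.1, lambda_kd=1.0):
    """Hypothetical combined objective; the weights are illustrative only."""
    denoise = mse(pred_noise, true_noise)      # standard diffusion denoising term
    adv = generator_adv_loss(disc_logit)       # pull output toward the new character
    kd = mse(student_out, teacher_out)         # stay close to the frozen original model
    return denoise + lambda_adv * adv + lambda_kd * kd
```

In this sketch the distillation term compares the customized model (student) against a frozen copy of the original model (teacher), which is one common way to discourage forgetting; whether EpicEvo does exactly this is not stated in the abstract.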

