Diffusion-based models for story visualization have shown promise in generating content-coherent images for storytelling tasks. However, effectively integrating new characters into existing narratives while maintaining character consistency remains an open problem, particularly when data is limited. Two major limitations hinder progress: (1) the absence of a suitable benchmark, due to potential character leakage and inconsistent text labeling, and (2) the difficulty of distinguishing between new and old characters, which leads to ambiguous results. To address these challenges, we introduce the NewEpisode benchmark, comprising refined datasets designed to evaluate how well generative models adapt to generating new stories with fresh characters from just a single example story. The refined datasets provide improved text prompts and eliminate character leakage. Additionally, to mitigate character confusion in generated results, we propose EpicEvo, a method that customizes a diffusion-based visual story generation model with a single story featuring the new characters, seamlessly integrating them into established character dynamics. EpicEvo introduces a novel adversarial character alignment module that progressively aligns the generated images with exemplar images of the new characters during the diffusion process, while applying knowledge distillation to prevent forgetting of characters and background details. Our quantitative evaluation demonstrates that EpicEvo outperforms existing baselines on the NewEpisode benchmark, and qualitative studies confirm its superior customization of visual story generation in diffusion models. In summary, EpicEvo provides an effective way to incorporate new characters using only one example story, unlocking new possibilities for applications such as serialized cartoons.