Recent visual generative models enable story generation with consistent characters from text, but human-centric story generation faces additional challenges, such as maintaining detailed and diverse human face consistency and coordinating multiple characters across different images. This paper presents IdentityStory, a framework for human-centric story generation that ensures consistent character identity across multiple sequential images. By taming identity-preserving generators, the framework features two key components: Iterative Identity Discovery, which extracts cohesive character identities, and Re-denoising Identity Injection, which re-denoises images to inject those identities while preserving the desired context. Experiments on the ConsiStory-Human benchmark demonstrate that IdentityStory outperforms existing methods, particularly in face consistency, and supports multi-character combinations. The framework also shows strong potential for applications such as infinite-length story generation and dynamic character composition.