Personalized text-to-image generation has emerged as a powerful and sought-after tool, allowing users to create customized images from their own concepts and prompts. However, existing personalization approaches face several challenges: long tuning times, large storage requirements, the need for multiple input images per identity, and limited ability to preserve identity while retaining editability. To address these obstacles, we present PhotoVerse, a methodology that incorporates a dual-branch conditioning mechanism in both the text and image domains, providing effective control over the image generation process. Furthermore, we introduce a facial identity loss as a novel component to enhance identity preservation during training. Notably, PhotoVerse eliminates the need for test-time tuning and relies solely on a single facial photo of the target identity, significantly reducing the resource cost of image generation. After a single training phase, our approach generates high-quality images within only a few seconds. Moreover, our method can produce diverse images spanning various scenes and styles. Extensive evaluation demonstrates the superior performance of our approach, which achieves the dual objectives of preserving identity and facilitating editability. Project page: https://photoverse2d.github.io/