This work presents FlashFace, a practical tool that lets users personalize their own photos on the fly by providing one or a few reference face images and a text prompt. Our approach is distinguished from existing human photo customization methods by higher-fidelity identity preservation and better instruction following, both of which stem from two subtle design choices. First, we encode the face identity into a series of feature maps instead of a single image token as in prior art, allowing the model to retain more details of the reference faces (e.g., scars, tattoos, and face shape). Second, we introduce a disentangled integration strategy to balance the text and image guidance during the text-to-image generation process, alleviating conflicts between the reference faces and the text prompts (e.g., personalizing an adult into a "child" or an "elder"). Extensive experimental results demonstrate the effectiveness of our method in various applications, including human image personalization, face swapping under language prompts, and turning virtual characters into real people. Project Page: https://jshilong.github.io/flashface-page.