Recent advances in personalized image generation have revealed the intriguing capability of pre-trained text-to-image models to learn identity information from a collection of portrait images. However, existing solutions often fail to produce truthful details and usually suffer from several defects: (i) the generated face exhibits its own characteristics, \ie, its facial shape and the positioning of its facial features may not resemble those of the input, and (ii) the synthesized face may contain warped, blurred, or corrupted regions. In this paper, we present FaceChain, a personalized portrait generation framework that combines a series of customized image-generation models with a rich set of face-related perceptual understanding models (\eg, face detection, deep face embedding extraction, and facial attribute recognition) to tackle the aforementioned challenges and generate truthful personalized portraits from only a handful of portrait images. Concretely, we inject several SOTA face models into the generation procedure, achieving more efficient label tagging, data processing, and model post-processing than previous solutions such as DreamBooth~\cite{ruiz2023dreambooth}, InstantBooth~\cite{shi2023instantbooth}, or other LoRA-only approaches~\cite{hu2021lora}. Through the development of FaceChain, we have identified several potential directions for accelerating the development of face/human-centric AIGC research and applications. We have designed FaceChain as a framework composed of pluggable components that can be easily adjusted to accommodate different styles and personalized needs. We hope it can grow to serve the burgeoning needs of the community. FaceChain is open-sourced under the Apache-2.0 license at \url{https://github.com/modelscope/facechain}.