MagiCapture: High-Resolution Multi-Concept Portrait Customization

Large-scale text-to-image models including Stable Diffusion are capable of generating high-fidelity photorealistic portrait images. There is an active research area dedicated to personalizing these models, aiming to synthesize specific subjects or styles using provided sets of reference images. However, despite the plausible results from these personalization methods, the images they produce often fall short of realism and are not yet commercially viable. This is particularly noticeable in portrait image generation, where any unnatural artifact in a human face is easily discernible because of our inherent human bias. To address this, we introduce MagiCapture, a personalization method that integrates subject and style concepts to generate high-resolution portrait images from just a few subject and style references. For instance, given a handful of random selfies, our fine-tuned model can generate high-quality portrait images in specific styles, such as passport or profile photos. The main challenge of this task is the absence of ground truth for the composed concepts, which leads to reduced output quality and an identity shift of the source subject. To address these issues, we present a novel Attention Refocusing loss coupled with auxiliary priors, both of which facilitate robust learning in this weakly supervised setting. Our pipeline also includes additional post-processing steps to ensure highly realistic outputs. MagiCapture outperforms other baselines in both quantitative and qualitative evaluations and can also be generalized to non-human objects.

