Key-Locked Rank One Editing for Text-to-Image Personalization

Text-to-image (T2I) models offer a new level of flexibility by allowing users to guide the creative process through natural language. However, personalizing these models to align with user-provided visual concepts remains a challenging problem. T2I personalization poses several hard challenges, such as maintaining high visual fidelity while allowing creative control, combining multiple personalized concepts in a single image, and keeping the model size small. We present Perfusion, a T2I personalization method that addresses these challenges using dynamic rank-1 updates to the underlying T2I model. Perfusion avoids overfitting by introducing a new mechanism that "locks" new concepts' cross-attention Keys to their superordinate category. Additionally, we develop a gated rank-1 approach that lets us control the influence of a learned concept at inference time and combine multiple concepts. This allows runtime-efficient balancing of visual fidelity and textual alignment with a single 100KB trained model, which is five orders of magnitude smaller than the current state of the art. Moreover, it can span different operating points across the Pareto front without additional training. Finally, we show that Perfusion outperforms strong baselines in both qualitative and quantitative terms. Importantly, key-locking leads to novel results compared to traditional approaches, making it possible to portray personalized object interactions in unprecedented ways, even in one-shot settings.
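The gated rank-1 update and key-locking mentioned above can be pictured with a toy NumPy sketch. It assumes a ROME-style closed-form rank-1 edit of a cross-attention projection, with a sigmoid gate on how similar an input encoding is to the concept encoding. The helper `make_gated_rank1`, the identity stand-in for the inverse covariance, and the gate constants `beta` and `tau` are illustrative assumptions, not the authors' implementation, and no training is shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def make_gated_rank1(W, C_inv, i_star, o_star, beta=0.75, tau=0.1):
    """Hypothetical helper: build x -> W x + gate(x) * (o_star - W i_star).

    W      : (d_out, d_in) frozen projection, e.g. a cross-attention K or V matrix.
    C_inv  : (d_in, d_in) symmetric inverse covariance of text encodings (assumed given).
    i_star : (d_in,) encoding of the personalized concept token.
    o_star : (d_out,) desired projection output for i_star.
    beta, tau : assumed gate bias and temperature (arbitrary toy values).
    """
    u = C_inv @ i_star
    u = u / (i_star @ u)            # scale so that u @ i_star == 1
    residual = o_star - W @ i_star  # the single direction the edit adds (rank 1)

    def project(x):
        sim = x @ u                 # 1 at i_star, 0 for C_inv-orthogonal encodings
        gate = sigmoid((sim - beta) / tau)
        return W @ x + gate * residual

    return project


# Toy usage with random matrices (dimensions are arbitrary).
rng = np.random.default_rng(0)
d_in, d_out = 16, 8
W_K = rng.normal(size=(d_out, d_in))
W_V = rng.normal(size=(d_out, d_in))
C_inv = np.eye(d_in)                # stand-in for a real covariance estimate
i_star = rng.normal(size=d_in)      # concept token encoding
e_super = rng.normal(size=d_in)     # encoding of the superordinate category word

# Key-locking (as assumed here): the concept's key target is the category's key.
key_edit = make_gated_rank1(W_K, C_inv, i_star, W_K @ e_super)
# The value target would be learned; a random vector stands in for it.
value_edit = make_gated_rank1(W_V, C_inv, i_star, rng.normal(size=d_out))

# An encoding orthogonal to i_star (sim == 0) is left almost untouched.
x = rng.normal(size=d_in)
x_orth = x - (x @ i_star) / (i_star @ i_star) * i_star
```

With these toy constants the gate is about 0.92 at `i_star` and about 0.0006 for `x_orth`, so the concept's key is pulled most of the way toward the category's key while unrelated encodings keep their original projection. Changing `beta` or `tau` at inference time changes how strongly nearby encodings are edited, which is one way to picture the runtime control of a concept's influence that the abstract describes.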