ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models

Personalizing generative models offers a way to guide image generation with user-provided references. Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models. However, representing and editing specific visual attributes such as material, style, and layout remains a challenge, leading to a lack of disentanglement and editability. To address this problem, we propose a novel approach that leverages the step-by-step generation process of diffusion models, which generate images from low- to high-frequency information, providing a new perspective on representing, generating, and editing images. We develop the Prompt Spectrum Space P*, an expanded textual conditioning space, and a new image representation method called ProSpect. ProSpect represents an image as a collection of inverted textual token embeddings encoded from per-stage prompts, where each prompt corresponds to a specific generation stage (i.e., a group of consecutive steps) of the diffusion model. Experimental results demonstrate that P* and ProSpect offer better disentanglement and controllability compared to existing methods. We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout, achieving previously unattainable results from a single image input without fine-tuning the diffusion models. Our source code is available at https://github.com/zyxElsa/ProSpect.
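The per-stage conditioning described above can be illustrated with a minimal sketch. It is not taken from the ProSpect repository: the function names (`stage_index`, `embedding_for_step`, `swap_stages`), the equal-width split of steps into stages, and the convention that step 0 is the first (noisiest) denoising step are all assumptions made for illustration. Embeddings are treated as opaque values, so plain strings stand in for tensors.

```python
from typing import List, Sequence, TypeVar

E = TypeVar("E")  # stand-in for a token-embedding type (e.g. a tensor)


def stage_index(step: int, num_steps: int, num_stages: int) -> int:
    """Map a denoising step to the stage (group of consecutive steps) it falls in.

    Assumption: step 0 is the first denoising step, and the steps are split
    into num_stages contiguous groups of (nearly) equal size.
    """
    if not 1 <= num_stages <= num_steps:
        raise ValueError("need 1 <= num_stages <= num_steps")
    if not 0 <= step < num_steps:
        raise ValueError("step out of range")
    return step * num_stages // num_steps


def embedding_for_step(step: int, num_steps: int, stage_embeddings: Sequence[E]) -> E:
    """Pick the per-stage embedding that conditions the model at this step."""
    return stage_embeddings[stage_index(step, num_steps, len(stage_embeddings))]


def swap_stages(base: Sequence[E], donor: Sequence[E], stages: Sequence[int]) -> List[E]:
    """Return a copy of `base` whose listed stages use the donor's embeddings.

    Hypothetical editing helper: only the chosen stages change, so attributes
    carried by the other stages are left as they were.
    """
    if len(base) != len(donor):
        raise ValueError("base and donor must have the same number of stages")
    mixed = list(base)
    for k in stages:
        mixed[k] = donor[k]
    return mixed
```

With 50 steps and 10 stages, for example, `stage_index` assigns steps 0 to 4 to stage 0 and steps 45 to 49 to stage 9. Which stages govern which attribute (layout, style, material) is an empirical question this sketch does not answer.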
