Content creators often want to generate personalized images of their own subjects, a task beyond the capabilities of conventional text-to-image models. They may also want the resulting image to depict a specific location, style, ambiance, and more. Existing personalization methods tend to trade subject fidelity against alignment with complex textual prompts, so the output may either fail to follow the prompt or fail to preserve the subject. To address this issue, we propose a personalization approach that targets a \emph{single} prompt, which we term prompt-aligned personalization. While this may seem restrictive, our method excels at improving text alignment, enabling the creation of images from complex and intricate prompts that can challenge current techniques. In particular, our method keeps the personalized model aligned with a target prompt using an additional score distillation sampling term. We demonstrate the versatility of our method in multi- and single-shot settings and further show that it can compose multiple subjects or draw inspiration from reference images, such as artworks. We compare our approach quantitatively and qualitatively with existing baselines and state-of-the-art techniques.