Recent progress in text-to-image (T2I) models enables high-quality image generation with flexible textual control. To exploit the abundant visual priors in off-the-shelf T2I models, a series of methods try to invert an image into an embedding that aligns with the semantic space of the T2I model. However, these image-to-text (I2T) inversion methods typically either need multiple source images containing the same concept or struggle to balance editing flexibility and visual fidelity. In this work, we point out that the critical problem lies in foreground-background entanglement when learning an intended concept, and we propose a simple and effective baseline for single-image I2T inversion, named SingleInsert. SingleInsert adopts a two-stage scheme. In the first stage, we regulate the learned embedding to concentrate on the foreground area without being associated with the irrelevant background. In the second stage, we finetune the T2I model for better visual resemblance and devise a semantic loss to prevent the language drift problem. With the proposed techniques, SingleInsert excels in single-concept generation with high visual fidelity while allowing flexible editing. Additionally, SingleInsert can perform single-image novel view synthesis and multi-concept composition without requiring joint training. To facilitate evaluation, we design an editing prompt list and introduce a metric named Editing Success Rate (ESR) for quantitative assessment of editing flexibility. Our project page is: https://jarrentwu1031.github.io/SingleInsert-web/
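To make the first-stage idea concrete, the following is a minimal sketch of a reconstruction loss restricted to foreground pixels, so that background pixels contribute nothing to what the embedding learns. It is a generic illustration only, not the loss used in SingleInsert: the function name `foreground_masked_mse`, the binary mask input, and the plain mean-squared-error form are all assumptions introduced here.

```python
def foreground_masked_mse(pred, target, mask):
    """Mean squared error over foreground pixels only.

    Hypothetical helper for illustration; not taken from the paper.
    pred, target: 2D lists of floats with the same shape.
    mask: 2D list of 0/1 values, where 1 marks a foreground pixel.
    """
    total = 0.0
    count = 0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:  # background pixels (m == 0) are skipped entirely
                total += (p - t) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask selects no foreground pixels")
    return total / count


# Toy 2x2 example: only the left column is foreground.
pred = [[1.0, 5.0], [2.0, 0.0]]
target = [[0.0, 0.0], [0.0, 0.0]]
mask = [[1, 0], [1, 0]]
loss = foreground_masked_mse(pred, target, mask)
```

In the toy example the large error at the top-right pixel is ignored because that pixel is marked as background, which is the behaviour a foreground-focused objective is meant to have.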