In subject-driven text-to-image synthesis, the synthesis process tends to be heavily influenced by the user-provided reference images, often overlooking crucial attributes specified in the text prompt. In this work, we propose Subject-Agnostic Guidance (SAG), a simple yet effective remedy for this problem. We show that by constructing a subject-agnostic condition and applying our proposed dual classifier-free guidance, one can obtain outputs consistent with both the given subject and the input text prompt. We validate the efficacy of our approach on both optimization-based and encoder-based methods. Additionally, we demonstrate its applicability to second-order customization methods, in which an encoder-based model is fine-tuned with DreamBooth. Our approach is conceptually simple and requires only minimal code modifications, yet leads to substantial quality improvements, as evidenced by our evaluations and user studies.
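The abstract does not state how the subject-agnostic and subject-aware conditions are combined at sampling time. As a minimal sketch, assuming a nested two-scale form of classifier-free guidance (a common compositional pattern, not necessarily the exact SAG formulation), the unconditional prediction $\epsilon_\varnothing$, the subject-agnostic prediction $\epsilon_a$, and the subject-aware prediction $\epsilon_s$ could be mixed as $\hat\epsilon = \epsilon_\varnothing + w_a(\epsilon_a - \epsilon_\varnothing) + w_s(\epsilon_s - \epsilon_a)$. The function name `dual_cfg` and the weights `w_agnostic` and `w_subject` are hypothetical, and plain lists stand in for noise tensors.

```python
from typing import List, Sequence


def dual_cfg(eps_uncond: Sequence[float],
             eps_agnostic: Sequence[float],
             eps_subject: Sequence[float],
             w_agnostic: float,
             w_subject: float) -> List[float]:
    """Hypothetical two-scale classifier-free guidance (assumed form, not SAG's exact rule).

    eps_uncond:   noise prediction with the null condition
    eps_agnostic: noise prediction with the subject-agnostic condition
    eps_subject:  noise prediction with the full subject-aware condition
    """
    if not (len(eps_uncond) == len(eps_agnostic) == len(eps_subject)):
        raise ValueError("all predictions must have the same length")
    # First term pulls from unconditional toward the subject-agnostic (text-only) direction;
    # second term adds the extra subject-specific direction on top of it.
    return [
        u + w_agnostic * (a - u) + w_subject * (s - a)
        for u, a, s in zip(eps_uncond, eps_agnostic, eps_subject)
    ]


if __name__ == "__main__":
    print(dual_cfg([0.0, 1.0], [1.0, 1.0], [2.0, 0.0], w_agnostic=2.0, w_subject=3.0))
```

With both weights set to 1 the expression collapses to the subject-aware prediction $\epsilon_s$, which is a quick sanity check of the algebra; separating the two scales is what lets the text-driven and subject-driven directions be weighted independently.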