Is This Loss Informative? Speeding Up Textual Inversion with Deterministic Objective Evaluation

Text-to-image generation models represent the next step of evolution in image synthesis, offering natural means of flexible yet fine-grained control over the result. One emerging area of research is the rapid adaptation of large text-to-image models to smaller datasets or new visual concepts. However, the most efficient method of adaptation, called textual inversion, has a known limitation of long training time, which both restricts practical applications and increases the experiment time for research. In this work, we study the training dynamics of textual inversion, aiming to speed it up. We observe that most concepts are learned at early stages and do not improve in quality later, but standard model convergence metrics fail to indicate that. Instead, we propose a simple early stopping criterion that only requires computing the textual inversion loss on the same inputs for all training iterations. Our experiments on both Latent Diffusion and Stable Diffusion models for 93 concepts demonstrate the competitive performance of our method, speeding adaptation up to 15 times with no significant drops in quality.
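The abstract describes the stopping criterion only at a high level: evaluate the textual inversion loss on the same inputs at every iteration and stop once it is no longer informative. The Python sketch below shows one way such a check could look. It is not the authors' implementation. The variance-ratio rule, the `window` and `rel_threshold` parameters, and the `train_step` / `eval_loss` callables are all assumptions introduced for illustration.

```python
import statistics


def should_stop(eval_losses, window=5, rel_threshold=0.05):
    """Return True once the recent fixed-input losses have flattened out.

    Assumed rule (not taken from the paper): compare the variance of the
    last `window` evaluations with the variance of the whole history.
    """
    if len(eval_losses) < 2 * window:
        return False  # not enough history to judge
    total_var = statistics.pvariance(eval_losses)
    if total_var == 0.0:
        return True  # the loss never changed at all
    recent_var = statistics.pvariance(eval_losses[-window:])
    return recent_var / total_var < rel_threshold


def train_with_early_stopping(train_step, eval_loss, fixed_inputs,
                              max_steps, eval_every=1):
    """Run `train_step` until `should_stop` fires or `max_steps` is reached.

    `train_step` and `eval_loss` are hypothetical callables: in a real setup
    `eval_loss` would compute the textual inversion loss on `fixed_inputs`
    (the same images, timesteps, and noise every time), so that changes in
    the value reflect the learned embedding rather than sampling randomness.
    """
    history = []
    for step in range(1, max_steps + 1):
        train_step()
        if step % eval_every == 0:
            history.append(eval_loss(fixed_inputs))
            if should_stop(history):
                return step, history
    return max_steps, history


# Toy usage: a "loss" that halves on every step stands in for real training.
state = {"loss": 1.0}


def toy_train_step():
    state["loss"] *= 0.5


def toy_eval_loss(_fixed_inputs):
    return state["loss"]


stopped_at, losses = train_with_early_stopping(
    toy_train_step, toy_eval_loss, fixed_inputs=None, max_steps=100
)
```

The key design point is that the evaluation inputs are fixed once and reused, which removes the sampling noise that makes the ordinary training loss hard to read. The specific plateau test can be swapped for any other rule on the resulting loss curve.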

