Text-to-image (T2I) customization aims to create images that embody specific visual concepts delineated in textual descriptions. However, existing works still face a central challenge: concept overfitting. To tackle this challenge, we first analyze overfitting and categorize it into concept-agnostic overfitting, which undermines non-customized concept knowledge, and concept-specific overfitting, which confines the customized concept to limited modalities, i.e., backgrounds, layouts, and styles. To evaluate the degree of overfitting, we further introduce two metrics, the Latent Fisher divergence and the Wasserstein metric, to measure the distribution changes of non-customized and customized concepts, respectively. Building on this analysis, we propose Infusion, a T2I customization method that learns target concepts without being constrained by the limited modalities of the training data, while preserving non-customized knowledge. Notably, Infusion achieves this with only 11 KB of trained parameters. Extensive experiments further demonstrate that our approach outperforms state-of-the-art methods in both single- and multi-concept customized generation.
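The abstract does not spell out how the two metrics are computed. As a rough illustration only, the sketch below estimates a 2-Wasserstein distance between latents of the customized concept sampled before and after fine-tuning, under a Gaussian approximation of both distributions (the same closed form used by FID); the function name `gaussian_w2` and the Gaussian assumption are ours, not the paper's.

```python
import numpy as np
from scipy import linalg


def gaussian_w2(latents_a: np.ndarray, latents_b: np.ndarray) -> float:
    """2-Wasserstein distance between two latent sets of shape (N, D),
    assuming each set is approximately Gaussian (illustrative sketch,
    not the paper's exact metric definition)."""
    mu_a, mu_b = latents_a.mean(axis=0), latents_b.mean(axis=0)
    cov_a = np.cov(latents_a, rowvar=False)
    cov_b = np.cov(latents_b, rowvar=False)
    # Matrix square root of the covariance product; keep the real part
    # to discard numerical noise from near-singular matrices.
    covmean = linalg.sqrtm(cov_a @ cov_b).real
    w2_sq = (np.sum((mu_a - mu_b) ** 2)
             + np.trace(cov_a + cov_b - 2.0 * covmean))
    return float(max(w2_sq, 0.0)) ** 0.5


# Usage (shapes illustrative): latents drawn from the frozen and the
# fine-tuned model for the same customized prompt.
# w2 = gaussian_w2(latents_frozen, latents_tuned)
```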